Question
I saw the following interesting usage of tar in a co-worker's Bash scripts:
`tar cf - * | (cd <dest> ; tar xf - )`
Apparently it works much like rsync -av does, but faster. The question arises, how?
-m
EDIT: Can anyone explain why this solution should be preferable to the following?
cp -rfp * dest
Is the former faster?
Answer 1:
A simple experiment shows the difference between cp and tar when copying directory hierarchies:
alastair box:~/hack/cptest [1134]% mkdir src
alastair box:~/hack/cptest [1135]% cd src
alastair box:~/hack/cptest/src [1136]% touch foo
alastair box:~/hack/cptest/src [1137]% ln -s foo foo-s
alastair box:~/hack/cptest/src [1138]% ln foo foo-h
alastair box:~/hack/cptest/src [1139]% ls -l
total 0
-rw-r--r-- 2 alastair alastair 0 Nov 25 14:59 foo
-rw-r--r-- 2 alastair alastair 0 Nov 25 14:59 foo-h
lrwxrwxrwx 1 alastair alastair 3 Nov 25 14:59 foo-s -> foo
alastair box:~/hack/cptest/src [1142]% mkdir ../cpdest
alastair box:~/hack/cptest/src [1143]% cp -rfp * ../cpdest
alastair box:~/hack/cptest/src [1144]% mkdir ../tardest
alastair box:~/hack/cptest/src [1145]% tar cf - * | (cd ../tardest ; tar xf - )
alastair box:~/hack/cptest/src [1146]% cd ..
alastair box:~/hack/cptest [1147]% ls -l cpdest
total 0
-rw-r--r-- 1 alastair alastair 0 Nov 25 14:59 foo
-rw-r--r-- 1 alastair alastair 0 Nov 25 14:59 foo-h
lrwxrwxrwx 1 alastair alastair 3 Nov 25 15:00 foo-s -> foo
alastair box:~/hack/cptest [1148]% ls -l tardest
total 0
-rw-r--r-- 2 alastair alastair 0 Nov 25 14:59 foo
-rw-r--r-- 2 alastair alastair 0 Nov 25 14:59 foo-h
lrwxrwxrwx 1 alastair alastair 3 Nov 25 15:00 foo-s -> foo
The difference is in the hard-linked files. Notice how the hard-linked files are copied individually with cp and together with tar. To make the difference more obvious, have a look at the inodes for each:
alastair box:~/hack/cptest [1149]% ls -i cpdest
24690722 foo 24690723 foo-h 24690724 foo-s
alastair box:~/hack/cptest [1150]% ls -i tardest
24690801 foo 24690801 foo-h 24690802 foo-s
There are probably other reasons to prefer tar, but this is one big one, at least if you have extensively hard-linked files.
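If you would rather not compare inode numbers by eye, the same check can be scripted. This is only a sketch of the experiment above: the scratch directory layout is made up for illustration, and it relies on the shell's -ef test, which is true when two paths refer to the same file.

```shell
#!/usr/bin/env bash
# Sketch: check whether a copy kept two names hard-linked together.
# The directory names below are hypothetical scratch paths.
work=$(mktemp -d)
mkdir "$work/src" "$work/cpdest" "$work/tardest"

touch "$work/src/foo"
ln "$work/src/foo" "$work/src/foo-h"     # second name for the same inode

# Copy once with cp and once with the tar pipe, as in the transcript above.
(cd "$work/src" && cp -rfp * "$work/cpdest")
(cd "$work/src" && tar cf - * | (cd "$work/tardest" && tar xf -))

# -ef is true when both paths refer to the same device and inode.
for d in cpdest tardest; do
    if [ "$work/$d/foo" -ef "$work/$d/foo-h" ]; then
        echo "$d: foo and foo-h are still hard-linked"
    else
        echo "$d: foo and foo-h are separate files"
    fi
done
```

The tar destination should report the two names as still hard-linked. What the cp destination reports depends on your cp implementation and the options you pass.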
Answer 2:
It writes the archive to standard output, then pipes it to a subprocess -- wrapped by the parentheses -- that changes to a different directory and reads/extracts from standard input. That's what the dash character after the f argument means: use standard output (when creating) or standard input (when extracting) instead of a named archive file. It's basically copying all the visible files and subdirectories of the current directory to another directory.
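Here is the same pipeline run on throwaway directories, with each piece commented. Treat it as a sketch: the scratch paths and file names are invented for the example and are not part of the original command.

```shell
#!/usr/bin/env bash
# Sketch of the tar pipe on scratch directories (all names are hypothetical).
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dest"
echo "hello" > "$work/src/a.txt"
echo "world" > "$work/src/sub/b.txt"

cd "$work/src"

# Left side:  c = create an archive, "f -" = write it to standard output.
# Right side: the parentheses start a subshell, so the cd inside them
#             does not change the directory of the shell running this script;
#             x = extract, "f -" = read the archive from standard input.
tar cf - * | (cd "$work/dest"; tar xf -)

pwd                   # still the source directory
ls -R "$work/dest"    # a.txt and sub/b.txt now exist in the destination
```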
Answer 3:
For a directory with 25,000 empty files:
$ time { tar -cf - * | (cd ../bar; tar -xf - ); }
real 0m4.209s
user 0m0.724s
sys 0m3.380s
$ time { cp * ../baz/; }
real 0m18.727s
user 0m0.644s
sys 0m7.127s
For a directory with 4 files of 1073741824 bytes (1 GiB) each:
$ time { tar -cf - * | (cd ../bar; tar -xf - ); }
real 3m44.007s
user 0m3.390s
sys 0m25.644s
$ time { cp * ../baz/; }
real 3m11.197s
user 0m0.023s
sys 0m9.576s
My guess is this phenomenon is highly filesystem-dependent. If I'm right you will see a drastic difference between a filesystem that specializes in numerous small files, such as reiserfs 3.6, and a filesystem that is better at handling large files.
(I ran the above tests on HFS+.)
Answer 4:
This is a unique usage of pipes. tar normally writes its archive to a file, but here the first tar writes to stdout (the -), which is piped into the second tar, which reads from stdin rather than from a file. This is the same thing as tarring to a file and untarring later, except without the file in between.
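To make the "same thing, minus the file" point concrete, here is a sketch that does the copy both ways and compares the results. The scratch directories and the archive.tar name are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: piped copy versus archive-to-disk copy (all names are hypothetical).
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/piped" "$work/twostep"
echo "data" > "$work/src/sub/file.txt"
cd "$work/src"

# One step: the archive only ever exists inside the pipe.
tar cf - * | (cd "$work/piped" && tar xf -)

# Two steps: the same archive, but written to disk in between.
tar cf "$work/archive.tar" *
(cd "$work/twostep" && tar xf "$work/archive.tar")

# diff -r prints nothing and returns 0 when the two trees match.
diff -r "$work/piped" "$work/twostep" && echo "identical trees"
```

The piped form saves the disk space and the extra write/read of the intermediate archive, which matters more as the tree gets larger.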
Answer 5:
The PowerTools book has the copy as:
tar cf - * | (cd <dest> && tar xvBf - )
The '&&' is a conditional that checks the return code of the preceding command. That is, if the cd fails, the tar extraction is not executed at all. I always throw in a -v (verbose) and a -B (reblock input).
I use tar all the time. It is especially useful for copying to a remote system, such as:
tar cvf - . | ssh someone@somemachine '(cd somewhere && tar xBf -)'
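The difference between ';' and '&&' is easy to see without involving tar at all. In this sketch pwd stands in for the extracting tar, and the missing directory is a hypothetical destination that was mistyped or never created.

```shell
#!/usr/bin/env bash
# Sketch: what runs after a failed cd, with ';' versus '&&'.
work=$(mktemp -d)
missing="$work/no-such-dir"    # hypothetical destination that does not exist
cd "$work"

# With ';' the second command runs even though cd failed, so it runs in the
# directory we started from. A "tar xf -" here would unpack in the wrong place.
with_semicolon=$(cd "$missing" 2>/dev/null; pwd)

# With '&&' the second command is skipped when cd fails.
with_and=$(cd "$missing" 2>/dev/null && pwd)

echo "after ';'  ran in: ${with_semicolon}"
echo "after '&&' ran in: '${with_and}' (empty means nothing ran)"
```

With the ';' form from the question, a bad destination means the archive is unpacked on top of whatever directory the subshell happened to be in, which is why the '&&' form is the safer habit.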
Answer 6:
tar cf - * | (cd <dest> ; tar xf - )
is going to tar all non-hidden files/directories of the current directory to stdout, then pipe that into a new subshell's stdin. That subshell first changes its current working directory to <dest>, and then untars the archive into that directory.
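The "non-hidden" part comes from the shell, not from tar: the shell expands * before tar ever runs, and by default that expansion skips names starting with a dot. A small sketch with made-up file names, which also shows that archiving . instead picks the dotfiles up:

```shell
#!/usr/bin/env bash
# Sketch: '*' versus '.' as the thing to archive (all names are hypothetical).
work=$(mktemp -d)
mkdir "$work/src" "$work/star" "$work/dot"
echo "v" > "$work/src/visible.txt"
echo "h" > "$work/src/.hidden"
cd "$work/src"

# '*' is expanded by the shell and, by default, skips names starting with '.'.
tar cf - * | (cd "$work/star" && tar xf -)

# Archiving '.' lets tar walk the directory itself, dotfiles included.
tar cf - . | (cd "$work/dot" && tar xf -)

ls -A "$work/star"    # lists only visible.txt
ls -A "$work/dot"     # lists .hidden and visible.txt
```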
Answer 7:
Some old versions of cp didn't have -f / -p (and similar) options for preserving permissions, so this tar trick did the job.
Answer 8:
I believe tar will do a Windows-style 'merge' operation with deeply nested directories, whereas cp will overwrite sub-directories.
For example if you have the layout:
dir/subdir/file1
and you copy it to a destination that contains:
dir/subdir/file2
Then with copy you will be left with:
dir/subdir/file1
But with the tar command, your destination will contain:
dir/subdir/file1
dir/subdir/file2
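The layout above is small enough to try out. The sketch below builds it in a scratch directory (the paths under the temp directory are hypothetical) and runs the tar pipe into it. It also runs cp -rfp into a second, identical destination, so the claim about cp can be checked on your own system rather than taken on faith.

```shell
#!/usr/bin/env bash
# Sketch: copy dir/subdir/file1 into destinations that already hold file2.
work=$(mktemp -d)
mkdir -p "$work/src/dir/subdir" "$work/tardest/dir/subdir" "$work/cpdest/dir/subdir"
echo "one" > "$work/src/dir/subdir/file1"
echo "two" > "$work/tardest/dir/subdir/file2"
echo "two" > "$work/cpdest/dir/subdir/file2"

cd "$work/src"
tar cf - * | (cd "$work/tardest" && tar xf -)
cp -rfp * "$work/cpdest"

# tar only creates or replaces the entries that are in the archive, so the
# file2 that was already in the destination is left alone.
echo "tar destination:"
ls "$work/tardest/dir/subdir"

# No expected output is given here on purpose: see what your cp does.
echo "cp destination:"
ls "$work/cpdest/dir/subdir"
```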
Answer 9:
tar cf - *
This uses tar to send * to stdout
|
This does the obvious redirect of stdout to...
(cd <dest> ; tar xf - )
This, which changes PWD to the appropriate location and then extracts from stdin
I do not know why this would be faster than rsync, as there is no compression involved.
Answer 10:
The tar solution will preserve symbolic links, whereas cp will just make copies and destroy the links.
tar has been a standard Unix utility a lot longer than rsync. You're more likely to find it installed when a directory hierarchy needs to be copied to another location (even another computer). rsync is probably easier to use these days, but is slower because it compares the source and the destination and then syncs them. tar just copies in one direction.
Answer 11:
If you have GNU cp (which most Linux-based systems do), then cp --archive will work, even on hard-linked files, and tar is not needed.
Answer 12:
As it happens, a co-worker wrote a nearly identical command into one of our scripts. After I spent some time puzzling over it, I asked why he had used that rather than cp. His answer, as I recall it, was that cp is slow when making a copy from one file system to another.
Finding out whether this is true would require more testing than I care to spend on the question, but it makes a certain amount of sense. The first tar process reads from the source device as quickly as possible, waiting only on that device. Meanwhile, the second tar process reads from its input pipe and writes as quickly as possible. It might have to wait for input, but if writes on the destination device are slower than reads on the source device, it will only wait on the destination device. A single cp command has to wait on both the source and the destination devices.
On the other hand, modern operating systems do a pretty good job of pre-caching IO operations. It's entirely possible cp will spend most of its time waiting on writes and getting reads from memory rather than the device itself. It seems like one would need really solid data to choose two tar commands over the more straightforward cp command.
Source: https://stackoverflow.com/questions/316078/interesting-usage-of-tar-but-what-is-happening