Question
The page 38 of the book Linux 101 Hacks suggests:
cat url-list.txt | xargs wget -c
I usually do:
for i in `cat url-list.txt`
do
wget -c $i
done
Is there something, other than length, where the xargs technique is superior to the good old for-loop technique in bash?
Added
The C source code seems to have only one fork. In contrast, how many forks does the bash combination perform? Please elaborate on the issue.
Answer 1:
From the Rationale section of a UNIX manpage for xargs. (Interestingly, this section appears neither in the OS X BSD version of xargs nor in the GNU version.)
The classic application of the xargs utility is in conjunction with the find utility to reduce the number of processes launched by a simplistic use of the find -exec combination. The xargs utility is also used to enforce an upper limit on memory required to launch a process. With this basis in mind, this volume of POSIX.1-2008 selected only the minimal features required.
In your follow-up, you ask how many forks the other version will have. Jim already answered this: one per iteration. How many iterations are there? It's impossible to give an exact number, but easy to answer the general question. How many lines are there in your url-list.txt file?
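To make the fork count concrete, here is a small sketch with a hypothetical 5-line list, using `echo` as a stand-in for `wget`:

```shell
#!/bin/sh
# Sketch: compare how many times each approach runs the command.
# (echo stands in for wget; the list file is made up for the demo.)
seq 1 5 > /tmp/list.txt

# The loop runs the command once per line -- with wget, that is
# one fork per URL, so 5 output lines here.
while read -r url; do echo "$url"; done < /tmp/list.txt

# xargs passes all 5 arguments to a single invocation: 1 output line.
xargs echo < /tmp/list.txt
```

So for an N-line `url-list.txt`, the loop forks N times while xargs can get away with far fewer invocations.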
There are some other considerations. xargs requires extra care for filenames with spaces or other troublesome characters, and find's -exec has an option (+) that groups processing into batches. So not everyone prefers xargs, and perhaps it's not best for all situations.
See these links:
- http://www.sunmanagers.org/pipermail/summaries/2005-March/006255.html
- http://fahdshariff.blogspot.com/2009/05/find-exec-vs-xargs.html
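As a sketch of the filenames-with-spaces caveat mentioned above, the usual workaround is a NUL-delimited pipeline (this assumes find and xargs support -print0/-0, which both GNU and BSD versions do; the directory and filename here are made up):

```shell
#!/bin/sh
# Sketch: NUL-delimited handoff keeps a filename containing a space intact.
dir=$(mktemp -d)
touch "$dir/a file.txt"

# A naive pipe (find ... | xargs ls) would split "a file.txt" on the
# space into two bogus arguments. -print0 / -0 delimit on NUL instead:
find "$dir" -name '*.txt' -print0 | xargs -0 ls

rm -r "$dir"
```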
Answer 2:
Also consider:
xargs -I'{}' wget -c '{}' < url-list.txt
but wget provides an even better means for the same:
wget -c -i url-list.txt
With respect to the xargs-versus-loop consideration, I prefer xargs when the meaning and implementation are relatively "simple" and "clear"; otherwise, I use loops.
Answer 3:
xargs will also allow you to process a huge list, which is not possible with the "for" version, because the kernel limits the length of the command line (ARG_MAX) that can be passed to a single process.
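A quick way to see that limit, and how xargs works around it (the 300000-number list is an arbitrary example chosen to be larger than typical default batch sizes; exact batch counts vary by system and xargs implementation):

```shell
#!/bin/sh
# Sketch: xargs splits an oversized argument list across several exec()s.
getconf ARG_MAX          # kernel limit on argv + environment bytes

# ~1.9 MB of arguments cannot fit on one command line, so xargs
# invokes echo several times; each output line is one invocation.
seq 1 300000 | xargs echo | wc -l
```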
Answer 4:
xargs is designed to process multiple inputs for each process it forks. A shell script with a for loop over its inputs must fork a new process for each input. Avoiding that per-process overhead can give an xargs solution a significant performance enhancement.
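That batching can be made visible with the -n option, which caps how many arguments each forked process receives (echo stands in for the real command):

```shell
#!/bin/sh
# Sketch: -n 3 means each forked echo receives at most 3 arguments,
# so 6 inputs become 2 invocations (2 output lines).
seq 1 6 | xargs -n 3 echo
# -> 1 2 3
#    4 5 6
```

Without -n, xargs packs as many arguments per invocation as the system allows, which is where the per-process savings come from.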
Answer 5:
Instead of GNU Parallel, I prefer using xargs' built-in parallel processing. Add -P to indicate how many forks to run in parallel, as in...
seq 1 10 | xargs -n 1 -P 3 echo
would use 3 forks on 3 different cores for computation. This is supported by modern GNU xargs. You will have to verify for yourself if you are using BSD or Solaris.
Answer 6:
Depending on your internet connection you may want to use GNU Parallel http://www.gnu.org/software/parallel/ to run it in parallel.
cat url-list.txt | parallel wget -c
Answer 7:
One advantage I can think of is that, if you have lots of files, it could be slightly faster since you don't have as much overhead from starting new processes.
I'm not really a bash expert though, so there could be other reasons it's better (or worse).
Source: https://stackoverflow.com/questions/1282697/cat-xargs-command-vs-for-bash-command