Question
Is there a difference between the two orders of calling uniq and sort in a shell script? I'm talking about time and space here.
grep 'somePattern' | uniq | sort
vs.
grep 'somePattern' | sort | uniq
A quick test on a 140k-line text file showed a slight speed improvement (5.5 s vs. 5.0 s) for the first method (get the unique values, then sort).
I don't know how to measure memory usage, though.
The question now is: does the order make a difference, or does it depend on the lines grep returns (many or few duplicates)?
I'm looking forward to your answers.
Answer 1:
The only correct order is to call uniq after sort, since the man page for uniq says:
Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).
Therefore it should be
grep 'somePattern' | sort | uniq
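A small demonstration of the man page's point, using hypothetical sample input in which "apple" appears twice but not on adjacent lines. The helper names `uniq_then_sort` and `sort_then_uniq` are made up for this sketch; they just wrap the two pipeline orders from the question.

```shell
# Hypothetical sample input: "apple" appears twice, but not on adjacent lines.
input='apple
banana
apple
cherry'

# Hypothetical helpers wrapping the two pipeline orders from the question.
uniq_then_sort() { printf '%s\n' "$1" | uniq | sort; }
sort_then_uniq() { printf '%s\n' "$1" | sort | uniq; }

echo "uniq | sort:"
# uniq only collapses adjacent repeats, so both "apple" lines survive
# and sort then puts them next to each other.
uniq_then_sort "$input"

echo "sort | uniq:"
# sort makes the duplicates adjacent first, so uniq can collapse them.
sort_then_uniq "$input"
```

With this input, "apple" is printed twice under `uniq | sort` and once under `sort | uniq`, so the faster first order in the question can simply be producing a different (wrong) result.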
Answer 2:
I believe that sort -u is suited to this exact scenario: it both sorts the lines and removes duplicates. That should be more efficient than calling sort and uniq separately in either order, since it avoids running a second process and piping the sorted output through it.
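A minimal check, on hypothetical sample input, that `sort -u` gives the same lines as the two-step `sort | uniq` pipeline for ordinary text like this:

```shell
# Hypothetical sample input with non-adjacent duplicates.
input='pear
fig
pear
apple
fig'

# sort -u sorts and drops duplicate lines in a single process.
with_sort_u=$(printf '%s\n' "$input" | sort -u)

# The equivalent two-step pipeline.
with_sort_uniq=$(printf '%s\n' "$input" | sort | uniq)

printf '%s\n' "$with_sort_u"
[ "$with_sort_u" = "$with_sort_uniq" ] && echo "same output"
```

In the question's pipeline this becomes `grep 'somePattern' | sort -u`.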
Answer 3:
uniq relies on its input being sorted in order to remove all duplicates, since it only compares each line with the one immediately before it. That is why sort is always run before uniq. Try it and see.
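To illustrate the "previous and current item" comparison, here is a minimal sketch of uniq's core rule written as a shell function. `adjacent_uniq` is a hypothetical name, and the real uniq has more options (such as `-c`, `-d` and `-i`); this only mimics its default behaviour.

```shell
# Hypothetical function mimicking uniq's default rule:
# print a line only if it differs from the line immediately before it.
adjacent_uniq() {
  first=1
  prev=
  while IFS= read -r line; do
    # Always print the first line; afterwards, print only on a change.
    if [ "$first" -eq 1 ] || [ "$line" != "$prev" ]; then
      printf '%s\n' "$line"
    fi
    prev=$line
    first=0
  done
}

# Unsorted input: the last "a" is not adjacent to the first two, so it survives.
printf 'a\na\nb\na\n' | adjacent_uniq
# Sorted input: all duplicates are adjacent, so each value appears once.
printf 'a\na\nb\na\n' | sort | adjacent_uniq
```

The first call prints `a`, `b`, `a`; the second prints `a`, `b`. Because only one previous line is remembered, the function (like uniq) needs constant memory but cannot detect duplicates that are not next to each other.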
Source: https://stackoverflow.com/questions/1402223/calling-uniq-and-sort-in-different-orders-in-shell