calling uniq and sort in different orders in shell

荒凉一梦 submitted on 2019-12-07 18:15:46

Question


Is there a difference in the order of uniq and sort when calling them in a shell script? I mean in terms of time and space.

grep 'somePattern' | uniq | sort

vs.

grep 'somePattern' | sort | uniq

A quick test on a 140k-line text file showed a slight speed improvement (5.5 s vs. 5.0 s) for the first method (get unique values, then sort).

I don't know how to measure memory usage, though.

The question now is: does the order make a difference, or does it depend on the lines grep returns (many vs. few duplicates)?

I'm looking forward to your answers.


Answer 1:


The only correct order is to call uniq after sort, since the man page for uniq says:

Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).

Therefore it should be:

grep 'somePattern' | sort | uniq
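To see the difference concretely, here is a minimal sketch assuming a POSIX shell. The three-line `input` is made-up sample data standing in for grep's output, with the two `apple` lines deliberately not adjacent.

```shell
#!/bin/sh
# Made-up sample standing in for grep output: the two "apple" lines are not adjacent.
input='apple
banana
apple'

# uniq first: no two identical lines are adjacent, so uniq removes nothing;
# sort then only reorders. Prints apple, apple, banana (one per line).
printf '%s\n' "$input" | uniq | sort

# sort first: identical lines become adjacent, so uniq can collapse them.
# Prints apple, banana (one per line).
printf '%s\n' "$input" | sort | uniq
```

Only the second pipeline removes the duplicate, because sort is what makes the equal lines neighbours.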



Answer 2:


I believe that sort -u is suited to exactly this scenario: it both sorts and removes duplicate lines. That should be more efficient than running sort and uniq as two separate commands, in either order.
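A minimal sketch comparing the two forms, assuming a POSIX `sort` (which defines the `-u` option); the `input` lines are made up for illustration. For plain ASCII input like this, both commands print the same lines, but `sort -u` does the work in one process instead of two.

```shell
#!/bin/sh
# Made-up sample input with non-adjacent duplicates.
input='cherry
apple
cherry
banana
apple'

# One process: sort and keep only the first of each run of equal lines.
printf '%s\n' "$input" | sort -u

# Two processes: sort, then collapse the now-adjacent duplicates with uniq.
printf '%s\n' "$input" | sort | uniq
```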




Answer 3:


uniq relies on its input being sorted to remove all duplicates (since it only compares each line with the one before it), which is why sort is always run before uniq. Try it and see.
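The "previous and current item" comparison can be sketched in awk. `adjacent_uniq` is a hypothetical helper that mimics the rule described above; it is an illustration of the idea, not uniq's actual implementation.

```shell
#!/bin/sh
# Hypothetical helper: print a line only if it differs from the line just before it.
# (prev "") forces a string comparison, so lines like "1" and "1.0" are not
# treated as equal numbers. NR == 1 ensures the first line is always printed.
adjacent_uniq() {
  awk 'NR == 1 || $0 != (prev "") { print } { prev = $0 }'
}

# Unsorted input: the second "b" is not next to the first, so it survives.
printf 'b\na\nb\n' | adjacent_uniq

# Sorted input: equal lines are neighbours, so each value appears once.
printf 'b\na\nb\n' | sort | adjacent_uniq
```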



Source: https://stackoverflow.com/questions/1402223/calling-uniq-and-sort-in-different-orders-in-shell
