Bash Script: count unique lines in file

backend · open · 3 answers · 1706 views
有刺的猬 2020-12-12 12:30

Situation:

I have a large file (millions of lines) containing IP addresses and ports from a several-hour network capture, one IP/port per line. Lines are of this

3 Answers
  •  悲哀的现实
    2020-12-12 12:57

    To count the total number of unique lines (i.e. counting each distinct line only once, however often it repeats), we can pipe either uniq or Awk into wc:

    sort ips.txt | uniq | wc -l
    awk '!seen[$0]++' ips.txt | wc -l
    

    Awk's seen array is a hash-based associative array, so this approach deduplicates in a single pass without the O(n log n) sort, at the cost of holding every distinct line in memory; in practice it often runs faster than the sort | uniq pipeline.
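
    As a small variation (a sketch, not part of the commands above), the count can also be kept inside Awk itself, so the unique lines never have to be piped through wc; it assumes the same ips.txt input file:

    awk '!seen[$0]++ { n++ } END { print n+0 }' ips.txt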

    Generating a test file and timing both approaches:

    $ for i in {1..100000}; do echo $RANDOM; done > random.txt
    $ time sort random.txt | uniq | wc -l
    31175
    
    real    0m1.193s
    user    0m0.701s
    sys     0m0.388s
    
    $ time awk '!seen[$0]++' random.txt | wc -l
    31175
    
    real    0m0.675s
    user    0m0.108s
    sys     0m0.171s
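
    As a related option (a sketch against the same random.txt; timings will vary by machine), sort can deduplicate on its own with -u, and forcing byte-wise collation with LC_ALL=C typically speeds up sorting of large files compared with locale-aware collation:

    $ time LC_ALL=C sort -u random.txt | wc -l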
    
