Bash Script: count unique lines in file

backend · open · 3 answers · 1706 views
有刺的猬 2020-12-12 12:30

Situation:

I have a large file (millions of lines) containing IP addresses and ports from a several-hour network capture, one IP/port per line. Lines are of this

3 Answers
  •  悲哀的现实
    2020-12-12 12:57

    To count the total number of unique lines (i.e. counting each distinct line only once, however often it repeats), we can pipe either uniq or Awk into wc:

    sort ips.txt | uniq | wc -l
    awk '!seen[$0]++' ips.txt | wc -l
    

    Awk's seen array is a hash-based associative array, so this approach deduplicates in a single pass without the O(n log n) sort, at the cost of holding every distinct line in memory; in practice it often runs faster than the sort | uniq pipeline.
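
    As a small variation (a sketch, not part of the commands above), the count can also be kept inside Awk itself, so the unique lines never have to be piped through wc; it assumes the same ips.txt input file:

    awk '!seen[$0]++ { n++ } END { print n+0 }' ips.txt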

    Generating a test file and timing both approaches:

    $ for i in {1..100000}; do echo $RANDOM; done > random.txt
    $ time sort random.txt | uniq | wc -l
    31175
    
    real    0m1.193s
    user    0m0.701s
    sys     0m0.388s
    
    $ time awk '!seen[$0]++' random.txt | wc -l
    31175
    
    real    0m0.675s
    user    0m0.108s
    sys     0m0.171s
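
    As a related option (a sketch against the same random.txt; timings will vary by machine), sort can deduplicate on its own with -u, and forcing byte-wise collation with LC_ALL=C typically speeds up sorting of large files compared with locale-aware collation:

    $ time LC_ALL=C sort -u random.txt | wc -l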
    
