Best way to simulate “group by” from bash?

半阙折子戏 2020-11-29 15:03

Suppose you have a file that contains IP addresses, one address in each line:

10.0.10.1
10.0.10.1
10.0.10.3
10.0.10.2
10.0.10.1

You need a shell script that counts, for each IP address, how many times it appears in the file — in other words, a "group by" with a count.
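
For the input above, you would need output like this (counts taken from the answers below):

10.0.10.1 3
10.0.10.2 1
10.0.10.3 1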

14 Answers
  •  悲&欢浪女
    2020-11-29 16:07

    Pure bash (no fork!)

    There is a way to do this with a bash function. It is very quick because there is no fork, as long as the set of IP addresses stays small.

    countIp () {
        # Regular (indexed) array: the index is the IP packed into a 32-bit integer.
        local -a _ips=()
        local _a
        while IFS=. read -r -a _a ;do
            # Pack the four octets into one integer and bump its counter.
            ((_ips[_a<<24|${_a[1]}<<16|${_a[2]}<<8|${_a[3]}]++))
        done
        # The indexes of the sparse array are the IPs seen; the values are the counts.
        for _a in "${!_ips[@]}" ;do
            printf "%.16s %4d\n" \
              $((_a>>24)).$((_a>>16&255)).$((_a>>8&255)).$((_a&255)) "${_ips[_a]}"
        done
    }
    

    Note: each IP address is converted to a 32-bit unsigned integer and used as the index of a regular bash array, not an associative array (which is more expensive)!
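
    For illustration, this is how a single address maps to its array index (a small standalone sketch, not part of the function above; the variable name o is arbitrary):

    # 10.0.10.1 -> (10<<24) | (0<<16) | (10<<8) | 1 = 167774721
    IFS=. read -r -a o <<< "10.0.10.1"
    echo $(( o[0]<<24 | o[1]<<16 | o[2]<<8 | o[3] ))   # prints 167774721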

    time countIp < ip_addresses 
    10.0.10.1    3
    10.0.10.2    1
    10.0.10.3    1
    real    0m0.001s
    user    0m0.004s
    sys     0m0.000s
    
    time sort ip_addresses | uniq -c
          3 10.0.10.1
          1 10.0.10.2
          1 10.0.10.3
    real    0m0.010s
    user    0m0.000s
    sys     0m0.000s
    

    On my host, this is a lot quicker than fork-based approaches for up to roughly 1,000 addresses, but it takes about one full second to sort and count 10,000 addresses.
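
    For comparison, here is a sketch of the same counter written with a bash associative array (requires bash 4+; this variant and the name countIpAssoc are not from the original answer):

    countIpAssoc () {
        # Associative array: the key is the IP address as a plain string.
        local -A _counts=()
        local _ip
        while IFS= read -r _ip ;do
            ((_counts[$_ip]++))
        done
        for _ip in "${!_counts[@]}" ;do
            printf "%.16s %4d\n" "$_ip" "${_counts[$_ip]}"
        done
    }

    It accepts arbitrary keys, not only IPv4 addresses, at the cost of the associative-array overhead mentioned above.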
