Awk/Unix group by

鱼传尺愫 2020-12-04 14:17

I have this text file:

name, age
joe,42
jim,20
bob,15
mike,24
mike,15
mike,54
bob,21

I am trying to get this (a count of rows per name):

joe 1
jim 1

6 answers
  • 2020-12-04 14:59
    $ awk -F, 'NR>1{arr[$1]++}END{for (a in arr) print a, arr[a]}' file.txt
    joe 1
    jim 1
    mike 3
    bob 2
    

    EXPLANATION

    • -F, sets the field separator to a comma
    • NR>1 skips the header by processing only the lines after line 1
    • arr[$1]++ increments the counter stored in the array arr under the first field as key
    • END{} runs once, after the whole file has been processed
    • for (a in arr) iterates over the keys of arr (the iteration order is unspecified)
    • print a, arr[a] prints each key followed by its count
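
Because the order of for (a in arr) is unspecified, the names may come out in a different order on another awk implementation. A minimal sketch, assuming you want the names in the order they first appear in the input: the variable data and the function name count_in_order are made up for this illustration, and the extra array order only records first appearances.

```shell
# Sample data from the question, held in a shell variable for illustration
data='name, age
joe,42
jim,20
bob,15
mike,24
mike,15
mike,54
bob,21'

# Count rows per name and print them in first-appearance order
# (count_in_order is a hypothetical helper name)
count_in_order() {
    awk -F, '
        NR > 1 {
            if (!($1 in count)) order[++n] = $1   # remember each new name once
            count[$1]++
        }
        END {
            for (i = 1; i <= n; i++) print order[i], count[order[i]]
        }
    '
}

printf '%s\n' "$data" | count_in_order
```

Walking a numerically indexed array in END avoids depending on the hash order of the associative array.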
  • 2020-12-04 14:59

    tail -n +2 file.txt | cut -d',' -f 1 | sort | uniq -c

    2 bob
    1 jim
    1 joe
    3 mike
    
  • 2020-12-04 15:01

    It looks like you want sorted output. You could simply pipe or print into sort -nk 2:

    awk -F, 'NR>1 { a[$1]++ } END { for (i in a) print i, a[i] | "sort -nk 2" }' file
    

    Results:

    jim 1
    joe 1
    bob 2
    mike 3
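
If the largest groups should come first, the sort keys can be spelled out. This is only a sketch under assumptions: data and count_sorted_desc are hypothetical names, the pipe to sort sits outside awk rather than inside print, and LC_ALL=C is set so the name tie-breaker behaves the same everywhere.

```shell
# Sample data from the question, held in a shell variable for illustration
data='name, age
joe,42
jim,20
bob,15
mike,24
mike,15
mike,54
bob,21'

# Count per name, then sort by count (descending) and by name for ties
# (count_sorted_desc is a hypothetical helper name)
count_sorted_desc() {
    awk -F, 'NR > 1 { a[$1]++ } END { for (i in a) print i, a[i] }' \
        | LC_ALL=C sort -k2,2nr -k1,1
}

printf '%s\n' "$data" | count_sorted_desc
```

Restricting each key with -k2,2 and -k1,1 keeps sort from comparing the rest of the line when it evaluates a key.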
    

    However, if you have GNU awk installed, you can perform the sorting without coreutils. Here's a single-process solution that sorts the array by its values. It should still be quite quick. Run it like this:

    awk -f script.awk file
    

    Contents of script.awk:

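    BEGIN {
        FS=","
    }

    # Count occurrences of each name, skipping the header line
    NR>1 {
        a[$1]++
    }

    END {
        # Build composite keys "count SUBSEP name". The count is zero-padded
        # because asorti() compares indices as strings by default.
        for (i in a) {
            b[sprintf("%010d", a[i]),i] = i
        }

        # After asorti(b), b[1..n] hold the composite keys in sorted order
        n = asorti(b)

        # Recover the name from each composite key
        for (i=1;i<=n;i++) {
            split (b[i], c, SUBSEP)
            d[++x] = c[2]
        }

        for (j=1;j<=n;j++) {
            print d[j], a[d[j]]
        }
    }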
    

    Results:

    jim 1
    joe 1
    bob 2
    mike 3
    

    Alternatively, here's the one-liner:

    awk -F, 'NR>1 { a[$1]++ } END { for (i in a) b[sprintf("%010d", a[i]),i] = i; n = asorti(b); for (i=1;i<=n;i++) { split (b[i], c, SUBSEP); d[++x] = c[2] } for (j=1;j<=n;j++) print d[j], a[d[j]] }' file
    
  • 2020-12-04 15:09

    A strictly awk solution...

    BEGIN { FS = "," }
    { ++x[$1] }
    END { for(i in x) print i, x[i] }
    

    If the name, age header line is really in the file, you could adjust the awk program to ignore it by counting only lines that contain a digit...

    BEGIN   { FS = "," }
    /[0-9]/ { ++x[$1] }
    END     { for(i in x) print i, x[i] }
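
The /[0-9]/ pattern matches a digit anywhere on the line, so a header such as name1,age would still be counted. A stricter sketch, assuming the second field should be a plain non-negative integer: data and count_valid_rows are hypothetical names, and the final sort is there only to make the output order predictable.

```shell
# Sample data from the question, held in a shell variable for illustration
data='name, age
joe,42
jim,20
bob,15
mike,24
mike,15
mike,54
bob,21'

# Count only rows whose second field is an integer, optionally padded with spaces
# (count_valid_rows is a hypothetical helper name)
count_valid_rows() {
    awk -F, '
        $2 ~ /^ *[0-9]+ *$/ { x[$1]++ }
        END { for (i in x) print i, x[i] }
    ' | LC_ALL=C sort
}

printf '%s\n' "$data" | count_valid_rows
```

Testing the field rather than the whole line also skips blank lines and rows with a missing age.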
    
  • 2020-12-04 15:15

    I came up with two functions based on the answers here:

    # Columns printed by both functions: command, summed %CPU, summed %MEM
    topcpu() {
        top -b -n1                                                                                  \
            | tail -n +8                                                                            \
            | awk '{ print $12, $9, $10 }'                                                          \
            | awk '{ CPU[$1] += $2; MEM[$1] += $3 } END { for (k in CPU) print k, CPU[k], MEM[k] }' \
            | sort -k2 -n                                                                           \
            | tail -n 10                                                                            \
            | column -t                                                                             \
            | tac
    }
    
    topmem() {
        top -b -n1                                                                                  \
            | tail -n +8                                                                            \
            | awk '{ print $12, $9, $10 }'                                                          \
            | awk '{ CPU[$1] += $2; MEM[$1] += $3 } END { for (k in CPU) print k, CPU[k], MEM[k] }' \
            | sort -k3 -n                                                                           \
            | tail -n 10                                                                            \
            | column -t                                                                             \
            | tac
    }
    
    $ topcpu
    top              12.5  0
    Xorg             6.2   1.6
    ibus-x11         6.2   0.7
    gnome-shell      6.2   7
    chrome           6.2   74.6
    adb              6.2   0.1
    zsh              0     2.2
    xdg-permission-  0     0.2
    xdg-document-po  0     0.1
    xdg-desktop-por  0     0.4
    
    $ topmem
    chrome           0    75.6
    gnome-shell      6.2  7
    mysqld           0    4.2
    zsh              0    2.2
    deluge-gtk       0    2.1
    Xorg             0    1.6
    scrcpy           0    1.6
    gnome-session-b  0    0.8
    systemd-journal  0    0.7
    ibus-x11         6.2  0.7
    
    

    enjoy!
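
The CPU[$1] += $2 pattern in these functions is a group-by sum, and it can be tried without top. A small sketch on the data from the question, assuming you want the total and average age per name: data and sum_by_name are hypothetical names, and the output is sorted by name only to keep it predictable.

```shell
# Sample data from the question, held in a shell variable for illustration
data='name, age
joe,42
jim,20
bob,15
mike,24
mike,15
mike,54
bob,21'

# Sum and average the second column per name
# (sum_by_name is a hypothetical helper name)
sum_by_name() {
    awk -F, '
        NR > 1 { sum[$1] += $2; n[$1]++ }
        END    { for (k in sum) printf "%s %d %.1f\n", k, sum[k], sum[k] / n[k] }
    ' | LC_ALL=C sort
}

printf '%s\n' "$data" | sum_by_name
```

Keeping a second array with the row count per key is what makes the average possible in a single pass.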

  • 2020-12-04 15:20

    Strip the header row, drop the age field, group the same names together (sort), count identical runs, output in desired format.

    tail -n +2 txt.txt | cut -d',' -f 1 | sort | uniq -c | awk '{ print $2, $1 }'
    

    Output:

    bob 2
    jim 1
    joe 1
    mike 3
    