Awk/Unix group by

前端未结


I have this text file:

name, age
joe,42
jim,20
bob,15
mike,24
mike,15
mike,54
bob,21

I'm trying to get a count per name, like this:

joe 1
jim 1


                      
6 answers

萌比男神i, 2020-12-04 14:59

$ awk -F, 'NR>1{arr[$1]++}END{for (a in arr) print a, arr[a]}' file.txt
joe 1
jim 1
mike 3
bob 2


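EXPLANATION

-F, sets the field separator to a comma.
NR>1 applies the action only to lines after line 1, which skips the header.
arr[$1]++ increments the entry of the associative array arr whose key is the first field (the name).
END{} runs once, after the whole file has been processed.
for (a in arr) loops over the keys of arr; awk does not guarantee their order.
print a, arr[a] prints each key followed by its count.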
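
Since for (a in arr) makes no promise about key order, the order of the output lines can differ between awk implementations. Here is a minimal sketch that prints names in order of first appearance instead. It assumes the same file layout with one header line; the function name count_in_input_order and the helper names order and n are made up for this example:

```shell
# Hypothetical helper: count names, keeping the order in which they first appear.
count_in_input_order() {
    awk -F, '
        NR > 1 {
            if (!($1 in count)) order[++n] = $1   # first time we see this name
            count[$1]++
        }
        END {
            for (i = 1; i <= n; i++) print order[i], count[order[i]]
        }
    ' "$1"
}
```

Called as count_in_input_order file.txt on the sample data, this should print joe 1, jim 1, bob 2, mike 3, one pair per line.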

                                                                        
                                                        
            
            
              
                
一生所求, 2020-12-04 14:59

tail -n +2 file.txt | cut -d',' -f 1 | sort | uniq -c
2 bob
1 jim
1 joe
3 mike

                                                                        
                                                        
            
            
              
                
死守一世寂寞, 2020-12-04 15:01

It looks like you want sorted output. You can pipe the print statement's output straight into sort -nk 2:

awk -F, 'NR>1 { a[$1]++ } END { for (i in a) print i, a[i] | "sort -nk 2" }' file


Results:

jim 1
joe 1
bob 2
mike 3




However, if you have GNU awk installed, you can do the sorting without coreutils. Here's a single-process solution that sorts the array by its values (asorti is a gawk extension). It should still be quite quick. Run it like:

awk -f script.awk file


Contents of script.awk:

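BEGIN {
    FS = ","
}

# Skip the header line and count each name.
NR > 1 {
    a[$1]++
}

END {
    # Build composite keys "count SUBSEP name". The count is zero-padded
    # because asorti() compares indices as strings, so an unpadded 10
    # would sort before 2.
    for (i in a) {
        b[sprintf("%010d", a[i]), i] = i
    }

    n = asorti(b)    # gawk only: b[1..n] now holds the sorted keys

    for (i = 1; i <= n; i++) {
        split(b[i], c, SUBSEP)    # c[1] = padded count, c[2] = name
        print c[2], a[c[2]]
    }
}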


Results:

jim 1
joe 1
bob 2
mike 3


Alternatively, here's the one-liner:

awk -F, 'NR>1 { a[$1]++ } END { for (i in a) b[sprintf("%010d", a[i]), i] = i; n = asorti(b); for (i=1; i<=n; i++) { split(b[i], c, SUBSEP); print c[2], a[c[2]] } }' file
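
On gawk 4.0 or later, the same ordering can probably be had without the intermediate arrays by setting PROCINFO["sorted_in"], which controls the traversal order of for (i in a). The sketch below assumes a gawk binary on the PATH; the function name count_sorted_gawk is hypothetical:

```shell
# Hypothetical helper; requires GNU awk 4.0+ (PROCINFO["sorted_in"] is a gawk extension).
count_sorted_gawk() {
    gawk -F, '
        NR > 1 { a[$1]++ }
        END {
            # Visit elements by ascending numeric value (the count).
            PROCINFO["sorted_in"] = "@val_num_asc"
            for (i in a) print i, a[i]
        }
    ' "$1"
}
```

Other awk implementations treat PROCINFO as an ordinary array and leave the loop order unspecified, so piping into sort remains the portable choice.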

                                                                        
                                                        
            
            
              
                
陌清茗, 2020-12-04 15:09

A strictly awk solution...

BEGIN { FS = "," }
{ ++x[$1] }
END { for(i in x) print i, x[i] }


If the name, age header line is really in the file, you can adjust the awk program to ignore it by counting only lines that contain a digit:

BEGIN   { FS = "," }
/[0-9]/ { ++x[$1] }
END     { for(i in x) print i, x[i] }
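
The /[0-9]/ pattern keeps any line that contains a digit anywhere, so a header such as name2, age would slip through. A slightly stricter sketch follows, assuming the second field should be a whole number once surrounding blanks are trimmed. The function name count_numeric_rows is hypothetical, and the final sort is only there to make the output order predictable:

```shell
# Hypothetical helper: count only rows whose age field is a whole number.
count_numeric_rows() {
    awk -F, '
        {
            age = $2
            gsub(/^[ \t]+|[ \t]+$/, "", age)   # trim blanks around the age
        }
        age ~ /^[0-9]+$/ { ++x[$1] }
        END { for (i in x) print i, x[i] }
    ' "$1" | sort
}
```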

                                                                        
                                                        
            
            
              
                
盖世英雄少女心, 2020-12-04 15:15

I came up with two functions based on the answers here. Each one sums %CPU and %MEM per command name from a single batch run of top, then lists the ten largest groups, biggest first:

topcpu() {
    # tail -n +8 skips the summary header of top.
    # $12 = COMMAND, $9 = %CPU, $10 = %MEM (assumes the default column layout of top).
    top -b -n1 \
        | tail -n +8 \
        | awk '{ CPU[$12] += $9; MEM[$12] += $10 } END { for (k in CPU) print k, CPU[k], MEM[k] }' \
        | sort -k2,2n \
        | tail -n 10 \
        | column -t \
        | tac
}

topmem() {
    # Same grouping as topcpu, but ordered by the summed %MEM (third column).
    top -b -n1 \
        | tail -n +8 \
        | awk '{ CPU[$12] += $9; MEM[$12] += $10 } END { for (k in CPU) print k, CPU[k], MEM[k] }' \
        | sort -k3,3n \
        | tail -n 10 \
        | column -t \
        | tac
}

$ topcpu
top              12.5  0
Xorg             6.2   1.6
ibus-x11         6.2   0.7
gnome-shell      6.2   7
chrome           6.2   74.6
adb              6.2   0.1
zsh              0     2.2
xdg-permission-  0     0.2
xdg-document-po  0     0.1
xdg-desktop-por  0     0.4

$ topmem
chrome           0    75.6
gnome-shell      6.2  7
mysqld           0    4.2
zsh              0    2.2
deluge-gtk       0    2.1
Xorg             0    1.6
scrcpy           0    1.6
gnome-session-b  0    0.8
systemd-journal  0    0.7
ibus-x11         6.2  0.7

enjoy!
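
The += accumulation that these functions use for CPU and MEM is the summing counterpart of the counting answers. Applied to the question's file, a minimal sketch that totals the age column per name could look like this (sum_age_by_name is a hypothetical name, and the trailing sort is only there for stable output):

```shell
# Hypothetical helper: total the second column per name instead of counting rows.
sum_age_by_name() {
    awk -F, '
        NR > 1 { total[$1] += $2 }             # accumulate instead of counting
        END { for (k in total) print k, total[k] }
    ' "$1" | sort
}
```

For the sample data this should give bob 36, jim 20, joe 42 and mike 93.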
                                                                        
                                                        
            
            
              
                
孤城傲影, 2020-12-04 15:20

Strip the header row, drop the age field, sort so that identical names are adjacent, count each run, then swap the columns into the desired format.

tail -n +2 txt.txt | cut -d',' -f 1 | sort | uniq -c | awk '{ print $2, $1 }'


output

bob 2
jim 1
joe 1
mike 3
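
If the most frequent names should come first, one more sort stage can be appended. This is a sketch under the same assumptions (one header line, comma-separated fields); count_names_desc is a hypothetical name:

```shell
# Hypothetical helper: name/count pairs, highest count first.
count_names_desc() {
    tail -n +2 "$1" \
        | cut -d, -f1 \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }' \
        | sort -k2,2nr -k1,1    # count descending, then name ascending
}
```

On the sample data this should list mike 3 first, then bob 2, jim 1 and joe 1.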

                                                                        
                                                        
            
            
              
                