Sum duplicate row values with awk

故里飘歌 2020-12-11 05:23

I have a file with the following structure:

1486113768 3656
1486113768 6280
1486113769 530912
1486113769 5629824
1486113770 5122176
1486113772 3565920
1486113772 530912
1486113773 9229920
1486113774 4020960
1486113774 4547928

For rows that share the same value in the first column, I want to sum the second-column values, producing one output line per key.
5 Answers
  • 2020-12-11 05:49
    $ awk '$1!=p{ if (NR>1) print p, s; p=$1; s=0} {s+=$2} END{print p, s}' file
    1486113768 9936
    1486113769 6160736
    1486113770 5122176
    1486113772 4096832
    1486113773 9229920
    1486113774 8568888
    

    The above uses almost no memory (just one string variable and one numeric variable) and prints the output in the same order the keys first appear in your input.
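    For readability, here is the same logic spelled out over several lines with comments (a sketch; like the one-liner, it assumes the input is already grouped by the first field):

    $ awk '
        $1 != p {              # first field changed: a new group starts
            if (NR > 1)        # flush the previous group (not before line 1)
                print p, s
            p = $1             # remember the new key
            s = 0              # reset the running sum
        }
        { s += $2 }            # accumulate the second field
        END { print p, s }     # flush the final group
    ' file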

    If you're going to be using awk, I highly recommend reading Effective Awk Programming, 4th Edition, by Arnold Robbins. It will teach you to write your own scripts and, while you're learning, to read other people's scripts well enough to tell a right approach from a wrong one when two scripts both happen to produce the expected output for some specific sample input.

  • 2020-12-11 05:50

    Use awk as below:

    awk '{ seen[$1] += $2 } END { for (i in seen) print i, seen[i] }' file1
    1486113768 9936
    1486113769 6160736
    1486113770 5122176
    1486113772 4096832
    1486113773 9229920
    1486113774 8568888
    

    { seen[$1] += $2 } builds an associative array keyed on the first field, adding each line's second field to the running total for its key; the END block then prints one line per distinct key. Note that for (i in seen) visits the keys in an unspecified order (see the sketch below if you need them sorted).
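    If you need the keys back in numeric order and you happen to be using GNU awk, you can request an ordered traversal through PROCINFO (a sketch; on other awks, pipe the output through sort -n instead):

    $ awk '{ seen[$1] += $2 }
           END { PROCINFO["sorted_in"] = "@ind_num_asc"   # GNU awk only
                 for (i in seen) print i, seen[i] }' file1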

  • 2020-12-11 05:55

    Say you have the top-ten lines from many log files concatenated into one file (and sorted with sort), with results like this:

       2142 /pathtofile1/00.jpg
       2173 /pathtofile1/00.jpg
       2100 /pathtofile1/00.jpg
       2127 /pathtofile1/00.jpg
    

    You can also swap the fields, summing the first column grouped by the second:

    $ awk '{ seen[$2] += $1 } END { for (i in seen) print i, seen[i] }' top10s.txt | sort -k 2 -rn
    

    and you'll get this total:

    /pathtofile1/00.jpg 8542
    
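    Putting it together, a sketch of the whole pipeline, assuming the per-file top-ten lists match a hypothetical top10.*.txt glob:

    $ awk '{ seen[$2] += $1 } END { for (i in seen) print i, seen[i] }' top10.*.txt | sort -k 2 -rn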
  • 2020-12-11 05:56

    If datamash is okay

    $ datamash -t' ' -g 1 sum 2 < ip.txt 
    1486113768 9936
    1486113769 6160736
    1486113770 5122176
    1486113772 4096832
    1486113773 9229920
    1486113774 8568888
    
    • -t' ' sets space as the field delimiter
    • -g 1 groups by the 1st field
    • sum 2 sums the 2nd-field values
    • if the input file is not sorted, use datamash -st' ' -g 1 sum 2, where the -s option takes care of sorting (see the sketch below for an explicit-sort equivalent)
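    If you would rather sort explicitly than rely on -s, an equivalent pipeline (a sketch):

    $ sort -k1,1 ip.txt | datamash -t' ' -g 1 sum 2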
  • 2020-12-11 06:14

    Using Perl

    $ cat elmazzun.log
    1486113768 3656
    1486113768 6280
    1486113769 530912
    1486113769 5629824
    1486113770 5122176
    1486113772 3565920
    1486113772 530912
    1486113773 9229920
    1486113774 4020960
    1486113774 4547928
    $ perl -lane ' $kv{$F[0]}+=$F[1];END { print "$_ $kv{$_}" for (sort keys %kv)}' elmazzun.log
    1486113768 9936
    1486113769 6160736
    1486113770 5122176
    1486113772 4096832
    1486113773 9229920
    1486113774 8568888
    $
    
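    One caveat: sort on the keys above is a plain string sort, which happens to order these equal-width timestamps correctly; for numeric keys of varying width, a numeric comparator is safer (a sketch):

    $ perl -lane '$kv{$F[0]} += $F[1];
                  END { print "$_ $kv{$_}" for sort { $a <=> $b } keys %kv }' elmazzun.log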