Need to remove the count from the output when using “uniq -c” command

有刺的猬 2020-12-06 13:16

I am trying to read a file and sort it by the number of occurrences of a particular field. Suppose I want to find the most repeated date in a log file; then I use uniq -c o

5 answers
  •  孤城傲影
    2020-12-06 14:07

    With GNU uniq, the count is right-aligned in a 7-character field followed by a space (unless the count has more than 7 digits), so you need to do something like:

    uniq -c | sort -nr | cut -c 9-
    

    to get columns (character positions) 9 upwards. Or you can use sed:

    uniq -c | sort -nr | sed 's/^.\{8\}//'
    

    or:

    uniq -c | sort -nr | sed 's/^ *[0-9]* //'
    

    This last option (the sed command with the [0-9] regex) stays correct even with a repeat count of 10,000,000 or more, where the count no longer fits the fixed-width field; if you think that might happen, it is a better choice than the cut alternative. There are undoubtedly other options available too.
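
To make the difference concrete, here is a minimal sketch that simulates GNU-style `uniq -c` lines with `printf '%7d %s'` instead of generating ten million input rows. The helper names `strip_cut` and `strip_sed` and the sample date are hypothetical, chosen only for illustration.

```shell
# Simulate GNU-style `uniq -c` output lines (count in a 7-wide field, then a space).
small=$(printf '%7d %s' 3 '2012-03-01')         # ordinary count
large=$(printf '%7d %s' 12345678 '2012-03-01')  # 8-digit count overflows the field

# Hypothetical helpers wrapping the two approaches from the answer.
strip_cut() { cut -c 9-; }
strip_sed() { sed 's/^ *[0-9]* //'; }

printf '%s\n' "$small" | strip_cut   # fixed-width cut is fine for small counts
printf '%s\n' "$large" | strip_cut   # the separator space is left in front of the text
printf '%s\n' "$large" | strip_sed   # the regex removes the count whatever its width
```

With an 8-digit count the `cut` version leaves a stray leading space; with 9 or more digits it would start leaking digits of the count into the output.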


    Caveat: the counts were determined by experimentation on Mac OS X 10.7.3, but using GNU uniq from coreutils 8.3. The BSD uniq -c produced 3 leading spaces before a single-digit count. The POSIX spec says the output from uniq -c shall be formatted as if with:

    printf("%d %s", repeat_count, line);
    

    which would not have any leading blanks. Given this possible variance in output formats, the sed script with the [0-9] regex is the most reliable way of dealing with the variability in observed and theoretical output from uniq -c:

    uniq -c | sort -nr | sed 's/^ *[0-9]* //'
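
As a rough check of that claim, the sketch below feeds the three prefix styles described above through the same sed expression, then runs the whole pipeline on a small sample. The prefixes are simulated with `printf`, and the function name `strip_count` and the sample dates are assumptions made for illustration, not part of the original question.

```shell
# Hypothetical helper: strip the count prefix regardless of its width.
strip_count() { sed 's/^ *[0-9]* //'; }

# Simulated prefixes, based on the formats described in the answer.
gnu=$(printf '%7d %s' 3 'Mar 01')    # GNU coreutils: 7-wide count field
bsd=$(printf '%4d %s' 3 'Mar 01')    # BSD: 3 leading spaces before a single digit
posix=$(printf '%d %s' 3 'Mar 01')   # POSIX format string: no leading blanks

for line in "$gnu" "$bsd" "$posix"; do
    printf '%s\n' "$line" | strip_count
done

# End-to-end usage on made-up data: one date per line, most frequent first.
# uniq only collapses adjacent duplicates, so the input is sorted first.
dates='2012-03-01
2012-03-02
2012-03-01
2012-03-03
2012-03-01
2012-03-02'
ranked=$(printf '%s\n' "$dates" | sort | uniq -c | sort -nr | strip_count)
printf '%s\n' "$ranked"
```

Only the leading count and the single space after it are removed, so spaces inside the line itself (as in `Mar 01`) survive.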
    
