Linux command or script counting duplicated lines in a text file?

Submitted by 自作多情 on 2019-11-28 15:32:47

Question


If I have a text file with the following content

red apple
green apple
green apple
orange
orange
orange

Is there a Linux command or script that I can use to get the following result?

1 red apple
2 green apple
3 orange

Answer 1:


Send it through sort (to put adjacent items together) then uniq -c to give counts, i.e.:

sort filename | uniq -c

To get that list sorted by frequency (most frequent first), add a reverse numeric sort:

sort filename | uniq -c | sort -nr
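
Note that uniq -c pads each count with leading spaces, so the output does not exactly match the format shown in the question. A minimal sketch that strips the padding, assuming sed is available; count_lines is a hypothetical helper name and the temporary file only holds the sample data from the question:

```shell
# count_lines is a hypothetical helper name, not part of the original answer.
# It prints each distinct line prefixed by its count, most frequent first.
count_lines() {
    sort "$1" | uniq -c | sort -nr |
        # Strip the leading padding that uniq -c puts in front of the count.
        sed 's/^ *//'
}

# Sample data from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' 'red apple' 'green apple' 'green apple' 'orange' 'orange' 'orange' > "$tmp"
count_lines "$tmp"
rm -f "$tmp"
```

Drop the sort -nr stage if alphabetical order is preferred over frequency order.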



Answer 2:


Almost the same as borribles' answer, but if you add the -d option to uniq, it shows only the lines that occur more than once.

sort filename | uniq -cd | sort -nr
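
Not every uniq implementation is guaranteed to accept -c and -d together (GNU uniq does). A minimal sketch that avoids the combination by filtering the counts with awk instead; count_dups is a hypothetical helper name and the sample data is taken from the question:

```shell
# count_dups is a hypothetical helper name used only for this illustration.
# It lists the lines that occur more than once, with counts, most frequent first.
count_dups() {
    sort "$1" | uniq -c |
        # Keep only rows whose count (the first field) is greater than 1.
        awk '$1 > 1' |
        sort -nr
}

# Sample data from the question, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' 'red apple' 'green apple' 'green apple' 'orange' 'orange' 'orange' > "$tmp"
count_dups "$tmp"
rm -f "$tmp"
```

With the sample data, "red apple" is dropped because it occurs only once.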



Answer 3:


uniq -c file

and in case the file is not sorted already:

sort file | uniq -c
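
The sort step matters because uniq only compares each line with the one directly before it. A small sketch with made-up sample data (not from the question) that shows the difference:

```shell
# Made-up sample data in which the duplicate lines are not adjacent.
tmp=$(mktemp)
printf '%s\n' 'orange' 'red apple' 'orange' > "$tmp"

# Without sorting, the two "orange" lines are counted separately (three rows).
unsorted=$(uniq -c "$tmp")

# After sorting, equal lines are adjacent and collapse into one row each (two rows).
sorted=$(sort "$tmp" | uniq -c)

printf 'unsorted:\n%s\nsorted:\n%s\n' "$unsorted" "$sorted"
rm -f "$tmp"
```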




Answer 4:


Try this (uniq alone only removes the duplicates; the -c option adds the counts):

sort myfile.txt | uniq -c



Answer 5:


cat <filename> | sort | uniq -c



Answer 6:


Can you live with an alphabetically ordered list?

echo "red apple
green apple
green apple
orange
orange
orange" | sort -u

green apple
orange
red apple

or

sort -u FILE

-u stands for unique, and uniqueness is only reached via sorting.

A solution which preserves the order:

echo "red apple
green apple
green apple
orange
orange
orange" | { old=""; while IFS= read -r line; do if [[ $line != "$old" ]]; then printf '%s\n' "$line"; old=$line; fi; done; }
red apple
green apple
orange

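And, with a file:

{
old=""
# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line
do
  # quoting "$old" forces a literal comparison instead of a pattern match
  if [[ $line != "$old" ]]
  then
    printf '%s\n' "$line"
    old=$line
  fi
done
} < file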

The last two only remove duplicates that immediately follow each other, which fits your example.

echo "red apple
green apple
lila banana
green apple
" ...

Will print two apples, split by a banana.
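
If duplicates that are not adjacent should also be dropped while the original order is kept, one common approach is an awk filter that remembers the lines it has already seen. This is a sketch rather than part of the answer above, and dedup_keep_order is a hypothetical helper name:

```shell
# dedup_keep_order is a hypothetical helper name, not from the original answer.
# It prints the first occurrence of every line and skips later repeats,
# whether or not they are adjacent.
dedup_keep_order() {
    # seen[$0]++ evaluates to 0 the first time a line appears, so the negated
    # expression is true only for first occurrences; awk then prints the line.
    awk '!seen[$0]++' "$@"
}

printf '%s\n' 'red apple' 'green apple' 'lila banana' 'green apple' | dedup_keep_order
```

The function reads standard input when called without arguments, or the named files otherwise. Unlike the sort-based commands, it keeps every distinct line in memory.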




Answer 7:


To just get a count of each word:

$> grep -oE '\w+' fruits.txt | sort | uniq -c

      3 apple
      2 green
      1 oragen
      2 orange
      1 red

To get a sorted count:

$> grep -oE '\w+' fruits.txt | sort | uniq -c | sort -nk1
      1 oragen
      1 red
      2 green
      2 orange
      3 apple

EDIT

Aha, the question was about whole lines, NOT words, my bad. Here's the command to use for full lines:

$> sort fruits.txt | uniq -c | sort -nk1
      1 oragen
      1 red apple
      2 green apple
      2 orange


Source: https://stackoverflow.com/questions/6447473/linux-command-or-script-counting-duplicated-lines-in-a-text-file
