I have a file like this:
This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.
I would like to generate a frequency list of every word in the file, with output in the form word@count (for example, words@3).
This might work for you:
tr '[:upper:]' '[:lower:]' <file |  # fold everything to lower case
tr -d '[:punct:]' |                 # strip punctuation
tr -s ' ' '\n' |                    # break into one word per line
sort |                              # group identical words together
uniq -c |                           # count each group
sed 's/ *\([0-9]*\) \(.*\)/\2@\1/'  # turn "    3 words" into "words@3"
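On the three-line sample file above, this should produce something like the following (alphabetical, since sort runs before the columns are rearranged; exact ordering can vary with locale):

a@1
appear@2
file@1
...
words@3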
You can use tr for this; just run

tr ' ' '\12' < NAME_OF_FILE | sort | uniq -c | sort -nr > result.txt

('\12' is the octal escape for a newline, so this puts each word on its own line before counting.)
Sample Output for a text file of city names:
3026 Toronto
2006 Montréal
1117 Edmonton
1048 Calgary
905 Ottawa
724 Winnipeg
673 Vancouver
495 Brampton
489 Mississauga
482 London
467 Hamilton
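If you want the word@count format from the question, a small awk stage at the end can swap the two columns that uniq -c produces (a sketch, using the same NAME_OF_FILE placeholder):

tr ' ' '\12' < NAME_OF_FILE | sort | uniq -c | sort -nr | awk '{print $2 "@" $1}'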
This function lists the frequency of each word occurring in the provided file, in descending order:
function wordfrequency() {
  awk '
    BEGIN { FS="[^a-zA-Z]+" }
    {
      for (i=1; i<=NF; i++) {
        word = tolower($i)
        if (word != "")   # skip empty fields left by leading/trailing punctuation
          words[word]++
      }
    }
    END {
      for (w in words)
        printf("%3d %s\n", words[w], w)
    }
  ' | sort -rn
}
You can call it on your file like this:
$ cat your_file.txt | wordfrequency
Source: AWK-ward Ruby
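Since the function reads standard input, the cat is not strictly needed; a plain redirect works as well:

$ wordfrequency < your_file.txt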
Content of the input file
$ cat inputFile.txt
This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.
Using sed | sort | uniq
$ sed 's/\.//g;s/\(.*\)/\L\1/;s/\ /\n/g' inputFile.txt | sort | uniq -c
1 a
2 appear
1 file
1 is
1 many
1 more
2 of
1 once
1 one
1 only
2 some
1 than
2 the
1 this
1 time
1 with
3 words
uniq -ic will count and ignore case, but the result list will have This instead of this.
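Note that \L and the \n in the replacement text are GNU sed extensions. If you are stuck with another sed, a sketch of a portable equivalent does the case folding and word splitting with tr instead:

$ tr '[:upper:]' '[:lower:]' < inputFile.txt | tr -d '.' | tr ' ' '\n' | sort | uniq -c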
The sort requires GNU AWK (gawk). If you have another AWK without asort(), this can easily be adjusted and then piped to sort; see the sketch after the code below.
awk '{gsub(/\./, ""); for (i = 1; i <= NF; i++) {w = tolower($i); count[w]++; words[w] = w}} END {qty = asort(words); for (w = 1; w <= qty; w++) print words[w] "@" count[words[w]]}' inputfile
Broken out onto multiple lines:
awk '{
    gsub(/\./, "");
    for (i = 1; i <= NF; i++) {
        w = tolower($i);
        count[w]++;
        words[w] = w
    }
}
END {
    qty = asort(words);
    for (w = 1; w <= qty; w++)
        print words[w] "@" count[words[w]]
}' inputfile
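A minimal sketch of that adjustment for a POSIX awk without asort(): keep only the count array, print it unsorted in END, and let an external sort order the result:

awk '{
    gsub(/\./, "")
    for (i = 1; i <= NF; i++)
        count[tolower($i)]++
}
END {
    for (w in count)
        print w "@" count[w]
}' inputfile | sort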