Find the most frequently occurring words in a text in R

刺人心 2020-12-14 13:33

Can someone help me find the most frequently used two- and three-word phrases in a text using R?

My text is...

text <- c("Th

5 answers
  •  渐次进展
    2020-12-14 14:20

    Here's a simple base R approach for the 5 most frequent words:

    head(sort(table(strsplit(gsub("[[:punct:]]", "", text), " ")), decreasing = TRUE), 5)
    
    #     a    the     of     in phrase 
    #    21     18     12     10      8 
    

    It returns a one-dimensional table of integer counts, sorted from most to least frequent. The names of the table are the words that were counted.
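To see what that returned object looks like, here is a minimal sketch that runs the same one-liner on a short made-up string (sample_text is a hypothetical example, not the asker's text). The result has class "table", so names() gives the words and as.vector() strips it down to a plain integer vector of counts:

```r
# Hypothetical sample text, not the asker's data
sample_text <- "the cat and the dog and the bird"

# Same one-liner as in the answer, applied to the sample text
top <- head(sort(table(strsplit(gsub("[[:punct:]]", "", sample_text), " ")),
                 decreasing = TRUE), 5)

class(top)      # "table"
names(top)      # the counted words, most frequent first
as.vector(top)  # the counts as a plain integer vector
top[["the"]]    # count for a single word: 3
```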

    • gsub("[[:punct:]]", "", text) removes punctuation, since you probably don't want "word," and "word" counted separately
    • strsplit(gsub("[[:punct:]]", "", text), " ") splits the cleaned string on spaces and returns a list of words
    • table() counts how often each unique word occurs
    • sort(..., decreasing = TRUE) orders the counts from most to least frequent
    • head(..., 5) keeps only the 5 most frequent words
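The steps above can also be written out one at a time. This is a minimal sketch, not the answerer's code, and it makes assumptions the one-liner does not: case is ignored (tolower()), words may be separated by any run of whitespace ("\\s+"), and unlist() flattens the list from strsplit() so text may have more than one element. Because the question also asks about two- and three-word sequences, the hypothetical top_ngrams() helper pastes consecutive words together before counting. This is one simple base R way to build n-grams, not the only one. The helper names tokenize() and top_ngrams() and the sample sentence are illustrative only.

```r
# Hypothetical helper: lowercase words with punctuation removed
tokenize <- function(text) {
  cleaned <- tolower(gsub("[[:punct:]]", "", text))  # drop punctuation, ignore case
  words <- unlist(strsplit(cleaned, "\\s+"))         # split on any whitespace, flatten list
  words[words != ""]                                 # drop empty strings (e.g. leading spaces)
}

# Hypothetical helper: the n most frequent runs of `size` consecutive words.
# Note: if `text` has several elements, n-grams can span element boundaries.
top_ngrams <- function(text, size = 1, n = 5) {
  stopifnot(size >= 1)
  words <- tokenize(text)
  if (length(words) < size) return(table(character(0)))
  starts <- seq_len(length(words) - size + 1)
  grams <- vapply(starts,
                  function(i) paste(words[i:(i + size - 1)], collapse = " "),
                  character(1))
  head(sort(table(grams), decreasing = TRUE), n)
}

# Usage on a made-up sentence
sample_text <- "The cat sat. The cat ran, and the dog sat."
top_ngrams(sample_text, size = 1)  # single words
top_ngrams(sample_text, size = 2)  # two-word phrases
top_ngrams(sample_text, size = 3)  # three-word phrases
```

Pasting words into strings keeps everything in base R and reuses the same table()/sort()/head() pipeline as the one-liner, at the cost of building one string per position in the text.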
