Can someone help me find the most frequently used two- and three-word phrases in a text using R?
My text is...
text <- c(\"Th
Here's a simple base R approach for the 5 most frequent words:
head(sort(table(strsplit(gsub("[[:punct:]]", "", text), " ")), decreasing = TRUE), 5)
#      a    the     of     in phrase
#     21     18     12     10      8
It returns a table of integer counts whose names are the words that were counted.
- gsub("[[:punct:]]", "", text) to remove punctuation, since you don't want to count that, I guess
- strsplit(gsub("[[:punct:]]", "", text), " ") to split the string on spaces
- table() to count the frequency of each unique element
- sort(..., decreasing = TRUE) to sort them in decreasing order
- head(..., 5) to select only the top 5 most frequent words
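The one-liner above counts single words, while the question asks about two- and three-word phrases. The same steps can probably be stretched to cover that; here is a rough base R sketch, not a definitive solution. The helper name top_ngrams and the sample_text string are made up for illustration (the original text isn't shown in full), and lower-casing and splitting on any run of whitespace are my own assumptions rather than part of the approach above.

```r
# Hypothetical sample text; the text from the question is not shown in full.
sample_text <- "the cat sat on the mat. the cat sat on the rug, and the cat slept"

# Hypothetical helper: the k most frequent n-word phrases, base R only.
top_ngrams <- function(text, n = 2, k = 5) {
  # Same cleaning idea as above: drop punctuation, then split into words.
  # Assumption: lower-case everything and split on runs of whitespace.
  cleaned <- tolower(gsub("[[:punct:]]", "", text))
  words <- unlist(strsplit(cleaned, "[[:space:]]+"))
  words <- words[nzchar(words)]  # drop empty strings left by leading spaces
  if (length(words) < n) return(table(character(0)))

  # Take n copies of the word vector, each shifted by one position,
  # and paste them together element-wise to form the phrases.
  starts <- seq_len(length(words) - n + 1)
  ngrams <- do.call(paste, lapply(seq_len(n), function(i) words[starts + i - 1]))

  # Count, sort, and keep the top k, exactly as for single words.
  head(sort(table(ngrams), decreasing = TRUE), k)
}

bigrams <- top_ngrams(sample_text, n = 2)
trigrams <- top_ngrams(sample_text, n = 3)
print(bigrams)
print(trigrams)
```

Note that because punctuation is removed before splitting, phrases can run across sentence boundaries, and if text is a vector of several strings, unlist() joins them so phrases can span strings too. If that matters, you would need to split into sentences first and count within each one.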