How to find term frequency within a DTM in R?

拥有回忆 submitted on 2019-12-06 02:55:18

Thanks to NicE for the advice; it works well. Adding the weighting argument (weightTf) makes the DTM hold raw term frequencies, which I can see when I inspect it. After that it is a simple matter of summing each column.

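library(tm)
# c is the corpus and BigramTokenizer the tokenizer, both assumed to be defined earlier
dtm <- DocumentTermMatrix(c, control = list(tokenize = BigramTokenizer, weighting = weightTf))
# as.matrix() gives the full matrix; in recent tm versions inspect() prints only a sample
freqs <- as.matrix(dtm)
colSums(freqs)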
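
For anyone who wants to try the column-sum approach without the original corpus, here is a minimal self-contained sketch. The two toy documents and the names docs, corpus and term_freq are made up for illustration, and it uses tm's default word tokenizer rather than the BigramTokenizer above.

```r
library(tm)

# Toy documents, invented for this example
docs <- c("the cat sat on the mat", "the dog sat on the log")
corpus <- VCorpus(VectorSource(docs))

# weightTf keeps raw counts in the matrix
dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTf))

# One column per term, so column sums are corpus-wide term frequencies
term_freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
head(term_freq)
```

Note that tm drops words shorter than three characters by default, so "on" does not appear. For a large DTM, slam::col_sums(dtm) should give the same totals without building the dense matrix first.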

You can use Tyler Rinker's excellent qdap package. Its freq_terms function returns the terms and their frequencies. This example takes the 20 most frequent terms that have at least 4 letters and uses one of qdap's stopword lists, Top200Words, which is more extensive than the English stopword list built into tm (200 words vs. about 175).

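library(qdap)
# freq_terms() is documented as taking a text variable first; if passing the DTM fails, pass the original text
qdap.freq <- freq_terms(dtm, top = 20, at.least = 4, stopwords = Top200Words)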
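
If you would rather stay within tm, a similar filter can be sketched on top of the column sums. This is not how qdap implements freq_terms, only a rough equivalent; the toy documents and the names term_freq, keep and top_terms are made up for illustration, and it uses tm's own English stopword list instead of Top200Words.

```r
library(tm)

# Toy documents, invented for this example
docs <- c("these results differ from those results",
          "results from another study")
dtm <- DocumentTermMatrix(VCorpus(VectorSource(docs)),
                          control = list(weighting = weightTf))

# Corpus-wide frequency of each term
term_freq <- colSums(as.matrix(dtm))

# Keep terms with at least 4 letters that are not English stopwords
keep <- nchar(names(term_freq)) >= 4 &
  !(names(term_freq) %in% stopwords("english"))

# Up to 20 most frequent of the remaining terms
top_terms <- head(sort(term_freq[keep], decreasing = TRUE), 20)
top_terms
```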