Finding 2 & 3 word Phrases Using R TM Package

死守一世寂寞  2020-11-28 04:26

I am trying to find code that actually works for finding the most frequently used two- and three-word phrases with the R text mining (tm) package (maybe there is another package for this that I don't know of).

7 Answers
  •  陌清茗 (OP)  2020-11-28 04:57

    This is entry 5 of the tm package's FAQ:

    5. Can I use bigrams instead of single tokens in a term-document matrix?

    Yes. RWeka provides a tokenizer for arbitrary n-grams which can be directly passed on to the term-document matrix constructor. E.g.:

      library("RWeka")
      library("tm")
    
      data("crude")
    
      BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
      tdm <- TermDocumentMatrix(crude, control = list(tokenize = BigramTokenizer))
    
      inspect(tdm[340:345,1:10])
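
    Since the question asks for both two- and three-word phrases, the same approach can be widened to min = 2, max = 3 so the tokenizer emits bigrams and trigrams together. The sketch below is a minimal illustration of that idea; PhraseTokenizer is just a made-up name, and the lowfreq cutoff and the head() count are arbitrary values chosen for the example:

      library("RWeka")
      library("tm")

      data("crude")

      # tokenizer that emits both two- and three-word sequences
      PhraseTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 3))
      tdm <- TermDocumentMatrix(crude, control = list(tokenize = PhraseTokenizer))

      # phrases appearing at least 5 times anywhere in the corpus
      findFreqTerms(tdm, lowfreq = 5)

      # or rank every phrase by total frequency and show the top 20
      freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
      head(freq, 20)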
    
