Bigrams instead of single words in a term-document matrix using R and RWeka

南方客 2020-11-30 04:35

I've found a way to use bigrams instead of single tokens in a term-document matrix. The solution has been posted on Stack Overflow here: findAssocs for multiple terms in

2 Answers
  •  星月不相逢
    2020-11-30 04:48

    There seem to be problems when RWeka is used together with the parallel package. I found a workaround here.

    The key point is to avoid loading the RWeka package and instead call its functions through the namespace (`RWeka::`) inside a wrapper function.

    So your tokenizer should look like this:

    # Call RWeka through its namespace (no library(RWeka)); min = max = 2 yields bigrams only.
    BigramTokenizer <- function(x) {RWeka::NGramTokenizer(x, RWeka::Weka_control(min = 2, max = 2))}
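
    As a rough usage sketch, assuming the tm package is installed alongside RWeka, a tokenizer like this could be passed to `TermDocumentMatrix` through its `control` list. The question does not show its corpus, so the two sample sentences and the names `docs`, `corpus`, `tdm`, and `bigrams` are made up for illustration.

    ```r
    # Sketch only: sample texts and variable names are hypothetical.
    library(tm)  # RWeka is deliberately NOT attached; it is reached only via RWeka::

    # Tokenizer from the answer above: emits 2-word n-grams only.
    BigramTokenizer <- function(x) {
      RWeka::NGramTokenizer(x, RWeka::Weka_control(min = 2, max = 2))
    }

    # Toy corpus. VCorpus is used here on the assumption that the
    # installed tm version honours custom tokenizers for this corpus type.
    docs <- c("the quick brown fox", "the quick red fox")
    corpus <- VCorpus(VectorSource(docs))

    # Build the term-document matrix with bigrams as terms.
    tdm <- TermDocumentMatrix(corpus, control = list(tokenize = BigramTokenizer))

    inspect(tdm)           # print the matrix for a quick visual check
    bigrams <- Terms(tdm)  # character vector of the bigram terms
    ```

    Because the function refers to `RWeka::NGramTokenizer` explicitly, nothing in this sketch needs `library(RWeka)`, which is the point the answer makes.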
    
