Bigrams instead of single words in term-document matrix using R and RWeka

南方客 2020-11-30 04:35

I've found a way to use bigrams instead of single tokens in a term-document matrix. The solution has been posted on Stack Overflow here: findAssocs for multiple terms in

2 Answers
  •  独厮守ぢ
    2020-11-30 04:42

    Inspired by Anthony's comment, I found out that you can specify the number of cores that the parallel package uses by default (set it before you call NGramTokenizer):

    # Sets the default number of threads to use
    options(mc.cores=1)
    

    Since NGramTokenizer seems to hang on the parallel::mclapply call, limiting it to a single core seems to work around the problem.
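
To show where the option fits, here is a minimal sketch of a bigram term-document matrix built with the tm and RWeka packages. The two-sentence corpus and the names `docs`, `BigramTokenizer` and `bigram_terms` are made up for illustration, and the sketch assumes that tm, RWeka and a working Java installation are available.

```r
library(tm)
library(RWeka)

# Restrict the parallel package to one core before the tokenizer is called,
# as suggested in the answer above
options(mc.cores = 1)

# Hypothetical two-document corpus, for illustration only
docs <- c("the quick brown fox", "the quick red fox")
corpus <- VCorpus(VectorSource(docs))

# Tokenizer that emits only two-word n-grams (min = max = 2)
BigramTokenizer <- function(x) {
  NGramTokenizer(x, Weka_control(min = 2, max = 2))
}

# Build the term-document matrix with bigrams as terms
tdm <- TermDocumentMatrix(corpus, control = list(tokenize = BigramTokenizer))

# The row names of the matrix are the bigrams found in the corpus
bigram_terms <- Terms(tdm)
print(bigram_terms)
```

Setting `mc.cores` at the top of the script keeps it in effect for every later call that goes through `parallel::mclapply`, so it only needs to be set once per session.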
