How to select only a subset of corpus terms for TermDocumentMatrix creation in tm

无人及你 2021-01-22 09:21

I have a huge corpus, and I'm interested only in the appearance of a handful of terms that I know up front. Is there a way to create a term document matrix from the corpus using th

2 Answers
  •  甜味超标
    2021-01-22 09:58

    Another way to filter a corpus: first, assign a value to a metadata attribute, say language, by looping over the elements of the corpus with an index variable i and checking whatever condition you want; then filter on that metadata attribute.

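    # Inside a loop over i: tag document i (here with language 'tur')
    corpusz[[i]]$meta["language"] <- 'tur'
    
    # Logical index of the documents whose language attribute is 'tur'
    idx <- meta(corpusz, "language") == 'tur'
    filtered <- corpusz[idx]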
    

    Now filtered contains only the corpus elements we want.
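
For completeness, here is a minimal end-to-end sketch of the tagging-and-filtering steps above. The toy texts and the `langs` vector are made up for illustration and are not part of the original answer; the sketch assumes a `VCorpus` and uses the `meta()` accessor rather than `$meta`, which should be equivalent here.

```r
library(tm)

# Hypothetical toy data: three short documents and one label per document.
texts <- c("bir iki uc", "one two three", "dort bes alti")
langs <- c("tur", "eng", "tur")
corpusz <- VCorpus(VectorSource(texts))

# Tag every document with a document-level "language" attribute.
for (i in seq_along(corpusz)) {
  meta(corpusz[[i]], "language") <- langs[i]
}

# Build a logical index from that attribute and subset the corpus with it.
idx <- vapply(
  seq_along(corpusz),
  function(i) meta(corpusz[[i]], "language") == "tur",
  logical(1)
)
filtered <- corpusz[idx]
```

The resulting `filtered` corpus can then be passed to `TermDocumentMatrix()` like any other corpus.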
