I have a huge corpus, and I'm interested only in the appearance of a handful of terms that I know up front. Is there a way to create a term document matrix from the corpus using th
Another way to filter a corpus: first loop over the elements of the corpus with an index i, check whatever condition you want, and assign a value to a meta attribute (say, language) of each document that passes. Then filter the corpus on that meta attribute.
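# inside the loop over i: tag each document that passes your check
corpusz[[i]]$meta["language"] <- 'tur'
# after the loop: TRUE for every document tagged 'tur'
idx <- meta(corpusz, "language") == 'tur'
# keep only the tagged documents
filtered <- corpusz[idx]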
Now filtered contains only the corpus elements we want.
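For completeness, here is a minimal end-to-end sketch of the same idea, with the loop written out. It assumes a VCorpus from the tm package; the toy texts and the is_turkish vector are made up for illustration and stand in for whatever check you actually run on each document.

```r
library(tm)

# Toy corpus (hypothetical texts, for illustration only)
texts <- c("merhaba dunya", "hello world", "iyi gunler")
corpusz <- VCorpus(VectorSource(texts))

# Hypothetical per-document check; replace with your own condition
is_turkish <- c(TRUE, FALSE, TRUE)

# Tag every document through its local meta data
for (i in seq_along(corpusz)) {
  meta(corpusz[[i]], "language") <- if (is_turkish[i]) "tur" else "eng"
}

# Collect the per-document tags and build a logical index
idx <- unlist(meta(corpusz, "language", type = "local")) == "tur"

# Subset the corpus with the logical index
filtered <- corpusz[idx]
```

Passing type = "local" makes it explicit that the tag is read from each document's own meta data rather than from the corpus-level meta data.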