How to search for specific n-grams in a corpus using R
Question: I'm looking for specific n-grams in a corpus. For example, I want to find 'asset management' and 'historical yield' in a collection of documents.

This is how I loaded the corpus:

my_corpus <- VCorpus(DirSource(directory, pattern = ".pdf"), readerControl = list(reader = readPDF))

I cleaned the corpus and did some basic calculations using document-term matrices. Now I want to look for particular expressions and put them in a data frame. This is what I use (thanks to phiver):

ngrams <- c('asset