Finding ngrams in R and comparing ngrams across corpora
Question

I'm getting started with the tm package in R, so please bear with me, and apologies for the big ol' wall of text. I have created a fairly large corpus of Socialist/Communist propaganda and would like to extract newly coined political terms (multi-word expressions, e.g. "struggle-criticism-transformation movement").

This is a two-step question: one part about my code so far, and one about how I should proceed.

Step 1: To do this, I wanted to identify some common ngrams first. But I get stuck very early