I have seen this question answered in other languages but not in R.
[Specifically for R text mining] I have a set of frequent phrases obtained from a corpus, and I would like to count how many times each of them appears in another corpus.
This is how I'd approach the problem now:
library(tm)
library(qdap)
## Create an MWE (minimal working example) like you should have done:
corpus1 <- 'I have seen this question answered in other languages but not in R.
[Specifically for R text mining] I have a set of frequent phrases that is obtained from a Corpus.
Now I would like to search for the number of times these phrases have appeared in another corpus.
Is there a way to do this in TM package? (Or another related package)
For example, say I have an array of phrases, "tags" obtained from CorpusA. And another Corpus, CorpusB, of
couple thousand sub texts. I want to find out how many times each phrase in tags have appeared in CorpusB.
As always, I appreciate all your help!'
corpus2 <- "What have you tried? If you have seen it answered in another language, why don't you try translating that
language into R? – Eric Strom 2 hours ago
I am not a coder, otherwise would do. I just do not know a way to do this. – appletree 1 hour ago
Could you provide some example? or show what you have in mind for input and output? or a pseudo code?
As it is I find the question a bit too general. As it sounds I think you could use regular expressions
with grep to find your 'tags'. – AndresT 15 mins ago"
## Now the code:
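## Create the corpus and extract the most frequent terms (top 7; freq_terms extends
## the cutoff to terms tied with the 7th, which is why 8 terms are returned)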
corp1 <- Corpus(VectorSource(corpus1))
(terms <- apply_as_df(corp1, freq_terms, top=7, stopwords=tm::stopwords("en")))
##       WORD FREQ
## 1   corpus    3
## 2  phrases    3
## 3  another    2
## 4 appeared    2
## 5  corpusb    2
## 6 obtained    2
## 7     tags    2
## 8    times    2
## Use termco to count how often each of these terms (all 8) appears in the second corpus
corp2 <- Corpus(VectorSource(corpus2))
apply_as_df(corp2, termco, match.list=terms[, 1])
##   docs word.count corpus phrases  another appeared corpusb obtained     tags times
## 1    1         96      0       0 1(1.04%)        0       0        0 1(1.04%)     0
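If installing qdap is not an option, the same two steps (find the frequent terms in one corpus, then count them in another) can be sketched in base R with regular expressions. This is a minimal sketch, not qdap's implementation. The helper names (`tokenize`, `top_terms`, `phrase_pattern`, `count_phrases`) are hypothetical, `stop_words` is a short illustrative stand-in for `tm::stopwords("en")`, and the text is assumed to be ASCII with phrases made of plain words. Like `freq_terms` above, `top_terms` keeps terms tied with the last one in the top `n`. `count_phrases` counts whole-word, case-insensitive matches per document, so it also handles multi-word phrases.

```r
# Base-R sketch of the two steps; no tm or qdap required.
# Hypothetical helpers; assumes ASCII text and phrases made of plain words.

# Illustrative stand-in for tm::stopwords("en") (much shorter than the real list)
stop_words <- c("a", "an", "and", "are", "from", "have", "how", "i", "in",
                "is", "it", "of", "the", "these", "to", "you")

# Lowercase and split on anything that is not a letter or apostrophe
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words[nzchar(words)]
}

# Top-n terms by frequency; terms tied with the n-th one are kept too
top_terms <- function(text, top = 7, stopwords = character(0)) {
  words <- tokenize(text)
  words <- words[!words %in% stopwords]
  if (length(words) == 0) {
    return(data.frame(WORD = character(0), FREQ = integer(0)))
  }
  tab <- table(words)
  freq <- sort(setNames(as.integer(tab), names(tab)), decreasing = TRUE)
  cutoff <- freq[[min(top, length(freq))]]
  keep <- freq[freq >= cutoff]
  data.frame(WORD = names(keep), FREQ = unname(keep), stringsAsFactors = FALSE)
}

# Whole-word regex for a (possibly multi-word) phrase:
# \Q...\E quotes each word literally, \s+ allows any whitespace between words
phrase_pattern <- function(phrase) {
  words <- strsplit(trimws(tolower(phrase)), "\\s+")[[1]]
  paste0("\\b", paste0("\\Q", words, "\\E", collapse = "\\s+"), "\\b")
}

# One row per document, one column per phrase, cells = number of matches
count_phrases <- function(docs, phrases) {
  docs <- tolower(docs)  # case-insensitive matching
  counts <- matrix(0L, nrow = length(docs), ncol = length(phrases),
                   dimnames = list(NULL, phrases))
  for (j in seq_along(phrases)) {
    matches <- gregexpr(phrase_pattern(phrases[j]), docs, perl = TRUE)
    # gregexpr returns -1 when there is no match, so count positive positions
    counts[, j] <- vapply(matches, function(m) sum(m > 0), integer(1))
  }
  data.frame(doc = seq_along(docs), counts, check.names = FALSE)
}

# Toy data (not the question's corpora)
corpus_a <- "I have a set of frequent phrases obtained from a corpus. I want to count how often these phrases appear in another corpus."
corpus_b <- c("Regular expressions can find phrases; phrases are just patterns.",
              "Count each phrase in CorpusB, not in the corpus itself.")

tags <- top_terms(corpus_a, top = 1, stopwords = stop_words)
print(tags)
hits <- count_phrases(corpus_b, c(tags$WORD, "regular expressions"))
print(hits)
```

With `top = 1`, both `corpus` and `phrases` come back because they tie at two occurrences, mirroring how `top=7` above returns eight terms. The word boundaries keep `CorpusB` from counting as `corpus`, and a multi-word entry such as `"regular expressions"` needs no tokenizer changes.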