Plot highly correlated words against a specific word of interest [closed]

Posted by 房东的猫 on 2019-12-01 13:37:37

Question


I am trying to plot the words most highly correlated with a given word. For example, I want to graph the ten highest correlations with the word "whale." Can someone help me with the commands for something like that? I have Rgraphviz installed, if that helps.

library(tm)

s.dir1 <- "/PATHTOTEXT/MobyDickTxt"

s.cor1<-Corpus(DirSource(s.dir1), readerControl=list(reader=readPlain))
s.cor1<-tm_map(s.cor1, removePunctuation)
s.cor1<-tm_map(s.cor1, stripWhitespace)
s.cor1<-tm_map(s.cor1, content_transformer(tolower)) # wrap base functions so recent tm versions keep the corpus structure
s.cor1<-tm_map(s.cor1, removeNumbers)
s.cor1<-tm_map(s.cor1, removeWords, stopwords("english"))
tdm1 <- TermDocumentMatrix(s.cor1)

m1 <- as.matrix(tdm1)
v1 <- sort(rowSums(m1), decreasing=TRUE) # total frequency of each term, highest first
d1 <- data.frame(word = names(v1), freq = v1)

Answer 1:


Here's a method to compute the top words correlating with a given word in a corpus, and plot those words and correlations.

Get example data...

require(tm)
data("crude")
tdm <- TermDocumentMatrix(crude)

Compute correlations and store in data frame...

toi <- "oil" # term of interest
corlimit <- 0.7 # lower bound on the correlation
assocs <- findAssocs(tdm, toi, corlimit)[[1]] # named vector: term -> correlation
oil_0.7 <- data.frame(corr = assocs, terms = names(assocs))
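
For intuition about what these numbers are: as far as I understand it, findAssocs reports the correlation between the counts of the term of interest across documents and the counts of every other term, keeping those at or above corlimit. A minimal base R sketch of that idea on a made-up matrix (the counts and the names m, term_cor and oil_cor are illustrative assumptions, not taken from the crude data):

```r
# Toy term-by-document count matrix (made-up numbers, for illustration only)
m <- matrix(c(2, 0, 3, 1,
              4, 0, 6, 2,
              0, 5, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("oil", "price", "whale"), paste0("doc", 1:4)))

# cor() correlates columns, so transpose to put terms in columns
term_cor <- cor(t(m))

# Correlation of every other term with "oil", highest first
oil_cor <- sort(term_cor["oil", colnames(term_cor) != "oil"], decreasing = TRUE)
print(oil_cor)
```

Here "price" is always exactly twice "oil", so it correlates perfectly, while "whale" tends to appear where "oil" does not and comes out negative.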

Create a factor to allow ggplot to sort the dataframe...

oil_0.7$terms <- factor(oil_0.7$terms ,levels = oil_0.7$terms)

Draw the plot...

require(ggplot2)
ggplot(oil_0.7, aes( y = terms  ) ) +
  geom_point(aes(x = corr), data = oil_0.7) +
  xlab(paste0("Correlation with the term ", "\"", toi, "\""))
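
The code above keeps every term above a correlation threshold, whereas the question asked for the ten highest correlations. One possible way to get a fixed number of terms is sketched below in base R. The helper top_correlated and the toy matrix are hypothetical, written for illustration; it assumes a term-by-document count matrix such as the one as.matrix(tdm) returns.

```r
# Hypothetical helper: the n terms most correlated with `term`,
# given a term-by-document count matrix `m` (terms in rows).
top_correlated <- function(m, term, n = 10) {
  stopifnot(term %in% rownames(m))
  others <- m[rownames(m) != term, , drop = FALSE]
  # Pearson correlation of each remaining term's counts with the term of interest
  r <- apply(others, 1, function(x) cor(x, m[term, ]))
  r <- sort(r, decreasing = TRUE)  # sort() also drops NA values by default
  top <- head(r, n)
  # Factor levels follow the sorted order so a plot keeps that order
  data.frame(terms = factor(names(top), levels = names(top)),
             corr = unname(top))
}

# Toy data (made-up counts, for illustration only)
m <- rbind(whale = c(1, 2, 3, 4, 5),
           ship  = c(2, 4, 6, 8, 10),
           sea   = c(5, 4, 3, 2, 1),
           land  = c(1, 3, 2, 5, 4))
res <- top_correlated(m, "whale", n = 2)
print(res)
```

The returned data frame uses the same column names (terms, corr) as oil_0.7, so it should slot into the ggplot call above in its place. With a real corpus, the call would look something like top_correlated(as.matrix(tdm), "whale"), though converting a large term-document matrix to a dense matrix can use a lot of memory.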



Source: https://stackoverflow.com/questions/19549280/plot-highly-correlated-words-against-a-specific-word-of-interest
