Question
I am trying to follow this online tutorial on sentiment analysis. The code:
new_sentiments <- sentiments %>% #From the tidytext package
filter(lexicon != "loughran") %>% #Remove the finance lexicon
mutate( sentiment = ifelse(lexicon == "AFINN" & score >= 0, "positive",
ifelse(lexicon == "AFINN" & score < 0,
"negative", sentiment))) %>%
group_by(lexicon) %>%
mutate(words_in_lexicon = n_distinct(word)) %>%
ungroup()
Generates the error:
>Error in filter_impl(.data, quo) :
>Evaluation error: object 'lexicon' not found.
Perhaps related: the "sentiments" table itself looks strange to me (corrupted?). Here is the head of 'sentiments':
> head(sentiments,3)
>   element_id sentence_id word_count sentiment
> 1          1           1          7         0
> 2          2           1         NA         0
> 3          3           1         NA         0
>                                   chapter
> 1 The First Book of Moses: Called Genesis
> 2 The First Book of Moses: Called Genesis
> 3 The First Book of Moses: Called Genesis
>                                  category
> 1 The First Book of Moses: Called Genesis
> 2 The First Book of Moses: Called Genesis
> 3 The First Book of Moses: Called Genesis
If I use get_sentiments for bing, AFINN or NRC, though, I get what looks like an appropriate response:
> get_sentiments("bing")
> # A tibble: 6,788 x 2
> word sentiment
> <chr> <chr>
> 1 2-faced negative
> 2 2-faces negative
> 3 a+ positive
> 4 abnormal negative
I tried removing (remove.packages) and reinstalling tidytext, with no change in behavior. I am running R 3.5.
Even if I am completely misunderstanding the problem, I would appreciate any insights anyone can give me.
Answer 1:
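It appears tidytext had to be changed, which broke some of the code in the tutorial: the sentiments dataset no longer has the lexicon and score columns that the tutorial's pipeline filters on.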
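One way to confirm this on your own machine is to check which columns the packaged dataset has, and whether some other object named sentiments is shadowing it. This is only a diagnostic sketch, assuming tidytext is installed; the variable names packaged_cols and missing_cols are illustrative.

```r
library(tidytext)

# Columns of the dataset shipped with the installed tidytext version,
# referenced with :: so any other object called `sentiments` is bypassed.
packaged_cols <- names(tidytext::sentiments)
print(packaged_cols)

# The tutorial's pipeline needs these two columns; list the ones that are missing.
missing_cols <- setdiff(c("lexicon", "score"), packaged_cols)
print(missing_cols)

# Every place on the search path that defines an object called `sentiments`.
# More than one entry (for example ".GlobalEnv" ahead of "package:tidytext")
# means something is masking the tidytext dataset.
print(utils::find("sentiments"))
```

If missing_cols is not empty, the tutorial's original pipeline cannot run against the installed version, whatever else is on the search path.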
To make the code run, replace
new_sentiments <- sentiments %>% #From the tidytext package
filter(lexicon != "loughran") %>% #Remove the finance lexicon
mutate( sentiment = ifelse(lexicon == "AFINN" & score >= 0, "positive",
ifelse(lexicon == "AFINN" & score < 0,
"negative", sentiment))) %>%
group_by(lexicon) %>%
mutate(words_in_lexicon = n_distinct(word)) %>%
ungroup()
with
new_sentiments <- get_sentiments("afinn") %>%
  rename(score = value) %>% #The tutorial expects the AFINN column to be called 'score'
  mutate(lexicon = "afinn",
         sentiment = ifelse(score >= 0, "positive", "negative"),
         words_in_lexicon = n_distinct(word))
The next few graphs won't make as much sense (since we now only use one lexicon), but the rest of the tutorial will work.
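To see what the replacement does without fetching the AFINN lexicon, here is a small sketch on a made-up table. toy_afinn is a hypothetical stand-in with the same word and value columns; its words and scores are invented for illustration and are not taken from the real lexicon.

```r
library(dplyr)

# Hypothetical stand-in for get_sentiments("afinn"): a `word` column and a
# numeric `value` column. The words and scores here are made up.
toy_afinn <- tibble(
  word  = c("awful", "fine", "superb", "dreadful"),
  value = c(-3, 1, 5, -2)
)

toy_sentiments <- toy_afinn %>%
  rename(score = value) %>%                      # tutorial code refers to `score`
  mutate(lexicon = "afinn",                      # tag every row with its lexicon
         sentiment = ifelse(score >= 0, "positive", "negative"),
         words_in_lexicon = n_distinct(word))    # same count repeated on each row

print(toy_sentiments)
```

Each row keeps its numeric score and gains a positive/negative label derived from the sign, which is the shape the later tutorial steps expect.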
UPDATE: here's an excellent explanation from the tidytext package author as to what happened.
Answer 2:
The following code rebuilds the new_sentiments dataset as used in the DataCamp tutorial. It extends the AFINN-only new_sentiments from Answer 1 with the bing and nrc lexicons.
bing <- get_sentiments("bing") %>%
mutate(lexicon = "bing",
words_in_lexicon = n_distinct(word))
nrc <- get_sentiments("nrc") %>%
mutate(lexicon = "nrc",
words_in_lexicon = n_distinct(word))
new_sentiments <- bind_rows(new_sentiments, bing, nrc)
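Since the same two columns are added to each lexicon, the repeated mutate calls could be wrapped in a small function. The sketch below is one possible way to do that; tag_lexicon is a hypothetical helper name and is not part of tidytext or the tutorial. It is shown on bing only, which ships with tidytext.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper (not part of tidytext or the tutorial): adds the two
# columns the tutorial expects to any lexicon that has a `word` column.
tag_lexicon <- function(lexicon_tbl, lexicon_name) {
  lexicon_tbl %>%
    mutate(lexicon = lexicon_name,
           words_in_lexicon = n_distinct(word))
}

# "bing" is bundled with tidytext, so this call needs no extra download.
bing <- tag_lexicon(get_sentiments("bing"), "bing")
print(head(bing, 3))
```

The same call should work for get_sentiments("nrc"), but note that recent tidytext versions fetch the afinn and nrc lexicons through the textdata package and ask for confirmation the first time.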
The next instructions will display the "Word Counts per Lexicon" table as originally intended.
new_sentiments %>%
group_by(lexicon, sentiment, words_in_lexicon) %>%
summarise(distinct_words = n_distinct(word)) %>%
ungroup() %>%
spread(sentiment, distinct_words) %>%
mutate(lexicon = color_tile("lightblue", "lightblue")(lexicon),
words_in_lexicon = color_bar("lightpink")(words_in_lexicon)) %>%
my_kable_styling(caption = "Word Counts per Lexicon")
The subsequent graphs will work too!
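The table code above relies on formattable (color_tile, color_bar) and on my_kable_styling, which appears to be a helper defined in the tutorial rather than a package function. If those are not set up, the same counts can be printed as a plain tibble. This is a minimal sketch under two assumptions: it uses only the bing lexicon so that it runs without downloads, and it swaps in tidyr's pivot_wider for spread. The name word_counts is illustrative.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Build a one-lexicon version of the tutorial's table from bing only.
bing <- get_sentiments("bing") %>%
  mutate(lexicon = "bing",
         words_in_lexicon = n_distinct(word))

word_counts <- bing %>%
  group_by(lexicon, sentiment, words_in_lexicon) %>%
  summarise(distinct_words = n_distinct(word), .groups = "drop") %>%
  # One column per sentiment label, holding the distinct-word counts.
  pivot_wider(names_from = sentiment, values_from = distinct_words)

print(word_counts)
```

With the full new_sentiments in place of bing, the result should have one row per lexicon and one column per sentiment label.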
Source: https://stackoverflow.com/questions/51127671/r-sentiment-analysis-lexicon-not-found-sentiments-corrupted