R sentiment analysis; 'lexicon' not found; 'sentiments' corrupted?

Question

I am trying to follow this on-line tutorial on sentiment analysis. The code:

new_sentiments <- sentiments %>% #From the tidytext package
  filter(lexicon != "loughran") %>% #Remove the finance lexicon
  mutate( sentiment = ifelse(lexicon == "AFINN" & score >= 0, "positive",
                         ifelse(lexicon == "AFINN" & score < 0,
                                "negative", sentiment))) %>%
  group_by(lexicon) %>%
  mutate(words_in_lexicon = n_distinct(word)) %>%
  ungroup()

Generates the error:

>Error in filter_impl(.data, quo) : 
>Evaluation error: object 'lexicon' not found.

Perhaps related: the 'sentiments' table itself looks strange to me (corrupted?). Here is the head of 'sentiments':

> head(sentiments,3)
>   element_id sentence_id word_count sentiment                                  chapter
> 1          1           1          7         0 The First Book of Moses:  Called Genesis
> 2          2           1         NA         0 The First Book of Moses:  Called Genesis
> 3          3           1         NA         0 The First Book of Moses:  Called Genesis
>                                   category
> 1 The First Book of Moses:  Called Genesis
> 2 The First Book of Moses:  Called Genesis
> 3 The First Book of Moses:  Called Genesis

If I use get_sentiments for bing, AFINN or NRC, though, I get what looks like an appropriate response:

>  get_sentiments("bing")
> # A tibble: 6,788 x 2
>   word        sentiment
>   <chr>       <chr>
> 1 2-faced     negative
> 2 2-faces     negative
> 3 a+          positive
> 4 abnormal    negative

I tried removing (remove.packages) and reinstalling tidytext; no change in behavior. I am running R 3.5.

Even if I am completely misunderstanding the problem, I would appreciate any insights anyone can give me.

Answer 1:

It appears the tidytext package had to be changed so that 'sentiments' no longer has the 'lexicon' column the tutorial relies on, which broke some of the code in the tutorial.
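One way to check this on your own machine is to look at where `sentiments` is coming from and which columns it has. This is only a minimal sketch, assuming tidytext is installed; the output depends on the installed version and on what is in your workspace, so none is shown here.

```r
library(tidytext)

# Where on the search path is an object called `sentiments` found?
# If ".GlobalEnv" is listed, a workspace object is masking the package dataset.
find("sentiments")

# Inspect the packaged dataset directly, bypassing any masking object
sentiment_columns <- names(tidytext::sentiments)
sentiment_columns

# TRUE only if the installed tidytext still ships a `lexicon` column
has_lexicon <- "lexicon" %in% sentiment_columns
has_lexicon
```

If `has_lexicon` is FALSE, the tutorial's `filter(lexicon != "loughran")` step cannot work on that version of the dataset, whatever else is in the workspace.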

To make the code run, replace

new_sentiments <- sentiments %>% #From the tidytext package
  filter(lexicon != "loughran") %>% #Remove the finance lexicon
  mutate( sentiment = ifelse(lexicon == "AFINN" & score >= 0, "positive",
                              ifelse(lexicon == "AFINN" & score < 0,
                                     "negative", sentiment))) %>%
  group_by(lexicon) %>%
  mutate(words_in_lexicon = n_distinct(word)) %>%
  ungroup()

with

new_sentiments <- get_sentiments("afinn")
# The AFINN score column may be called 'value'; rename it to 'score' if so
names(new_sentiments)[names(new_sentiments) == 'value'] <- 'score'
new_sentiments <- new_sentiments %>%
  mutate(lexicon = "afinn",
         sentiment = ifelse(score >= 0, "positive", "negative"),
         words_in_lexicon = n_distinct(word))

The next few graphs won't make as much sense (since we now only use one lexicon), but the rest of the tutorial will work.

UPDATE: here's an excellent explanation from the tidytext package author as to what happened.

Answer 2:

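The following code rebuilds the new_sentiments dataset as shown in the DataCamp tutorial. It adds the bing and nrc lexicons to an existing new_sentiments, so that object must already hold the AFINN lexicon.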

bing <- get_sentiments("bing") %>% 
     mutate(lexicon = "bing", 
            words_in_lexicon = n_distinct(word))    

nrc <- get_sentiments("nrc") %>% 
     mutate(lexicon = "nrc", 
            words_in_lexicon = n_distinct(word))

new_sentiments <- bind_rows(new_sentiments, bing, nrc)
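To see what `bind_rows()` does when the inputs do not share all columns (AFINN carries a numeric `score`, bing and nrc do not), here is a small sketch on made-up data. The `afinn_toy` and `bing_toy` tables are hypothetical stand-ins, not real lexicon rows.

```r
library(dplyr)

# Toy stand-ins for the real lexicons (hypothetical rows, not actual lexicon data)
afinn_toy <- tibble(word = c("good", "bad"), score = c(3, -3)) %>%
  mutate(lexicon = "afinn",
         sentiment = ifelse(score >= 0, "positive", "negative"),
         words_in_lexicon = n_distinct(word))

bing_toy <- tibble(word = c("good", "awful", "nice"),
                   sentiment = c("positive", "negative", "positive")) %>%
  mutate(lexicon = "bing",
         words_in_lexicon = n_distinct(word))

# Columns are matched by name; a column missing from one input
# (here `score` for bing_toy) is filled with NA in the combined table
combined <- bind_rows(afinn_toy, bing_toy)
combined
```

So after the real `bind_rows(new_sentiments, bing, nrc)`, the bing and nrc rows should have `NA` in `score`, which is worth keeping in mind if a later step uses that column.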

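The next instructions will display the "Word Counts per Lexicon" table as originally intended (color_tile() and color_bar() come from the formattable package; my_kable_styling() is assumed to be the helper defined earlier in the tutorial).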

new_sentiments %>% 
     group_by(lexicon, sentiment, words_in_lexicon) %>% 
     summarise(distinct_words = n_distinct(word)) %>% 
     ungroup() %>% 
     spread(sentiment, distinct_words) %>% 
     mutate(lexicon = color_tile("lightblue", "lightblue")(lexicon), 
            words_in_lexicon = color_bar("lightpink")(words_in_lexicon)) %>% 
     my_kable_styling(caption = "Word Counts per Lexicon")

The subsequent graphs will work too!
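A side note on `spread()`: it still works, but tidyr now marks it as superseded by `pivot_wider()`. A minimal sketch of the equivalent reshaping step on made-up counts (the `toy_counts` table is hypothetical, and tidyr 1.0.0 or later is assumed):

```r
library(dplyr)
library(tidyr)

# Hypothetical per-lexicon counts, standing in for the summarised new_sentiments
toy_counts <- tibble(
  lexicon        = c("bing", "bing", "nrc"),
  sentiment      = c("negative", "positive", "joy"),
  distinct_words = c(2L, 1L, 4L)
)

# Same reshaping as spread(sentiment, distinct_words):
# one column per sentiment, NA where a lexicon lacks that sentiment
wide_counts <- toy_counts %>%
  pivot_wider(names_from = sentiment, values_from = distinct_words)
wide_counts
```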

Source: https://stackoverflow.com/questions/51127671/r-sentiment-analysis-lexicon-not-found-sentiments-corrupted

Tags

sentiment-analysis