How can I tokenize a text column in R? unnest function not working

Submitted by 末鹿安然 on 2021-02-10 04:01:27


I am a new R user and would really appreciate help with a tokenization problem.

My task in brief: I am trying to import a text file into R. One of the text columns is Headline. The dataset is a collection of news articles related to a disease.

Issue: I have tried many times to tokenize this column with the unnest_tokens function, but it gives me the following error messages:

Error in UseMethod("unnest_tokens_") : no applicable method for 'unnest_tokens_' applied to an object of class "character"

Error in unnest_tokens(word, Headline) : object 'word' not found


DengueNews %>%
  unnest_tokens(word, Headline)

Note: Link of the dataset: I am following the instructions from


It is not clear how the data was read. As mentioned in the comments, if the 'Headline' column is of class character, unnest_tokens should work. Here, we use read_excel from the readxl package to read the dataset. By default, it returns text columns as character vectors.

library(readxl)
DengueNews <- read_excel("DengueNews.xlsx")
class(DengueNews$Headline)
#[1] "character"

library(dplyr)
library(tidytext)
DengueNews %>%
  unnest_tokens(word, Headline)
# A tibble: 217 x 4
   Serial Date  Newscontent                                                                                                                                             word      
    <dbl> <chr> <chr>                                                                                                                                                   <chr>     
 1    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… dghs      
 2    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… 491       
 3    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… more      
 4    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… hospitali…
 5    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… for       
 6    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… dengue    
 7    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… in        
 8    216 43727 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA total of 491 dengue patients have been admitted to different hospitals acro… 24hrs     
 9    215 43725 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA fifth-grader schoolgirl has died of dengue fever at Dhaka Medical College a… 1         
10    215 43725 "The unofficial death toll is reported to be over 157, so far\r\n\r\n\r\nA fifth-grader schoolgirl has died of dengue fever at Dhaka Medical College a… more      
# … with 207 more rows
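
The first error in the question reports that the object handed to unnest_tokens was itself of class "character". That suggests DengueNews may have been a plain character vector (for example, lines read with readLines) rather than a data frame. A minimal sketch of how to check for that and wrap the text in a tibble is below; the headlines and the names `headlines`, `news` and `tokens` are made up for illustration and are not taken from the dataset.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for text that was read in as a bare character vector
headlines <- c("Dengue cases rise in Dhaka",
               "Hospitals admit more dengue patients")

class(headlines)          # "character": a vector, not a data frame
is.data.frame(headlines)  # FALSE

# Wrap the vector in a tibble so the text lives in a named column
news <- tibble(Headline = headlines)

# One row per lowercased word, stored in a new column called `word`
tokens <- news %>%
  unnest_tokens(word, Headline)

tokens
```

If class(DengueNews) already returns a data frame class such as "tbl_df" or "data.frame", this step is not needed and the class of the Headline column is the next thing to check.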

If we change the column to another class, such as factor, the call fails:

DengueNews %>%
  mutate(Headline = factor(Headline)) %>%
  unnest_tokens(word, Headline)
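
If the column did arrive as a factor (for instance from read.csv, whose stringsAsFactors argument defaulted to TRUE before R 4.0.0), one possible fix is to convert it back with as.character before tokenizing. The sketch below uses a made-up two-row tibble, not the actual dataset, and the names `news` and `tokens` are illustrative.

```r
library(dplyr)
library(tidytext)

# Hypothetical example data with Headline stored as a factor
news <- tibble(
  Headline = factor(c("Dengue cases rise", "More patients admitted"))
)

class(news$Headline)  # "factor"

tokens <- news %>%
  mutate(Headline = as.character(Headline)) %>%  # restore the character class
  unnest_tokens(word, Headline)                  # then tokenize as usual

tokens
```

Converting inside the pipeline leaves the original object untouched. If the data comes from a CSV file, reading it with stringsAsFactors = FALSE would avoid the factor in the first place.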

