R text file and text mining…how to load data

星月不相逢 2020-12-13 10:51

I am using the R package tm and I want to do some text mining. This is one document and is treated as a bag of words.

I don't understand the documenta

6 answers
  •  青春惊慌失措
    2020-12-13 11:41

    The following assumes you have a directory of text files from which you want to create a bag of words.

    The only change you need to make is to replace path = "C:\\windows\\path\\to\\text\\files\\" with the path to your own directory.

    library(tidyverse)
    library(tidytext)
    
    # build a character vector listing all files to be analyzed
    all_txts <- list.files(path = "C:\\windows\\path\\to\\text\\files\\",   # path can be relative or absolute
                           pattern = "\\.txt$",  # regex: only selects files ending with .txt
                           full.names = TRUE)  # returns the full file path, not just the file name
    
    # create a data frame with one word per row
    my_corpus <- map_dfr(all_txts, ~ tibble(txt = read_file(.x)) %>%   # read each file in the list as a single string
                          mutate(filename = basename(.x)) %>%   # add the file name as a new column
                          unnest_tokens(word, txt))   # split the text so each word gets its own row
    
    # count the total # of rows/words in your corpus
    my_corpus %>%
      summarize(number_rows = n())
    
    # group and count by "filename" field and sort descending
    my_corpus %>%
      group_by(filename) %>%
      summarize(number_rows = n()) %>%
      arrange(desc(number_rows))
    
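    # remove stop words (stop_words is a data frame that ships with tidytext)
    my_corpus2 <- my_corpus %>%
      anti_join(stop_words, by = "word")   # naming the join column avoids the "Joining with..." message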
    
    # repeat the count after stop words are removed
    my_corpus2 %>%
      group_by(filename) %>%
      summarize(number_rows = n()) %>%
      arrange(desc(number_rows))
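
The steps above give the number of words per file. A bag of words usually also needs the count of each individual word, so here is a minimal sketch of that last step. The two in-memory documents and the names `docs` and `word_counts` are made up for illustration and stand in for files read from disk.

```r
library(dplyr)
library(tibble)
library(tidytext)

# hypothetical stand-in for text files read from disk
docs <- tibble(
  filename = c("a.txt", "b.txt"),
  txt = c("The cat sat on the mat.", "The dog chased the cat.")
)

# one row per (file, word) pair, with n = how often the word occurs in that file
word_counts <- docs %>%
  unnest_tokens(word, txt) %>%             # lowercases and splits into one word per row
  anti_join(stop_words, by = "word") %>%   # drop common stop words such as "the"
  count(filename, word, sort = TRUE)       # tally each word within each file

print(word_counts)
```

With the real data, the same `count(filename, word, sort = TRUE)` call should work on `my_corpus2` directly, since it already has `filename` and `word` columns.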
    
