tm: read in a data frame, keep text IDs, construct a DTM and join it to another dataset

半阙折子戏 2020-12-29 11:48

I'm using the tm package.

Say I have a data frame with 2 columns and 500 rows. The first column is ID, which is randomly generated and contains both letters and numbers: "

5 Answers
  •  一向 (OP)
    2020-12-29 12:24

    In the code below, the "content" key of the mapping must be lower case, not upper case as in the example it was adapted from. With that change, the content field of each document in the corpus is populated correctly.

    library(tm)
    # Map document fields to data frame columns; the "content" key must be lower case
    m <- list(ID = "ID", content = "txt")
    myReader <- readTabular(mapping = m)  # readTabular() is only available in older tm releases
    mycorpus <- Corpus(DataframeSource(df), readerControl = list(reader = myReader))
    
    # Manually keep ID information from http://stackoverflow.com/a/14852502/1036500
    for (i in seq_along(mycorpus)) {
      attr(mycorpus[[i]], "ID") <- df$ID[i]
    }
    

    Now try:

    mycorpus[[3]]$content
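    The question also asks how to build the document-term matrix and join it to another dataset, which the code above does not cover. The following is a minimal sketch that assumes tm 0.7 or later. As far as I know, readTabular() was removed in those releases, and DataframeSource() instead expects the first two columns to be named doc_id and text. The toy data frames df and other, and the column names txt and label, are hypothetical. Each ID is stored as that document's "id" metadatum and then becomes the DTM row name. From there it can be turned back into a column and passed to base R's merge().

```r
library(tm)

# Hypothetical toy data: ID mixes letters and digits, txt holds the document text
df <- data.frame(
  ID  = c("a1b2", "c3d4", "e5f6"),
  txt = c("apple banana apple", "banana cherry", "cherry cherry apple"),
  stringsAsFactors = FALSE
)

# Hypothetical second dataset keyed by the same ID, in a different row order
other <- data.frame(
  ID    = c("c3d4", "a1b2", "e5f6"),
  label = c("y", "x", "z"),
  stringsAsFactors = FALSE
)

# Assumed tm >= 0.7: DataframeSource needs columns named doc_id and text
src_df <- data.frame(doc_id = df$ID, text = df$txt, stringsAsFactors = FALSE)
mycorpus <- VCorpus(DataframeSource(src_df))

# The ID travels with each document, so no manual loop is needed
content(mycorpus[[3]])     # text of the third document
meta(mycorpus[[3]], "id")  # its ID

# Document-term matrix: one row per document, rows named by doc_id
dtm <- DocumentTermMatrix(mycorpus)
m <- as.matrix(dtm)

# Turn the row names back into an ID column, then join on it
dtm_df <- data.frame(ID = rownames(m), m, row.names = NULL,
                     stringsAsFactors = FALSE, check.names = FALSE)
joined <- merge(dtm_df, other, by = "ID")
print(joined)
```

    merge() matches rows on ID rather than on position, so the second dataset does not need to be in the same order as the corpus. Note that DocumentTermMatrix() drops words shorter than 3 characters by default (the wordLengths control option).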
    
