tm: read in data frame, keep text IDs, construct DTM and join to other dataset

半阙折子戏 2020-12-29 11:48

I'm using package tm.

Say I have a data frame with 2 columns and 500 rows. The first column is ID, which is randomly generated and contains both characters and numbers: "

5 answers

一向 (original poster)

2020-12-29 12:24

In the code below, "content" must be lower case, not capitalized ("Content") as in the other example. With this change, the content field of the corpus is populated correctly.

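library(tm)

# df is the data frame from the question, with columns ID and txt.
# Map document fields to data frame columns; "content" must be lower case.
m <- list(ID = "ID", content = "txt")
# Note: readTabular() was removed in tm 0.7, so this needs an older tm.
myReader <- readTabular(mapping = m)
mycorpus <- Corpus(DataframeSource(df), readerControl = list(reader = myReader))

# Manually keep ID information from http://stackoverflow.com/a/14852502/1036500
for (i in seq_along(mycorpus)) {
  attr(mycorpus[[i]], "ID") <- df$ID[i]
}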

Now try:

mycorpus[[3]]$content
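
For readers on a newer tm, here is a rough sketch of the whole round trip from the title: keep the IDs, build the DTM, and join it to another dataset. It assumes tm 0.7 or later, where DataframeSource expects columns named doc_id and text instead of a readTabular mapping. The sample rows and the second data frame `other` (with its `score` column) are made up for illustration and are not from the question.

```r
library(tm)

# Hypothetical stand-in for the question's data frame (ID + text).
df <- data.frame(
  ID  = c("a1x", "b2y", "c3z"),
  txt = c("the cat sat", "the dog barked", "a cat and a dog"),
  stringsAsFactors = FALSE
)

# Hypothetical second dataset that shares the ID column.
other <- data.frame(
  ID    = c("a1x", "b2y", "c3z"),
  score = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Assumption: tm >= 0.7, which requires the columns doc_id and text.
src <- DataframeSource(
  data.frame(doc_id = df$ID, text = df$txt, stringsAsFactors = FALSE)
)
mycorpus <- VCorpus(src)

# Build the document-term matrix; its document names are the doc_id values.
dtm <- DocumentTermMatrix(mycorpus)

# Turn the DTM into a data frame keyed by ID, then join on that key.
dtm_df <- data.frame(
  ID = Docs(dtm),
  as.matrix(dtm),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
joined <- merge(other, dtm_df, by = "ID")
```

Here `joined` has one row per ID, with `score` followed by one column per term. Converting with as.matrix() is fine for 500 rows but can get memory-hungry on a large corpus.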
