tm: read in data frame, keep text IDs, construct DTM and join to other dataset

半阙折子戏 2020-12-29 11:48

I'm using package tm.

Say I have a data frame with 2 columns and 500 rows. The first column is an ID that is randomly generated and contains both characters and numbers: "

5 answers
  •  轻奢々 (original poster)
    2020-12-29 12:30

    qdap 1.2.0 can do both tasks with little coding, though not as a one-liner ;-), and not necessarily faster than Ben's answer (key_merge is a convenience wrapper for merge). This uses all of Ben's data from above, which makes my answer look smaller when it's not that much smaller.

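    ## The code
    library(qdap)

    # Build a corpus from the txt column, grouped by ID (df comes from Ben's answer)
    mycorpus <- with(df, as.Corpus(txt, ID))

    # Word frequency matrix without English stopwords, filtered to words
    # of 3 to 10 characters, then converted to a document-term matrix
    mydtm <- as.dtm(Filter(as.wfm(mycorpus, 
         col1 = "docs", col2 = "text", 
         stopwords = tm::stopwords("english")), 3, 10))

    # Move the document names into an "ID" column and merge with df2 on "ID"
    key_merge(matrix2df(mydtm, "ID"), df2, "ID")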
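
    For comparison, here is a minimal sketch of the same workflow using only tm and base merge instead of qdap. It is not Ben's code. The sample df and df2 are hypothetical stand-ins, since the original data is not shown here, and the sketch assumes tm 0.7-2 or later, where DataframeSource expects columns named doc_id and text.

    ```r
    library(tm)

    # Hypothetical sample data standing in for the question's df and df2
    df <- data.frame(
      ID  = c("a1x", "b2y", "c3z"),
      txt = c("the cat sat on the mat",
              "dogs chase the cat",
              "birds sing every morning"),
      stringsAsFactors = FALSE
    )
    df2 <- data.frame(ID = c("a1x", "b2y", "c3z"), score = c(1, 2, 3),
                      stringsAsFactors = FALSE)

    # DataframeSource expects the first two columns to be doc_id and text,
    # so the IDs travel with the documents into the corpus
    src <- DataframeSource(data.frame(doc_id = df$ID, text = df$txt,
                                      stringsAsFactors = FALSE))
    corpus <- VCorpus(src)

    # Drop English stopwords and keep terms of 3 to 10 characters
    dtm <- DocumentTermMatrix(corpus,
      control = list(stopwords = TRUE, wordLengths = c(3, 10)))

    # The DTM's document names are the original IDs; expose them as a column
    dtm_df <- data.frame(ID = Docs(dtm), as.matrix(dtm),
                         check.names = FALSE, stringsAsFactors = FALSE)

    # Join the term counts to the other dataset on ID
    merged <- merge(dtm_df, df2, by = "ID")
    ```

    The key point is that the IDs are carried as document names, so they survive into the DTM's row names and can be used as the merge key. A term that happens to be named "ID" or to match a column of df2 would collide in the merge, so real data may need the term columns renamed first.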
    
