tm: read in a data frame, keep text IDs, construct a DTM and join it to another dataset

半阙折子戏 2020-12-29 11:48

I'm using the tm package.

Say I have a data frame with 2 columns and 500 rows. The first column is an ID, which is randomly generated and contains both letters and numbers:

5 Answers
  •  南笙 (original poster)
    2020-12-29 12:23

    The tm package was updated in December 2017, and readTabular has been removed:

    "Changes in tm version 0.7-2
    SIGNIFICANT USER-VISIBLE CHANGES
    DataframeSource now only processes data frames with the two mandatory columns "doc_id" and "text". Additional columns are used as document level metadata. This implements compatibility with Text Interchange Formats corpora (https://github.com/ropensci/tif)."
    

    As described in https://cran.r-project.org/web/packages/tm/news.html, this makes it a bit easier to get each document's ID (and any other metadata you need) into the corpus.
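    A minimal sketch of the whole workflow from the question, assuming tm >= 0.7-2: the data frame passed to DataframeSource must have columns named doc_id and text, and the document IDs then become the row names of the DocumentTermMatrix, so they can serve as a join key. The objects docs and other, the label column, and the sample texts are hypothetical stand-ins for your own data.

```r
# Sketch assuming tm >= 0.7-2, where DataframeSource requires columns named
# "doc_id" and "text"; any extra columns become document-level metadata.
library(tm)

# Hypothetical input: mixed letter/digit IDs plus the document text.
docs <- data.frame(
  doc_id = c("a1x9", "b7k2", "c3m5"),
  text   = c("the cat sat on the mat",
             "the dog chased the cat",
             "a bird sang"),
  stringsAsFactors = FALSE
)

# Hypothetical second dataset keyed by the same IDs.
other <- data.frame(
  doc_id = c("a1x9", "b7k2", "c3m5"),
  label  = c("pets", "pets", "birds"),
  stringsAsFactors = FALSE
)

# Build the corpus; the doc_id values become the document IDs.
corpus <- VCorpus(DataframeSource(docs))

# Each DTM row is named after its document ID (see Docs(dtm)).
dtm <- DocumentTermMatrix(corpus)

# Convert to a data frame and keep the IDs as an explicit key column.
m <- as.matrix(dtm)
dtm_df <- data.frame(doc_id = rownames(m), m,
                     check.names = FALSE, stringsAsFactors = FALSE)

# Join the term counts to the other dataset by document ID.
joined <- merge(other, dtm_df, by = "doc_id")
print(joined[, c("doc_id", "label", "cat")])
```

    Joining with merge() on doc_id, rather than relying on row order, keeps the match correct even if the other dataset is sorted differently. Note that as.matrix() produces a dense matrix, which is fine for 500 documents but can use a lot of memory with a large vocabulary.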
