tm: read in data frame, keep text id's, construct DTM and join to other dataset

半阙折子戏 2020-12-29 11:48

I'm using package tm.

Say I have a data frame with 2 columns and 500 rows. The first column is an ID, which is randomly generated and contains both characters and numbers: "

5 answers

南笙 (original poster)

2020-12-29 12:23
The tm package was updated in December 2017, and readTabular is gone:
```
"Changes in tm version 0.7-2
SIGNIFICANT USER-VISIBLE CHANGES
DataframeSource now only processes data frames with the two mandatory columns "doc_id" and "text". Additional columns are used as document level metadata. This implements compatibility with Text Interchange Formats corpora (https://github.com/ropensci/tif)."
```
This makes it a bit easier to get your ID (and whatever other metadata you need) for each document into the corpus, as described in https://cran.r-project.org/web/packages/tm/news.html
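As a rough sketch of how that could look with tm 0.7-2 or later: rename the ID and text columns to `doc_id` and `text`, build the corpus with `DataframeSource`, create the DTM, and merge on the ID. The data frames `df` and `other`, and the columns `id`, `txt`, and `score`, are made up for illustration and are not from the question.

```r
library(tm)

# Hypothetical input: IDs in `id`, raw text in `txt`
df <- data.frame(
  id  = c("a1b2", "c3d4", "e5f6"),
  txt = c("the cat sat", "the dog barked", "a cat and a dog"),
  stringsAsFactors = FALSE
)

# DataframeSource (tm >= 0.7-2) requires columns named doc_id and text;
# any further columns would be kept as document-level metadata
src_df <- data.frame(doc_id = df$id, text = df$txt, stringsAsFactors = FALSE)
corpus <- VCorpus(DataframeSource(src_df))

# Build the document-term matrix; its document names are the doc_id values
dtm <- DocumentTermMatrix(corpus)

# Convert to a data frame and expose the document IDs as a column
dtm_df <- as.data.frame(as.matrix(dtm))
dtm_df$doc_id <- Docs(dtm)

# Hypothetical second dataset keyed by the same IDs
other <- data.frame(
  doc_id = c("a1b2", "c3d4", "e5f6"),
  score  = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Join the term counts to the other dataset by ID
joined <- merge(other, dtm_df, by = "doc_id")
```

Note that `as.matrix(dtm)` produces a dense matrix, so for a large vocabulary it may be better to filter terms first (for example with `removeSparseTerms`). A term that happens to share a name with a column in the other dataset would also need renaming before the merge.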