Error faced while using the tm package's VCorpus in R

最后都变了 - submitted on 2019-12-01 16:59:40
Eva

I ran into the same problem after updating the tm package to version 0.7-2. The documentation for DataframeSource() says:

Details

A data frame source interprets each row of the data frame x as a document. The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text" and contain a "UTF-8" encoded string representing the document's content. Optional additional columns are used as document level metadata.
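A minimal sketch of a data frame that follows these rules, assuming tm 0.7 or later is installed. The ids, the texts and the extra author column are invented for illustration; only doc_id and text are required:

```r
library(tm)

# Illustrative data: "doc_id" must come first, "text" second (tm >= 0.7).
# The "author" column is a made-up example of optional metadata.
docs <- data.frame(
  doc_id = c("doc1", "doc2"),
  text   = c("The first document.", "A second, shorter one."),
  author = c("Alice", "Bob"),
  stringsAsFactors = FALSE  # keep strings as character, not factors
)

# Each row of the data frame becomes one document in the corpus
corpus <- VCorpus(DataframeSource(docs))

length(corpus)        # number of documents
content(corpus[[1]])  # text of the first document
```

Setting stringsAsFactors = FALSE matters on R versions before 4.0, where data.frame() converted character columns to factors by default.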

I solved it with the following code:

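library(tm)
df_cmp <- read.csv("test_file.csv", stringsAsFactors = FALSE)

# DataframeSource() needs a "doc_id" column followed by a "text" column
df_title <- data.frame(doc_id = row.names(df_cmp),
                       text = df_cmp$English.title,
                       stringsAsFactors = FALSE)
corpus <- VCorpus(DataframeSource(df_title))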

You can also try renaming the columns of your existing data frame to doc_id and text, with doc_id as the first column and text as the second.
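If the data already sits in a data frame under other column names, a small base-R helper can do the renaming and reordering. The function name to_tm_layout and its arguments are hypothetical (not part of tm), and the sample data is made up to resemble the CSV above; treat this as a sketch:

```r
# Hypothetical helper (not part of tm): copy an existing data frame into the
# layout DataframeSource() expects, with "doc_id" first and "text" second.
to_tm_layout <- function(df, text_col, id_col = NULL) {
  # Use the given id column, or fall back to the row names
  ids <- if (is.null(id_col)) row.names(df) else df[[id_col]]
  out <- data.frame(doc_id = as.character(ids),
                    text   = as.character(df[[text_col]]),
                    stringsAsFactors = FALSE)
  # The documentation requires a unique identifier for each document
  stopifnot(!anyDuplicated(out$doc_id))
  # Carry over the remaining columns (except any old doc_id/text) as metadata
  extra <- setdiff(names(df), c(text_col, id_col, "doc_id", "text"))
  if (length(extra) > 0) out[extra] <- df[extra]
  out
}

# Example with made-up data
titles <- data.frame(English.title = c("Alpha", "Beta"),
                     year = c(2017, 2018),
                     stringsAsFactors = FALSE)
to_tm_layout(titles, text_col = "English.title")
```

The result can then be passed on as VCorpus(DataframeSource(to_tm_layout(df_cmp, "English.title"))); the extra columns are kept so they can serve as the optional document-level metadata the documentation mentions.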
