R text mining documents from CSV file (one row per doc)

给你一囗甜甜゛ submitted on 2019-12-29 03:33:14

Question


I am trying to work with the tm package in R, and have a CSV file of customer feedback with each line being a different instance of feedback. I want to import all of this feedback into a corpus, but I want each line to be a separate document within the corpus, so that I can compare the feedback in a document-term matrix. There are over 10,000 rows in my data set.

Originally I did the following:

fdbk_corpus <-Corpus(VectorSource(fdbk), readerControl = list(language="eng"), sep="\t")

This creates a corpus with 1 document and >10,000 rows, and I want >10,000 docs with 1 row each.

I imagine I could just have 10,000+ separate CSV or TXT documents within a folder and create a corpus from that... but I suspect there is a much simpler way to read each line as a separate document.


Answer 1:


Here's a complete workflow to get what you want:

# change this file location to suit your machine
file_loc <- "C:\\Documents and Settings\\Administrator\\Desktop\\Book1.csv"
# change TRUE to FALSE if you have no column headings in the CSV
x <- read.csv(file_loc, header = TRUE)
require(tm)
corp <- Corpus(DataframeSource(x))
dtm <- DocumentTermMatrix(corp)

In the dtm object, each row is a document — one line of your original CSV file — and each column is a term (word).
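Note that in more recent versions of tm (0.7+), DataframeSource() requires the data frame to have a doc_id column and a text column. A minimal sketch under that assumption (the sample feedback strings here are made up for illustration):

```r
library(tm)

# Newer tm versions expect a "doc_id" column plus a "text" column,
# with one document (here, one feedback entry) per row.
x <- data.frame(
  doc_id = 1:3,
  text   = c("great service", "slow delivery", "great product, slow website"),
  stringsAsFactors = FALSE
)

corp <- Corpus(DataframeSource(x))   # one document per data-frame row
dtm  <- DocumentTermMatrix(corp)

nrow(dtm)  # one row per document, i.e. per line of the CSV
```

Reading your real file with `x <- read.csv(file_loc, stringsAsFactors = FALSE)` and renaming the feedback column to `text` (plus adding a `doc_id`) fits the same shape.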




Answer 2:


You can use TermDocumentMatrix() on a corpus built from your fdbk object, obtaining a term-document matrix in which each column (terms are the rows) represents one customer feedback entry. Use DocumentTermMatrix() instead if you want one row per feedback.
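As a sketch, assuming fdbk is a character vector with one feedback entry per element (the sample strings are invented for illustration):

```r
library(tm)

# One feedback string per element; VectorSource treats each element
# as its own document, which is exactly what the question asks for.
fdbk <- c("love the app", "checkout was confusing", "love the checkout")

corp <- Corpus(VectorSource(fdbk))
tdm  <- TermDocumentMatrix(corp)

# In a term-document matrix, rows are terms and columns are documents,
# so here there is one column per feedback entry.
dim(tdm)
```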



Source: https://stackoverflow.com/questions/17997364/r-text-mining-documents-from-csv-file-one-row-per-doc
