R: Got problems in reading text file

Submitted by 无人久伴 on 2019-12-02 07:25:59

Question


I want to read text files in R. The code used to work, but when I reran it to retest, it failed.

library(tm)
# There are several text files in each of the folders 'Obama' and 'Romney'
candidates<-c("Obama","Romney")
pathname<-"C:/txt"
s.dir<-sprintf("%s/%s",pathname,candidates)
article<-Corpus(DirSource(directory=s.dir,encoding="ANSI"))

It displays this error:

Error in iconv(readLines(x, warn = FALSE), encoding, "UTF-8", "byte") : 
unsupported conversion from 'ANSI' to 'UTF-8' in codepage 936

Also, when I use the code below to try to read a single text file:

m<-"C:/txt/Romney/1.txt"
cc<-Corpus(DirSource(directory=m,encoding="ANSI"))

It displayed:

Error in DirSource(directory = m, encoding = "ANSI") : empty directory

The file path exists, so why do I get this error?


Answer 1:


Here is what you need to do:

  1. Change the line article<-Corpus(DirSource(directory=s.dir,encoding="ANSI")) to the following:

article <- VCorpus(DirSource(directory = s.dir), readerControl = list(reader=readPlain))

  2. In the cleanCorpus function, change corpus.tmp <- tm_map(corpus.tmp, tolower) to the following:

corpus.tmp <- tm_map(corpus.tmp, content_transformer(tolower))

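Note the use of the content_transformer function: it wraps tolower so that it modifies the content of each document, and the result stays a tm document rather than becoming a plain character vector.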

With both changes in place, the problem should be resolved.
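
The two errors in the question can also be checked before any corpus is built. The following is a minimal base R sketch, not part of the original answer. The helper name encoding_is_supported is hypothetical, and "CP936" and "GBK" are only candidate names suggested by the "codepage 936" in the error message; whether they are accepted depends on the system. The sketch tests whether iconv() recognizes an encoding name, and whether a path is a directory, which is what DirSource expects in its directory argument.

```r
# Hypothetical helper: TRUE if iconv() on this system can convert
# from the encoding `name` to UTF-8, FALSE if the conversion is unsupported.
encoding_is_supported <- function(name) {
  tryCatch({
    iconv("test", from = name, to = "UTF-8")
    TRUE
  }, error = function(e) FALSE)
}

# Candidate encoding names (assumptions, results vary by platform).
for (enc in c("ANSI", "CP936", "GBK", "UTF-8")) {
  cat(enc, "supported:", encoding_is_supported(enc), "\n")
}

# DirSource() lists the files inside a directory, so it needs a
# directory path, not the path of a single text file.
m <- "C:/txt/Romney/1.txt"
cat("path is a directory:", dir.exists(m), "\n")
cat("containing directory:", dirname(m), "\n")
```

If a name is reported as unsupported, passing it as the encoding argument of DirSource is likely to fail in the same way as "ANSI" did. For the single-file case, passing the containing directory (here dirname(m)) avoids handing a file path to DirSource.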




Answer 2:


Go to cran.r-project.org/web/packages/tm/index.html, download and install an older version of tm, and use it until the bug is fixed.



Source: https://stackoverflow.com/questions/24643925/r-got-problems-in-reading-text-file
