Question
I want to read text files in R. This code used to work, but when I ran it again, it failed.
# There are several text files in the folder 'Obama' and in the folder 'Romney'
candidates<-c("Obama","Romney")
pathname<-"C:/txt"
s.dir<-sprintf("%s/%s",pathname,candidates)
article<-Corpus(DirSource(directory=s.dir,encoding="ANSI"))
It produces the following error:
Error in iconv(readLines(x, warn = FALSE), encoding, "UTF-8", "byte") :
unsupported conversion from 'ANSI' to 'UTF-8' in codepage 936
Also, when I use the code below to try to read a single text file:
m<-"C:/txt/Romney/1.txt"
cc<-Corpus(DirSource(directory=m,encoding="ANSI"))
It displays:
Error in DirSource(directory = m, encoding = "ANSI") : empty directory
The file path exists, so why am I getting this error?
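Answer 1:
Here is what you need to do:
- Change article<-Corpus(DirSource(directory=s.dir,encoding="ANSI")) to the following, which drops the encoding argument ("ANSI" is not an encoding name that iconv accepts, as the error message shows):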
article <- VCorpus(DirSource(directory = s.dir), readerControl = list(reader=readPlain))
- In your cleanCorpus function, change corpus.tmp <- tm_map(corpus.tmp, tolower) to the following:
corpus.tmp <- tm_map(corpus.tmp, content_transformer(tolower))
Note the use of the content_transformer function: it wraps tolower so that tm_map returns text documents rather than bare character vectors.
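A minimal sketch of the two changes together, assuming the tm package is installed. The two sample strings are hypothetical stand-ins for the real text files, and VectorSource is used only so the example runs without any files on disk:

```r
library(tm)

# Hypothetical in-memory documents standing in for the files under C:/txt.
docs <- c("Obama Speaks In OHIO", "Romney Visits FLORIDA")
corpus <- VCorpus(VectorSource(docs))

# Wrap tolower in content_transformer so each element stays a text document.
corpus <- tm_map(corpus, content_transformer(tolower))

# Inspect the transformed text of the first document.
print(content(corpus[[1]]))
```

With the real data, VectorSource(docs) would be replaced by DirSource(directory = s.dir) as in the first step.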
Once you have made these changes, the problem should be resolved.
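The "empty directory" error in the question is a separate issue: DirSource expects a folder, and C:/txt/Romney/1.txt is a file. Below is a rough sketch of two ways to load a single file, assuming tm is installed. The temporary folder and the sample text are hypothetical and exist only to make the example self-contained:

```r
library(tm)

# Hypothetical stand-in for C:/txt/Romney, created so the sketch can run anywhere.
dir_path <- file.path(tempdir(), "Romney")
dir.create(dir_path, showWarnings = FALSE)
writeLines("Sample text for one article.", file.path(dir_path, "1.txt"))

# Option 1: point DirSource at the folder and pick the file with a regex pattern.
cc <- VCorpus(DirSource(directory = dir_path, pattern = "^1\\.txt$"))

# Option 2: read the file yourself and wrap the text in a VectorSource.
txt <- paste(readLines(file.path(dir_path, "1.txt"), warn = FALSE), collapse = "\n")
cc2 <- VCorpus(VectorSource(txt))
```

Either way the result is a one-document corpus. Option 1 keeps the file name as the document id, which may be convenient later.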
Answer 2:
Go to cran.r-project.org/web/packages/tm/index.html, download and install an older version of tm, and wait until the bug is fixed.
Source: https://stackoverflow.com/questions/24643925/r-got-problems-in-reading-text-file
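One possible way to do this from within R is sketched below, assuming the remotes package is installed. The helper name install_old_tm and the version "0.5-10" are assumptions for illustration (the answer does not say which version to use), so check the tm archive on CRAN for the release that last worked for you:

```r
# Hypothetical helper: installs a specific archived release of tm.
# The default version is an assumption, not something stated in the answer.
install_old_tm <- function(version = "0.5-10") {
  remotes::install_version("tm", version = version)
}

# Uncomment to run; this needs network access and replaces the installed tm.
# install_old_tm()
```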