Error in nchar(Terms(x), type = "chars") : invalid multibyte string, element 204, when inspecting document term matrix
Question

Here is the source code that I have used:

    MyData <- Corpus(DirSource("F:/Data/CSV/Data"), readerControl = list(reader = readPlain, language = "cn"))
    SegmentedData <- lapply(MyData, function(x) unlist(segmentCN(x)))
    temp <- Corpus(DataframeSource(SegmentedData), readerControl = list(reader = readPlain, language = "cn"))

Preprocessing Data

    temp <- tm_map(temp, removePunctuation)
    temp <- tm_map(temp, removeNumbers)
    removeURL <- function(x) gsub("http[[:alnum:]]*", " ", x)
    temp <- tm_map(temp, removeURL)
    temp