I'm trying to use the tm package in R to perform some text analysis. I tried the following:
require(tm)
dataSet <- Corpus(DirSource('tmp/'))
dataSet <
I have often run into this issue, and this Stack Overflow post is always what comes up first. I have used the top solution before, but it can strip out characters and replace them with garbage (for example, mangling the curly apostrophe in it’s).
I have found that there is actually a much better solution for this! If you install the stringi package, you can replace tolower() with stri_trans_tolower(), and then everything should work fine.
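A minimal sketch of that swap, assuming tm 0.6 or later and stringi installed. The docs vector is hypothetical: a small in-memory stand-in for the files in tmp/, so the example runs on its own. In recent tm versions, wrapping a plain function such as stri_trans_tolower() in content_transformer() keeps each corpus element a text document instead of turning it into a bare character vector.

```r
library(tm)
library(stringi)

# Hypothetical sample documents standing in for the files in tmp/;
# "\u2019" is the curly apostrophe that tends to get mangled.
docs <- c("It\u2019s a test", "ANOTHER Document")
corpus <- VCorpus(VectorSource(docs))

# Lowercase every document with stringi instead of base tolower().
# content_transformer() applies the function to each document's text
# and keeps the result wrapped as a text document.
corpus <- tm_map(corpus, content_transformer(stri_trans_tolower))

# Inspect the lowercased text of each document
print(content(corpus[[1]]))
print(content(corpus[[2]]))
```

With the question's own data you would build the corpus from Corpus(DirSource('tmp/')) instead of VectorSource(docs) and apply the same tm_map() call.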