I have a very large dataset (70k rows, 2600 columns, CSV format) that I have created by web scraping. Unfortunately, doing the pre-processing, processing etc. at some point
Not sure it will work for you, but for the same symptoms I converted the strings to ASCII:

x <- iconv(x, "", "ASCII", "byte")

For non-ASCII chars, the indication is "<xx>" with the hex code of the byte. You can then gsub the hex codes to whatever values suit you.
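As a minimal sketch of that iconv-plus-gsub approach (the input string and the replacement value are my own illustrative assumptions, not from the original data):

```r
# A string with one non-ASCII character: "café" (é is U+00E9, UTF-8 bytes c3 a9)
x <- "caf\u00e9"

# sub = "byte" (fourth argument) turns unconvertible bytes into "<xx>" hex escapes
y <- iconv(x, "UTF-8", "ASCII", "byte")
y  # "caf<c3><a9>"

# Now gsub each hex-escape pattern to whatever value suits you
y <- gsub("<c3><a9>", "e", y, fixed = TRUE)
y  # "cafe"
```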
I've had a bit of a horrible time with this pernicious little problem, but I think/hope I've finally got somewhere.
After messing around with the read_csv option locale = locale(encoding = "xyz") and trying various combinations of other solutions (the gsub solution didn't work), I tried the stringi solution.

It didn't work either. But stringi has a function stri_enc_detect, which I ran on the problem values: stri_enc_detect(x). It gave me a locale I hadn't tried - in this case windows-1252 - which I promptly set in the read_csv options: locale = locale(encoding = "windows-1252").

Hey presto, it's displaying correctly now.
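The detect-then-set-locale workflow can be sketched end to end. Everything here is illustrative: I write a tiny windows-1252 CSV to a temp file to stand in for the problem file, and the column name and value are made up.

```r
library(readr)
library(stringi)

# Write a small CSV in windows-1252 (byte 0xE9 is é in that encoding),
# standing in for the mis-encoded file.
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("name\ncaf"), as.raw(0xE9), charToRaw("\n")), path)

# Ask stringi to guess the encoding of the raw bytes. It returns candidate
# encodings ranked by confidence; on a tiny sample like this it may report a
# close relative such as ISO-8859-1 rather than windows-1252 itself.
bytes <- readBin(path, "raw", n = file.size(path))
stri_enc_detect(bytes)[[1]]

# Re-read with the detected/confirmed locale; the value now displays correctly.
df <- read_csv(path, locale = locale(encoding = "windows-1252"),
               show_col_types = FALSE)
df$name  # "café"
```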