Trouble with strings with Unicode characters

日久生厌 2020-12-20 12:35

I have a very large dataset (70k rows, 2600 columns, CSV format) that I have created by web scraping. Unfortunately, doing the pre-processing, processing etc. at some point

2 Answers
  •  抹茶落季
    2020-12-20 13:23

    I've had a bit of a horrible time with this pernicious little problem, but I think/hope I've finally got somewhere.

    After messing around with the read_csv option locale = locale(encoding = "xyz") and trying various combinations of other solutions (the gsub solution didn't work), I tried the stringi solution...

    It didn't work either. But stringi has a function, stri_enc_detect, which I ran on the problem values: stri_enc_detect(x). It suggested an encoding I hadn't tried, in this case windows-1252, which I promptly set in the read_csv options: locale = locale(encoding = "windows-1252")

    Hey presto, it's displaying correctly now.
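
The detect-then-reread workflow described in this answer might look roughly like the sketch below. It is only an illustration under stated assumptions: the temporary file, its contents, and the column name are made up for the demo, and windows-1252 is simply the encoding that turned up in this answer. With real data, use whatever stri_enc_detect ranks highest, and treat its result as a guess, not a guarantee.

```r
library(stringi)
library(readr)

# Hypothetical sample file: a one-column CSV whose single value is "café"
# stored as windows-1252 bytes (0xE9 is "é" in that encoding).
path <- tempfile(fileext = ".csv")
raw_bytes <- c(charToRaw("name\ncaf"), as.raw(0xE9), charToRaw("\n"))
writeBin(raw_bytes, path)

# Ask stringi for a ranked list of candidate encodings for the raw bytes.
# The result is a list holding one data frame of guesses with confidences.
guesses <- stri_enc_detect(read_file_raw(path))[[1]]
print(guesses)

# Re-read the file, telling readr which encoding to assume.
# Reading every column as character keeps the demo free of type guessing.
df <- read_csv(
  path,
  locale = locale(encoding = "windows-1252"),
  col_types = cols(.default = col_character())
)
print(df$name)
```

On very short inputs like this sample the detector has little to go on, so its ranking is more useful when run on a larger slice of the real file or on the specific values that display incorrectly.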
