After reading all about iconv and Encoding, I am still confused.
I am scraping the source of a web page, and I have a string that looks like thi
I sympathise; I have struggled with R and Unicode text in the past, and not always successfully. If your data is in x, then first try a global replace, something like this:
x <- gsub("\u003D", "=>", x)
I sometimes use a construction like
lapply(x, utf8ToInt)
to see where the high code points are, e.g. anything over 150. This helps me locate problems caused by non-breaking spaces, for example, which seem to pop up every now and again.
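As a rough sketch of that inspection step, the snippet below runs utf8ToInt over a small made-up vector, reports where the non-ASCII code points sit, and then swaps non-breaking spaces for ordinary ones. The sample strings, the cutoff of 127 (the top of the ASCII range, used here instead of 150), and the names cps, high and x_clean are all assumptions for illustration, not part of the original data.

```r
# Hypothetical sample data: the second element contains a
# non-breaking space (U+00A0), the third an accented letter.
x <- c("plain text", "non\u00A0breaking", "caf\u00E9")

# Convert each string to its vector of integer code points.
cps <- lapply(x, utf8ToInt)

# For each string, the character positions whose code point
# lies outside the ASCII range (assumed cutoff: 127).
high <- lapply(cps, function(cp) which(cp > 127))

# Replace non-breaking spaces with ordinary spaces.
x_clean <- gsub("\u00A0", " ", x, fixed = TRUE)
```

Printing high shows which elements need attention, and cps[[i]][high[[i]]] gives the offending code points themselves (160 is the non-breaking space).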