I am using rvest to parse a website, and I'm hitting a wall with these little non-breaking spaces. How does one remove the whitespace that is created by the non-breaking space characters?
Posting this since I think it's the most robust approach.
I scraped a Wikipedia page and got this in my output (not sure if it'll copy-paste properly):
x <- " California"
And gsub("\\s", "", x) didn't change anything, which raised a red flag that something fishy was going on.
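For reference, a minimal reproduction of the failure (a sketch, assuming the invisible character is U+00A0; whether \s matches it depends on the regex engine and locale):

```r
x <- "\u00a0California"   # "\u00a0" is the non-breaking space, U+00A0

gsub("\\s", "", x)                # default (TRE) engine: typically leaves x unchanged
gsub("\\s", "", x, perl = TRUE)   # PCRE: \s does not match U+00A0 by default either
```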
To investigate, I did:
dput(charToRaw(strsplit(x, "")[[1]][1]))
# as.raw(c(0xc2, 0xa0))
This shows how exactly that character is stored in memory: two bytes, 0xc2 0xa0, which is the UTF-8 encoding of the non-breaking space, U+00A0.
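The same diagnostic works on the whole string, not just its first character, since charToRaw dumps every byte and any odd character stands out (sketch on a stand-in string, assuming U+00A0 is the culprit):

```r
x <- "\u00a0California"  # stand-in for the scraped string
charToRaw(x)
# the leading pair 0xc2 0xa0 is the UTF-8 encoding of U+00A0 (non-breaking space);
# the remaining bytes are plain ASCII letters
```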
With this in hand, we can use gsub a bit more robustly than in the other solutions:
gsub(rawToChar(as.raw(c(0xc2, 0xa0))), "", x)
# [1] "California"
(@MrFlick's suggestion to set the encoding didn't work for me, and @shabbychef's input of 160 to intToUtf8 is the decimal code point of the non-breaking space, U+00A0; the raw-bytes approach shown here can be generalized to other similar situations.)
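Two equivalent ways to express the same fix, assuming the culprit is U+00A0 (sketches, not tested against the original scrape):

```r
x <- "\u00a0California"

# Name the code point directly instead of spelling out its UTF-8 bytes:
gsub("\u00a0", "", x)

# Or make \s Unicode-aware with PCRE's (*UCP) option, which also catches
# other exotic whitespace characters in one pass:
gsub("(*UCP)\\s", "", x, perl = TRUE)
```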