I am using rvest to parse a website, and I'm hitting a wall with these little non-breaking spaces. How does one remove the whitespace that is created by the `&nbsp;` element?
The `&nbsp;` stands for "non-breaking space", which in Unicode has its own distinct character (U+00A0), separate from a "regular" space (U+0020, i.e. `" "`). Compare:
```r
bodytext <- "\u00a0foo"  # stand-in for the scraped text: a non-breaking space, then "foo"
charToRaw(" foo")
# [1] 20 66 6f 6f
charToRaw(bodytext)
# [1] c2 a0 66 6f 6f
```
So you'd want to use one of the special character classes for whitespace. You can remove all whitespace (regular spaces included) with

```r
gsub("\\s", "", bodytext)
```
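If you only want to get rid of the non-breaking spaces and keep the ordinary spaces between words, a more targeted option is to match U+00A0 literally. This is a minimal sketch assuming the scraped text is already a UTF-8 string; `clean_nbsp` is a hypothetical helper name, not part of rvest.

```r
# Hypothetical helper: turn non-breaking spaces (U+00A0) into regular spaces,
# then trim leading/trailing whitespace. Assumes `x` is UTF-8 text.
clean_nbsp <- function(x) {
  x <- gsub("\u00a0", " ", x, fixed = TRUE)  # literal match, no regex class involved
  trimws(x)                                  # strip spaces at both ends
}

clean_nbsp("\u00a0foo")
# [1] "foo"
clean_nbsp("foo\u00a0bar")
# [1] "foo bar"
```

Matching the character literally means the result does not depend on whether a particular regex engine and platform classify U+00A0 as `\\s`.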
On Windows, I needed to make sure the encoding of the string was declared as UTF-8 first:

```r
Encoding(bodytext) <- "UTF-8"
gsub("\\s", "", bodytext)
```
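To see why the declared encoding matters, the sketch below builds a string from the same raw UTF-8 bytes shown earlier. This is a base-R illustration, not actual scraped output. `rawToChar()` returns the string with encoding `"unknown"`, meaning R reads the bytes in the native locale encoding; in a non-UTF-8 locale (as on older Windows setups) the bytes `c2 a0` are then treated as two separate characters. Declaring the string as UTF-8 makes it the same string as `"\u00a0foo"`.

```r
# Raw UTF-8 bytes for a non-breaking space followed by "foo" (c2 a0 66 6f 6f)
x <- rawToChar(as.raw(c(0xc2, 0xa0, 0x66, 0x6f, 0x6f)))
Encoding(x)
# [1] "unknown"

# Declare the bytes as UTF-8 so that c2 a0 is treated as one character (U+00A0)
Encoding(x) <- "UTF-8"
identical(x, "\u00a0foo")
# [1] TRUE

gsub("\u00a0", "", x, fixed = TRUE)
# [1] "foo"
```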