Parsing HTML containing &nbsp; (non-breaking space)

轻奢々 2020-12-16 23:19

I am using rvest to parse a website. I'm hitting a wall with these little non-breaking spaces. How does one remove the whitespace that is created by the &nbsp; entity?

6 Answers
  •  攒了一身酷
    2020-12-17 00:11

    jdharrison answered:

    gsub("\\W", "", bodytext)
    

    That will work, but note that "\\W" removes every non-word character, punctuation included. To target whitespace only, you can use:

    gsub("[[:space:]]", "", bodytext)
    

    which removes all space characters: tab, newline, vertical tab, form feed, carriage return, space, and possibly other locale-dependent characters. It is a very readable alternative to other, more cryptic regex classes.
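
Since the list above treats anything beyond the basic whitespace characters as locale-dependent, whether [[:space:]] catches a non-breaking space may vary by platform. If only the non-breaking spaces should go, one option is to match that character explicitly. The following is a minimal sketch, assuming the &nbsp; entities have already been decoded to the Unicode character U+00A0 (which is typical after HTML parsing); the sample string is made up for illustration and stands in for the real bodytext.

```r
# Hypothetical sample text; "\u00a0" is the non-breaking space
# character that &nbsp; decodes to.
bodytext <- "Price:\u00a0100\u00a0USD"

# Replace each non-breaking space with an ordinary space.
# fixed = TRUE treats the pattern as a literal string, not a regex.
cleaned <- gsub("\u00a0", " ", bodytext, fixed = TRUE)

# Or drop the non-breaking spaces entirely.
stripped <- gsub("\u00a0", "", bodytext, fixed = TRUE)
```

Unlike the two patterns above, this leaves ordinary spaces, punctuation, and line breaks untouched, so words in the surrounding text do not get run together.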
