Parsing HTML containing &nbsp; (non-breaking space)

轻奢々 2020-12-16 23:19

I am using rvest to parse a website. I'm hitting a wall with these little non-breaking spaces. How does one remove the whitespace that is created by the &nbsp;?

6 answers
  •  陌清茗 (OP)
    2020-12-16 23:56

    Using rex may make this type of task a little simpler. Also, I am not able to reproduce your encoding problems; the following correctly substitutes the space regardless of encoding on my machine. (It is the same approach as [[:space:]], though, so it will likely have the same issue for you.)

    library(rex)
    re_substitutes(bodytext, rex(spaces), "", global = TRUE)
    
    #> [1] "foo"
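
If the `[[:space:]]` class (and therefore `rex(spaces)`) does not match the non-breaking space on a given machine, one possible workaround is to match the character by its code point, U+00A0, instead of relying on a character class. The following is a minimal base R sketch, not the answerer's method. It assumes the text is UTF-8 encoded, and `bodytext` here is a hypothetical stand-in for the scraped string.

```r
# Hypothetical input standing in for the scraped text:
# "foo" followed by a single non-breaking space (U+00A0).
bodytext <- "foo\u00a0"

# Match U+00A0 literally (fixed = TRUE) instead of through a
# locale-dependent character class such as [[:space:]].
cleaned <- gsub("\u00a0", "", bodytext, fixed = TRUE)

print(cleaned)
```

Replacing with `" "` instead of `""` would keep word boundaries intact when the non-breaking space separates two words.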
    
