Removing data with tags from a vector

遇见更好的自我 2021-01-17 03:13

I have a string vector that contains HTML tags, e.g.:

  abc<-\"\"welcome abc Ha         


        
2 Answers
  •  抹茶落季
    2021-01-17 03:46

    You can parse your piece of HTML into an XML document with htmlParse or htmlTreeParse from the XML package. You can then convert it to text, i.e., strip all the tags, with xmlValue.

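    library(XML)
    abc <- "welcome abc Have fun!"
    # Parse the HTML fragment; doc <- htmlParse(abc, asText = TRUE) also works
    doc <- htmlTreeParse(abc, asText = TRUE)
    # xmlValue() concatenates all text below the root, dropping the tags
    xmlValue(xmlRoot(doc))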
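The question mentions a string vector, while the snippet above handles a single string. A minimal sketch of applying the same idea to every element, assuming the XML package is installed; the input html_vec and the helper strip_tags are hypothetical, and htmlParse (internal nodes) is used here instead of htmlTreeParse:

```r
library(XML)

# Hypothetical input: each element is a separate HTML fragment
html_vec <- c(
  "welcome <a href='abc'>abc</a> Have fun!",
  "<b>bold</b> and <i>italic</i>"
)

# Hypothetical helper: parse one fragment and return its text with all tags stripped
strip_tags <- function(fragment) {
  doc <- htmlParse(fragment, asText = TRUE)
  xmlValue(xmlRoot(doc))
}

# vapply guarantees exactly one character string per input element
plain <- vapply(html_vec, strip_tags, character(1), USE.NAMES = FALSE)
print(plain)
```

vapply is chosen over sapply so that a fragment producing something other than a single string fails loudly instead of silently changing the result's shape.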
    

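    If you also want to remove the contents of the links, you can use xmlDOMApply to transform the XML tree. The function f below replaces every span element, including everything inside it, with a single space: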

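    # Replace each <span> node (and its children) with a single space
    f <- function(x) if (xmlName(x) == "span") xmlTextNode(" ") else x
    # xmlDOMApply() applies f recursively and builds a new tree
    d <- xmlDOMApply(xmlRoot(doc), f)
    xmlValue(d)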
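An alternative route, sketched here as an assumption rather than the answerer's method: parse with htmlParse, select the span elements with the XPath expression //span via getNodeSet, delete them with removeNodes, and then extract the text. The fragment below is hypothetical, and gsub is added only to collapse the whitespace left where the span used to be:

```r
library(XML)

# Hypothetical fragment: the link text sits inside a <span>
fragment <- "welcome <span><a href='abc'>abc</a></span> Have fun!"

doc <- htmlParse(fragment, asText = TRUE)

# Find every <span> element and detach it, with its children, from the tree
removeNodes(getNodeSet(doc, "//span"))

# Extract the remaining text and collapse runs of whitespace into one space
text_only <- gsub("\\s+", " ", xmlValue(xmlRoot(doc)))
print(text_only)
```

Unlike the xmlDOMApply version, this modifies the parsed document in place instead of building a new tree.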
    
