convert HTML Character Entity Encoding in R

情话喂你 2020-12-05 03:07

Is there a way in R to convert HTML Character Entity Encodings?

I would like to convert HTML character entities like &amp; to &.

3 Answers
  •  感动是毒
    2020-12-05 04:01

    Update: this answer is outdated. Please check the answer below based on the new xml2 pkg.


    Try something along the lines of:

    # load XML package
    library(XML)
    
    # Convenience function to convert HTML entities to plain text:
    # parse the string as HTML and return the first text node found under <body>
    html2txt <- function(str) {
      xpathApply(htmlParse(str, asText = TRUE),
                 "//body//text()",
                 xmlValue)[[1]]
    }
    
    # html encoded string
    ( x <- paste("i", "s", "n", "&", "a", "p", "o", "s", ";", "t", sep = "") )
    [1] "isn&apos;t"
    
    # converted string
    html2txt(x)
    [1] "isn't"
    

    UPDATE: Edited the html2txt() function so it applies to more situations
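
The update note at the top of this answer points to the xml2 package. A minimal sketch of that approach might look like the following, assuming xml2 is installed. The helper name unescape_html and the dummy <x> wrapper element are illustrative choices, not something taken from the answer above.

```r
# Sketch only: assumes the xml2 package is installed.
library(xml2)

# Hypothetical helper (name chosen for illustration).
# Wrapping the input in a dummy <x> element makes read_html() treat it as
# literal markup; xml_text() then returns the text with entities decoded.
unescape_html <- function(str) {
  xml_text(read_html(paste0("<x>", str, "</x>")))
}

# Single string
unescape_html("3 &gt; 2 &amp; 1 &lt; 2")

# The helper handles one string at a time, so apply it element-wise
# when working with a character vector.
vapply(c("a &amp; b", "x &lt; y"), unescape_html, character(1), USE.NAMES = FALSE)
```

Like html2txt(), this relies on an HTML parser rather than a hand-written lookup table, so it should cover named and numeric entities alike. Any markup inside the input string is dropped along the way, because only the text content is returned.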
