Converting a \u escaped Unicode string to ASCII

感动是毒 2020-11-30 09:05

After reading all about iconv and Encoding, I am still confused.

I am scraping the source of a web page I have a string that looks like thi

7 Answers
  •  萌比男神i
    2020-11-30 09:48

    Although I have accepted Hong ooi's answer, I can't help thinking parse and eval is a heavyweight solution. Also, as pointed out, it is not secure, although for my application I can be confident that I will not get dangerous quotes.
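To make the security concern concrete, here is a minimal sketch of what a parse-and-eval decoder might look like and how a stray quote in the input escapes the string literal. The helper name `udecode_eval` and the sample inputs are hypothetical, and the accepted answer may differ in detail.

```r
# Hypothetical helper: wrap the text in double quotes and let the R parser
# interpret the \uXXXX escapes (an assumed form of the parse/eval approach).
udecode_eval <- function(s) eval(parse(text = paste0('"', s, '"')))

# Benign input: the escape is decoded by the parser.
decoded <- udecode_eval("caf\\u00e9")

# Hostile input: the embedded double quote closes the literal early,
# so the remainder is parsed and evaluated as ordinary R code.
injected <- 'x"; stop("arbitrary code ran"); "'
result <- tryCatch(udecode_eval(injected),
                   error = function(e) conditionMessage(e))
```

Here `stop()` is a harmless stand-in for whatever code an attacker might place after the quote, which is why this route is only reasonable for trusted input.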

    So, I have devised an alternative, somewhat brutal, approach:

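    udecode <- function(string){
      # Convert four hex digits to the corresponding Unicode character
      uconv <- function(chars) intToUtf8(strtoi(chars, 16L))
      # Segments tagged with the marker "\002" hold hex digits; others pass through
      ufilter <- function(string) {
        if (substr(string, 1, 1)=="\002") uconv(substr(string, 2, 5)) else string
      }
      # Control characters serve as separator ("\001") and marker ("\002"),
      # so literal "," and "|" in the input are left intact
      string <- gsub("\\\\u([[:xdigit:]]{4})", "\001\002\\1\001", string, perl=TRUE)
      strings <- unlist(strsplit(string, "\001", fixed=TRUE))
      string <- paste(sapply(strings, ufilter), collapse='')
      return(string)
    }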
    

    Any simplifications welcomed!
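One possible simplification, offered as a sketch rather than a definitive answer, is to replace the escapes in place with `gregexpr()` and `regmatches<-`, which avoids inserting separator characters and splitting the string. The name `udecode2` is hypothetical. The sketch assumes every escape has exactly four hex digits and does not combine surrogate pairs.

```r
# Sketch: decode \uXXXX escapes in place (hypothetical name udecode2)
udecode2 <- function(string) {
  # Locate every backslash-u escape followed by four hex digits
  m <- gregexpr("\\\\u[[:xdigit:]]{4}", string)
  # Replace each match with the character its hex code point denotes
  regmatches(string, m) <- lapply(regmatches(string, m), function(esc) {
    vapply(esc, function(e) intToUtf8(strtoi(substr(e, 3, 6), 16L)),
           character(1), USE.NAMES = FALSE)
  })
  string
}

udecode2("pretty\\u003D\\u003Ebig, really")
```

Because the matches are substituted where they occur, commas, pipes and any other literal text pass through unchanged, and the function also works on a character vector with several elements.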
