Remove accents from a dataframe column in R

感情败类 2021-02-05 04:03

I have a data.table named base, and it has a term column:

class(base$term)
[1] "character"
length(base$term)
[1] 27486

I'm able to remove

4 Answers
  •  感动是毒
    2021-02-05 04:25

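    You can apply this function, which strips accents with chartr(). With the default pattern = "all" it removes every accent type; pass one or more accent symbols (for example "~" or "ç") to remove only those groups.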

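        rm_accent <- function(str, pattern = "all") {
          # Coerce non-character input (e.g. factors) to character
          if (!is.character(str))
            str <- as.character(str)

          pattern <- unique(pattern)

          # Accept "Ç" as an alias for the cedilla group
          if (any(pattern == "Ç"))
            pattern[pattern == "Ç"] <- "ç"

          # Accented characters, grouped by accent type
          symbols <- c(
            acute = "áéíóúÁÉÍÓÚýÝ",
            grave = "àèìòùÀÈÌÒÙ",
            circunflex = "âêîôûÂÊÎÔÛ",
            tilde = "ãõÃÕñÑ",
            umlaut = "äëïöüÄËÏÖÜÿ",
            cedil = "çÇ"
          )

          # Unaccented replacements, matched position by position
          nudeSymbols <- c(
            acute = "aeiouAEIOUyY",
            grave = "aeiouAEIOU",
            circunflex = "aeiouAEIOU",
            tilde = "aoAOnN",
            umlaut = "aeiouAEIOUy",
            cedil = "cC"
          )

          # Symbols that select a group via `pattern` (same order as above)
          accentTypes <- c("´", "`", "^", "~", "¨", "ç")

          # Option to remove all accents ("todos" is Portuguese for "all")
          if (any(c("all", "al", "a", "todos", "t", "to", "tod", "todo") %in% pattern))
            return(chartr(paste(symbols, collapse = ""), paste(nudeSymbols, collapse = ""), str))

          # Otherwise translate only the requested groups
          for (i in which(accentTypes %in% pattern))
            str <- chartr(symbols[i], nudeSymbols[i], str)

          return(str)
        }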
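To apply it to the column, assign the result back to term. The sketch below is self-contained, so instead of repeating the whole function it uses strip_accents(), a hypothetical compact stand-in for rm_accent(x, "all") built from the same character maps in the same order. The sample terms are made up, and it assumes a UTF-8 locale so that chartr() treats each accented letter as a single character.

```r
# Hypothetical stand-in for rm_accent(x, "all"): same accented/plain maps,
# concatenated in the same order, translated with a single chartr() call.
accented <- paste0("áéíóúÁÉÍÓÚýÝ", "àèìòùÀÈÌÒÙ", "âêîôûÂÊÎÔÛ",
                   "ãõÃÕñÑ", "äëïöüÄËÏÖÜÿ", "çÇ")
plain    <- paste0("aeiouAEIOUyY", "aeiouAEIOU", "aeiouAEIOU",
                   "aoAOnN", "aeiouAEIOUy", "cC")
strip_accents <- function(x) chartr(accented, plain, as.character(x))

# Made-up sample data; a data.frame here so the sketch needs no packages
base <- data.frame(term = c("ação", "café", "über", "niño"),
                   stringsAsFactors = FALSE)
base$term <- strip_accents(base$term)
print(base$term)

# On a data.table the column can instead be updated by reference:
#   base[, term := rm_accent(term)]
```

Because chartr() is vectorised, the whole 27486-row column is translated in one call rather than row by row.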
    

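The pattern argument picks accent groups by symbol, and the final loop runs one chartr() per selected group, so something like rm_accent(base$term, c("~", "ç")) should strip only tildes and cedillas. A minimal sketch of that selection idea, using a hypothetical two-group lookup table (groups, strip_groups) rather than the full one:

```r
# Hypothetical subset of rm_accent's groups, keyed by the same accent symbols
groups <- list(
  "~" = c(from = "ãõÃÕñÑ", to = "aoAOnN"),
  "ç" = c(from = "çÇ",     to = "cC")
)

# Translate only the requested groups, one chartr() call per group,
# mirroring the for loop at the end of rm_accent()
strip_groups <- function(x, pattern) {
  for (g in groups[intersect(names(groups), pattern)]) {
    x <- chartr(g[["from"]], g[["to"]], x)
  }
  x
}

out <- strip_groups(c("São Paulo", "Açaí", "café"), "~")
print(out)
```

Letters from groups that were not requested (the ç and í in "Açaí", the é in "café") are left untouched.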