Concatenate gsub [duplicate]

Question

I'm currently running the following code to strip accented characters from my data:

df <- gsub('Á|Ã', 'A', df)
df <- gsub('É|Ê', 'E', df)
df <- gsub('Í',   'I', df)
df <- gsub('Ó|Õ', 'O', df)
df <- gsub('Ú',   'U', df)
df <- gsub('Ç',   'C', df)

However, I would like to do this in a single line (using a different function would be fine). How can I do this?

Answer 1:

Try something like this:

iconv(c('Á'), "UTF-8", "ASCII//TRANSLIT")

You can just add more elements to the c().

EDIT: The result is platform dependent (it relies on the system's iconv implementation); see help(iconv).

Here is a complete example in R:

mychar <- c('ÁÃÉÊÍÓÕÚÇ')
iconv(mychar, "UTF-8", "ASCII//TRANSLIT") # one line, as requested; assumes mychar is UTF-8 encoded
[1] "AAEEIOOUC"
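To run the same conversion over a data frame like the question's df, a minimal sketch is to transliterate only the character columns, so non-character columns keep their type. The data frame, its column names and values below are hypothetical, and the strings use \u escapes so they are UTF-8 regardless of how the script file is encoded. As noted in the EDIT, the exact transliteration depends on the platform's iconv.

```r
# Hypothetical example data; \u escapes guarantee UTF-8 strings
df <- data.frame(
  city  = c("S\u00e3o Paulo", "Bras\u00edlia"),    # São Paulo, Brasília
  state = c("S\u00c3O PAULO", "DISTRITO FEDERAL"), # SÃO PAULO
  id    = 1:2,
  stringsAsFactors = FALSE
)

# Transliterate only the character columns; the integer column is left untouched
chr_cols <- vapply(df, is.character, logical(1))
df[chr_cols] <- lapply(df[chr_cols], iconv, from = "UTF-8", to = "ASCII//TRANSLIT")

df
```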

Answer 2:

This is an encoding problem; normally you resolve it by specifying the correct encoding. If you still want to use a regular expression, you can use gsubfn to write a one-line solution:

library(gsubfn)
ll <- list('Á'='A', 'Ã'='A', 'É'='E',
           'Ê'='E', 'Í'='I', 'Ó'='O',
           'Õ'='O', 'Ú'='U', 'Ç'='C')
gsubfn('Á|Ã|É|Ê|Í|Ó|Õ|Ú|Ç',ll,'ÁÃÉÊÍÓÕÚÇ')
[1] "AAEEIOOUC"
gsubfn('Á|Ã|É|Ê|Í|Ó|Õ|Ú|Ç',ll,c('ÁÃÉÊÍÓÕÚÇ','ÍÓÕÚÇ'))
[1] "AAEEIOOUC" "IOOUC"
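If you would rather avoid the gsubfn dependency, a base-R sketch of the same lookup idea (a swapped-in technique, not part of this answer) folds one fixed-string gsub() per mapping over the input with Reduce(). The names accented, plain and strip_accents are hypothetical; the characters are written as \u escapes so the strings are UTF-8 on any platform.

```r
# Characters to replace and their ASCII replacements (same mapping as ll above)
accented <- c("\u00c1", "\u00c3", "\u00c9", "\u00ca", "\u00cd",
              "\u00d3", "\u00d5", "\u00da", "\u00c7")  # Á Ã É Ê Í Ó Õ Ú Ç
plain    <- c("A", "A", "E", "E", "I", "O", "O", "U", "C")

# Apply one fixed-string gsub() per mapping, threading the result through Reduce()
strip_accents <- function(x) {
  Reduce(function(s, i) gsub(accented[i], plain[i], s, fixed = TRUE),
         seq_along(accented), x)
}

result <- strip_accents(c("\u00c1\u00c3\u00c9\u00ca\u00cd\u00d3\u00d5\u00da\u00c7",
                          "\u00cd\u00d3\u00d5\u00da\u00c7"))
result
# [1] "AAEEIOOUC" "IOOUC"
```

Each gsub() call is vectorised, so strip_accents() works on a whole character vector (or column) at once.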

Answer 3:

One option is chartr(), which translates characters one for one:

> toreplace <- LETTERS
> replacewith <- letters
> (somestring <- paste(sample(LETTERS,10),collapse=""))
[1] "MUXJVYNZQH"
> 
> chartr(
+   old=paste(toreplace,collapse=""),
+   new=paste(replacewith,collapse=""),
+   x=somestring
+   )
[1] "muxjvynzqh"
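Applied to the accented characters from the question, chartr() gives a true one-liner: each character in old is replaced by the character at the same position in new, so both strings must line up position by position. A minimal sketch, with the variable names old, new and x chosen for illustration and the characters written as \u escapes:

```r
# Each character in `old` maps to the character at the same position in `new`
old <- "\u00c1\u00c3\u00c9\u00ca\u00cd\u00d3\u00d5\u00da\u00c7"  # ÁÃÉÊÍÓÕÚÇ
new <- "AAEEIOOUC"

x <- c("\u00c1\u00c3\u00c9\u00ca\u00cd\u00d3\u00d5\u00da\u00c7",  # ÁÃÉÊÍÓÕÚÇ
       "\u00cd\u00d3\u00d5\u00da\u00c7")                            # ÍÓÕÚÇ
chartr(old, new, x)
# [1] "AAEEIOOUC" "IOOUC"

# For a data frame whose columns are all character, the same call per column:
# df[] <- lapply(df, chartr, old = old, new = new)
```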

Answer 4:

df = as.data.frame(apply(df, 2, function(x) gsub('Á|Ã', 'A', x)))  # apply gsub to each column x

In apply(), MARGIN = 2 applies the function over columns; 1 would apply it over rows.
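Note that apply() first converts the data frame to a matrix, so every column, numeric ones included, comes back as character. A minimal sketch with a hypothetical two-column data frame shows the corrected apply() call next to an lapply() alternative that only touches the character columns:

```r
# Hypothetical data: one character column with accents, one integer column
df <- data.frame(name = c("\u00c1GUA", "S\u00c3O PAULO"),  # ÁGUA, SÃO PAULO
                 n    = 1:2,
                 stringsAsFactors = FALSE)

# apply() coerces df to a character matrix, so `n` becomes character too
via_apply <- as.data.frame(apply(df, 2, function(x) gsub("\u00c1|\u00c3", "A", x)),
                           stringsAsFactors = FALSE)
class(via_apply$n)  # [1] "character"

# lapply() over the character columns keeps the other column types intact
chr <- vapply(df, is.character, logical(1))
df[chr] <- lapply(df[chr], function(x) gsub("\u00c1|\u00c3", "A", x))
df$name      # [1] "AGUA"      "SAO PAULO"
class(df$n)  # [1] "integer"
```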

Source: https://stackoverflow.com/questions/20384282/concatenate-gsub

Tags

regex

optimization

gsub