问题
I'm currently running the following code to clean my data from accent characters:
df <- gsub('Á|Ã', 'A', df)
df <- gsub('É|Ê', 'E', df)
df <- gsub('Í', 'I', df)
df <- gsub('Ó|Õ', 'O', df)
df <- gsub('Ú', 'U', df)
df <- gsub('Ç', 'C', df)
However, I would like to do it in just one line (using another function for it would be ok). How can I do this?
回答1:
Try something like this
iconv(c('Á'), "utf8", "ASCII//TRANSLIT")
You can just add more elements to the c()
.
EDIT: it is machine dependent, check help(iconv)
Here is the R
solution
mychar <- c('ÁÃÉÊÍÓÕÚÇ')
iconv(mychar, "latin1", "ASCII//TRANSLIT") # one line, as requested
[1] "AAEEIOOUC"
回答2:
It an encoding problem, Normally you resolve it by indicating the right encoding. If you still want to use regular expression to do it , you can use gsubfn
to write one liner solution:
library(gsubfn)
ll <- list('Á'='A', 'Ã'='A', 'É'='E',
'Ê'='E', 'Í'='I', 'Ó'='O',
'Õ'='O', 'Ú'='U', 'Ç'='C')
gsubfn('Á|Ã|É|Ê|Í|Ó|Õ|Ú|Ç',ll,'ÁÃÉÊÍÓÕÚÇ')
[1] "AAEEIOOUC"
gsubfn('Á|Ã|É|Ê|Í|Ó|Õ|Ú|Ç',ll,c('ÁÃÉÊÍÓÕÚÇ','ÍÓÕÚÇ'))
[1] "AAEEIOOUC" "IOOUC"
回答3:
One option could be chartr
> toreplace <- LETTERS
> replacewith <- letters
> (somestring <- paste(sample(LETTERS,10),collapse=""))
[1] "MUXJVYNZQH"
>
> chartr(
+ old=paste(toreplace,collapse=""),
+ new=paste(replacewith,collapse=""),
+ x=somestring
+ )
[1] "muxjvynzqh"
回答4:
df = as.data.frame(apply(df,2,function(x) gsub('Á|Ã', 'A', df)))
2 indicates columns and 1 indicates rows
来源:https://stackoverflow.com/questions/20384282/concatenate-gsub