Perform multiple search-and-replaces on the colnames of a dataframe

Submitted by 天涯浪子 on 2021-01-27 13:20:46

Question


I have a data frame with 95 columns and want to batch-rename many of them with simple regexes, like the snippet below (there are about 30 such lines). Any columns that don't match a search regex must be left untouched.

Example: names(tr) = c('foo', 'bar', 'xxx_14', 'xxx_2001', 'yyy_76', 'baz', 'zzz_22', ...)

I started out with a wall of 25 gsub() calls, crude but effective:

names(tr) <- gsub('_1$',    '_R', names(tr))
names(tr) <- gsub('_14$',   '_I', names(tr))
names(tr) <- gsub('_22$',   '_P', names(tr))
names(tr) <- gsub('_50$',   '_O', names(tr))
# ... and so on for the remaining suffixes

@Joshua: mapply doesn't work; it turns out the problem is more complicated and impossible to vectorize. names(tr) contains other columns, and even where these patterns do occur, you cannot assume all of them occur, let alone in the exact order we defined them. Hence, attempt 2 is:

pattern <- paste('_', c('1','14','22','50','52','57','76','1018','2001','3301','6005'), '$', sep='')
replace <- paste('_', c('R','I', 'P', 'O', 'C', 'D', 'M', 'L',   'S',   'K',   'G'),         sep='')
do.call(gsub, list(pattern, replace, names(tr)))
Warning messages:
1: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE,  :
  argument 'pattern' has length > 1 and only the first element will be used
2: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE,  :
  argument 'replacement' has length > 1 and only the first element will be used

Can anyone fix this for me?


EDIT: I searched SO and the R docs on this subject for over a day and couldn't find anything... then, right after posting, I thought of searching for '[r] translation table' and found xlate, which isn't mentioned anywhere in the grep/sub/gsub documentation.

  1. Is there anything in base/gsubfn/data.table etc. that lets me write a single search-and-replace instruction (like a dictionary or translation table)?

  2. Can you improve my clunky syntax so that it modifies tr by reference? (It mustn't create a temporary copy of the entire data frame.)


EDIT2: my best effort after reading around was:

The dictionary approach (xlate) might be a partial answer to question 1, but this is more than a simple translation table, since the regex must be anchored at the end of the name (e.g. '_14$').

I could use gsub() or strsplit() to split on '_', do my xlate translation on the last component, then paste() the parts back together. I'm looking for a cleaner one- or two-line idiom.

Otherwise I'll just stick with walls of gsub()s.
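The split, translate, paste idea from EDIT2 can be written in base R without strsplit(): take the text after the last underscore with sub(), look it up in a named character vector acting as the translation table, and rebuild only the names that hit. This is an editorial sketch, not something from the thread; it assumes the suffix to translate always follows the last '_' and that names without an underscore are never renamed. The names dict and xlate_suffix are hypothetical.

```r
# Hypothetical translation table: numeric suffix -> letter code
dict <- c('1' = 'R', '14' = 'I', '22' = 'P', '50' = 'O', '52' = 'C',
          '57' = 'D', '76' = 'M', '1018' = 'L', '2001' = 'S',
          '3301' = 'K', '6005' = 'G')

# Hypothetical helper: translate the component after the last '_' via dict
xlate_suffix <- function(x, dict) {
  suffix <- sub('.*_', '', x)            # text after the last underscore
  # only names that contain '_' and whose suffix is a key in the table
  hit <- grepl('_', x, fixed = TRUE) & suffix %in% names(dict)
  if (any(hit)) {
    stem <- sub('_[^_]*$', '', x[hit])   # everything before the last underscore
    x[hit] <- paste0(stem, '_', dict[suffix[hit]])
  }
  x
}

# Example names from the question (the trailing "..." omitted)
nm <- c('foo', 'bar', 'xxx_14', 'xxx_2001', 'yyy_76', 'baz', 'zzz_22')
print(xlate_suffix(nm, dict))
# c("foo", "bar", "xxx_I", "xxx_S", "yyy_M", "baz", "zzz_P")
```

Applied to the data frame this would be names(tr) <- xlate_suffix(names(tr), dict). Because each name is looked up once rather than passed through a chain of regexes, the order of entries in the table does not matter.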


Answer 1:


A wall of gsub() calls can always be replaced by a for loop, and you can wrap that loop in a function:

renamer <- function(x, pattern, replace) {
    # apply each (pattern, replacement) pair to x in turn
    for (i in seq_along(pattern))
        x <- gsub(pattern[i], replace[i], x)
    x
}

names(tr) <- renamer(
     names(tr),
     sprintf('_%s$', c('1','14','22','50','52','57','76','1018','2001','3301','6005')),
     sprintf('_%s' , c('R','I', 'P', 'O', 'C', 'D', 'M', 'L',   'S',   'K',   'G'))
)

I also find sprintf() handier than paste() for building this kind of string.
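Because the loop applies the pairs one after another, the '$' anchors in the patterns do real work: '_1' is a prefix of both '_14' and '_1018'. A quick check, with renamer copied from above so the snippet runs on its own and with hypothetical column names:

```r
renamer <- function(x, pattern, replace) {
  for (i in seq_along(pattern))
    x <- gsub(pattern[i], replace[i], x)
  x
}

nm <- c('xxx_1', 'xxx_14', 'xxx_1018')

# Anchored: each pattern only matches a complete trailing suffix
anchored <- renamer(nm, c('_1$', '_14$', '_1018$'), c('_R', '_I', '_L'))
# Unanchored: '_1' runs first and also rewrites the start of '_14' and '_1018'
unanchored <- renamer(nm, c('_1', '_14', '_1018'), c('_R', '_I', '_L'))

print(anchored)    # c("xxx_R", "xxx_I", "xxx_L")
print(unanchored)  # c("xxx_R", "xxx_R4", "xxx_R018")
```

With anchored patterns and letter-only replacements, no replacement can be matched again by a later pattern, so the order of the pairs does not affect the result here.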




Answer 2:


The question predates the tidyverse boom, but this is easily solved by passing a named vector, c(pattern1 = replacement1), to stringr::str_replace_all().

tr <- data.frame("whatevs_1" = NA, "something_52" = NA)

tr
#>   whatevs_1 something_52
#> 1        NA           NA

patterns <- sprintf('_%s$', c('1','14','22','50','52','57','76','1018','2001','3301','6005'))
replacements <- sprintf('_%s' , c('R','I', 'P', 'O', 'C', 'D', 'M', 'L',   'S',   'K',   'G'))
                        
names(replacements) <- patterns

names(tr) <- stringr::str_replace_all(names(tr), replacements)

tr
#>   whatevs_R something_C
#> 1        NA          NA

And of course, this particular case can also be handled with dplyr (in dplyr 1.0.0 and later, rename_all() is superseded by rename_with()):

dplyr::rename_all(tr, stringr::str_replace_all, replacements)
#>   whatevs_R something_C
#> 1        NA          NA



Answer 3:


Using do.call() nearly does it, but gsub() complains because pattern and replacement each have more than one element. I think I need to nest do.call() inside apply(), as in the question "apply function to elements over a list".

But what I need is a partial do.call() that steps through pattern and replace together.

This is all starting to make a wall of gsub(..., fixed=TRUE) look like the more efficient idiom, albeit flabbier code.
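One catch with gsub(..., fixed=TRUE): a fixed pattern treats '$' as a literal character, so '_14$' never matches, and dropping the anchor lets '_1' hit the front of '_14'. If the goal is to avoid regexes entirely, a sketch using base R's endsWith() (available since R 3.3.0) and substr() keeps the match anchored at the end of the name. replace_suffix is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: replace literal suffixes without regular expressions
replace_suffix <- function(x, old, new) {
  stopifnot(length(old) == length(new))
  for (i in seq_along(old)) {
    hit <- endsWith(x, old[i])               # literal match at the end of the string
    if (any(hit)) {
      keep <- nchar(x[hit]) - nchar(old[i])  # number of characters before the suffix
      x[hit] <- paste0(substr(x[hit], 1, keep), new[i])
    }
  }
  x
}

nm <- c('foo', 'xxx_1', 'xxx_14', 'xxx_2001', 'zzz_22')
print(replace_suffix(nm, c('_1', '_14', '_22', '_2001'), c('_R', '_I', '_P', '_S')))
# c("foo", "xxx_R", "xxx_I", "xxx_S", "zzz_P")

# For comparison: with fixed = TRUE the '$' is literal, so nothing changes
print(gsub('_14$', '_I', 'xxx_14', fixed = TRUE))  # "xxx_14"
```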

pattern <- paste('_', c('1','14','22','50'), '$', sep='')
replace <- paste('_', c('R','I', 'P', 'O'),       sep='')
do.call(gsub, list(pattern, replace, names(tr)))
Warning messages:
1: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE,  :
  argument 'pattern' has length > 1 and only the first element will be used
2: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE,  :
  argument 'replacement' has length > 1 and only the first element will be used
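The warnings arise because gsub() is not vectorised over pattern and replacement: do.call(gsub, list(pattern, replace, x)) is exactly gsub(pattern, replace, x), so only the first pair is used. Rather than nesting do.call() inside apply(), a base R sketch can fold gsub() over the pairs with Reduce(), feeding each call's output into the next. This is an editorial suggestion rather than something from the thread, and rename_fold is a hypothetical name; it behaves like the for loop in Answer 1.

```r
# Hypothetical helper: apply each (pattern, replacement) pair to x in turn
rename_fold <- function(x, pattern, replace) {
  stopifnot(length(pattern) == length(replace))
  # acc starts as x; each step rewrites it with the i-th pair
  Reduce(function(acc, i) gsub(pattern[i], replace[i], acc),
         seq_along(pattern), init = x)
}

pattern <- paste0('_', c('1','14','22','50','52','57','76','1018','2001','3301','6005'), '$')
replace <- paste0('_', c('R','I','P','O','C','D','M','L','S','K','G'))

# Example names from the question (the trailing "..." omitted)
nm <- c('foo', 'bar', 'xxx_14', 'xxx_2001', 'yyy_76', 'baz', 'zzz_22')
print(rename_fold(nm, pattern, replace))
# c("foo", "bar", "xxx_I", "xxx_S", "yyy_M", "baz", "zzz_P")
```

On the data frame this becomes names(tr) <- rename_fold(names(tr), pattern, replace); names that match no pattern pass through unchanged.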


Source: https://stackoverflow.com/questions/10455318/perform-multiple-search-and-replaces-on-the-colnames-of-a-dataframe
