R dplyr: rename variables using string functions

Anonymous (unverified), submitted on 2019-12-03 03:03:02

Question:

(Somewhat related question: Enter new column names as string in dplyr's rename function)

In the middle of a dplyr chain (%>%), I would like to replace multiple column names with functions of their old names (using tolower or gsub, etc.)

library(tidyr); library(dplyr)
data(iris)
# This is what I want to do, but I'd like to use dplyr syntax
names(iris) <- tolower(gsub("\\.", "_", names(iris)))
iris %>%
  gather(measurement, value, -species) %>%
  group_by(species,measurement) %>%
  summarise(avg_value = mean(value))

I see ?rename takes the argument replace as a named character vector, with new names as values, and old names as names.

So I tried:

iris %>% rename(replace=c(names(iris)=tolower( gsub("\\.", "_", names(iris) ) )  )) 

but this (a) returns Error: unexpected '=' in iris %>% ... and (b) requires referencing by name the data frame from the previous operation in the chain, which in my real use case I couldn't do.

iris %>%
  rename(replace=c(    )) %>% # ideally the fix would go here
  gather(measurement, value, -species) %>%
  group_by(species,measurement) %>%
  summarise(avg_value = mean(value)) # I realize I could mutate down here
                                     #  instead, once the column names turn into values,
                                     #  but that's not the point
# ---- Desired output looks like: -------
# Source: local data frame [12 x 3]
# Groups: species
#
#       species  measurement avg_value
# 1      setosa sepal_length     5.006
# 2      setosa  sepal_width     3.428
# 3      setosa petal_length     1.462
# 4      setosa  petal_width     0.246
# 5  versicolor sepal_length     5.936
# 6  versicolor  sepal_width     2.770
# ... etc ....

Answer 1:

I think you're looking at the documentation for plyr::rename, not dplyr::rename. You would do something like this with dplyr::rename:

iris %>% rename_(.dots=setNames(names(.), tolower(gsub("\\.", "_", names(.))))) 
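For readers on a current dplyr, where the underscore verbs such as rename_() are deprecated, the same idea can be sketched with rename_with(). This is a minimal sketch assuming dplyr >= 1.0.0 and tidyr >= 1.0.0; the helper name clean_nm is hypothetical and not from the thread, and pivot_longer() is swapped in for gather().

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not from the thread): lower-case the names and
# replace literal dots with underscores.
clean_nm <- function(x) tolower(gsub(".", "_", x, fixed = TRUE))

result <- iris %>%
  rename_with(clean_nm) %>%   # applies clean_nm to every column name
  pivot_longer(-species, names_to = "measurement", values_to = "value") %>%
  group_by(species, measurement) %>%
  summarise(avg_value = mean(value), .groups = "drop")

result
```

Because rename_with() receives the names of whatever data frame arrives through the pipe, nothing in the chain has to refer to iris by name, which was requirement (b) in the question.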


Answer 2:

This is a very late answer, written in May 2017.

As of dplyr 0.5.0.9004, soon to be 0.6.0, many new ways of renaming columns, compliant with the magrittr pipe operator %>%, have been added to the package.

Those functions are:

  • rename_all
  • rename_if
  • rename_at

There are many different ways of using those functions, but the one relevant to your problem, using the stringr package, is the following:

df <- df %>%
  rename_all(
      funs(
        stringr::str_to_lower(.) %>%
        stringr::str_replace_all(., '\\.', '_')
      )
  )
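In dplyr 1.0.0 and later, rename_all, rename_if and rename_at are superseded by rename_with(), whose .cols argument covers the "if" and "at" cases. The following is a rough sketch under that assumption; the variable names numeric_only and sepal_only are made up for illustration.

```r
library(dplyr)

# Analogue of rename_if: only rename columns that satisfy a predicate.
numeric_only <- iris %>%
  rename_with(tolower, .cols = where(is.numeric))

# Analogue of rename_at: only rename columns picked by a selection helper.
sepal_only <- iris %>%
  rename_with(~ gsub(".", "_", .x, fixed = TRUE), .cols = starts_with("Sepal"))

names(numeric_only)
names(sepal_only)
```

In the first call Species keeps its capital letter because it is a factor, not a numeric column; in the second only the two Sepal columns lose their dots.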

And so, carry on with the plumbing :) (no pun intended).



Answer 3:

Here's a way around the somewhat awkward rename syntax:

myris <- iris %>% setNames(tolower(gsub("\\.","_",names(.))))


Answer 4:

For this particular [but fairly common] case, the function has already been written in the janitor package:

library(janitor)

iris %>% clean_names()

##   sepal_length sepal_width petal_length petal_width species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
## .          ...         ...          ...         ...     ...

so all together,

iris %>%
    clean_names() %>%
    gather(measurement, value, -species) %>%
    group_by(species,measurement) %>%
    summarise(avg_value = mean(value))

## Source: local data frame [12 x 3]
## Groups: species [?]
##
##       species  measurement avg_value
## 1      setosa petal_length     1.462
## 2      setosa  petal_width     0.246
## 3      setosa sepal_length     5.006
## 4      setosa  sepal_width     3.428
## 5  versicolor petal_length     4.260
## 6  versicolor  petal_width     1.326
## 7  versicolor sepal_length     5.936
## 8  versicolor  sepal_width     2.770
## 9   virginica petal_length     5.552
## 10  virginica  petal_width     2.026
## 11  virginica sepal_length     6.588
## 12  virginica  sepal_width     2.974


Answer 5:

My eloquent attempt using base, stringr and dplyr:

EDIT: library(tidyverse) now includes all three libraries.

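library(tidyverse)
library(magrittr)   # needed for %<>%, which library(tidyverse) does not attach
# OR
# library(dplyr)
# library(stringr)
# library(magrittr)

names(iris) %<>%   # compound assignment pipe: the result is written back to names(iris)
    tolower() %>%
    str_replace_all(fixed("."), "_")   # fixed(): match a literal dot, not the regex wildcard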
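One pitfall worth spelling out: stringr patterns are regular expressions by default, so an unescaped "." matches every character rather than a literal dot. A small sketch of the difference (the variable names are illustrative only):

```r
library(stringr)

# Unescaped "." is the regex wildcard: every character becomes "_".
wildcard <- str_replace_all("Sepal.Length", ".", "_")

# Either wrap the pattern in fixed() or escape the dot to match it literally.
literal_fixed   <- str_replace_all("Sepal.Length", fixed("."), "_")
literal_escaped <- str_replace_all("Sepal.Length", "\\.", "_")

wildcard         # "____________"
literal_fixed    # "Sepal_Length"
literal_escaped  # "Sepal_Length"
```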

I do this for building functions with piping.

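my_read_fun <- function(path) {
    # The reading step was lost in the source; read.csv() and the
    # argument name `path` are assumptions.
    df <- read.csv(path)
    names(df) %<>%
        tolower() %>%
        str_replace_all("_", ".")
    df %<>%                 # the source wrote `tempdf` here, which is never defined
        select(a, b, c, g)
    df
}


Answer 6: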

Both select() and select_all() can be used to rename columns.

If you wanted to rename only specific columns you can use select:

iris %>%
  select(sepal_length = Sepal.Length, sepal_width = Sepal.Width, everything()) %>%
  head(2)

  sepal_length sepal_width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa

rename does the same thing, just without having to include everything():

iris %>%
  rename(sepal_length = Sepal.Length, sepal_width = Sepal.Width) %>%
  head(2)

  sepal_length sepal_width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa

select_all() works on all columns and can take a function as an argument:

iris %>%
  select_all(tolower)

iris %>%
  select_all(~gsub("\\.", "_", .))

or combining the two:

iris %>%
  select_all(~gsub("\\.", "_", tolower(.))) %>%
  head(2)

  sepal_length sepal_width petal_length petal_width species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa

