recode/replace multiple values in a shared data column to a single value across data frames

放肆的年华 submitted on 2019-12-05 16:00:27

Here is an option using tidyverse with recode_factor. When there are multiple elements to be changed, create a named list of key/value pairs (each bad value as the name, its good value as the element) and use recode_factor to match the old values and change them to the new levels.

library(tidyverse)
keyval <- setNames(rep(good_values, lengths(bad_values)), unlist(bad_values))
out <- map(df_list, ~ .x %>% 
                  mutate(grp = recode_factor(grp, !!! keyval)))

-output

out
#[[1]]
#   grp     measure
#1    a -1.63295876
#2    a  0.03859976
#3    a -0.46541610
#4    b -0.72356671
#5    b -1.11552841
#6    b  0.99352861
#....

#[[2]]
#   grp     measure
#1    a  1.26536789
#2    a -0.48189740
#3    a  0.23041056
#4    b -1.01324689
#5    a -1.41586086
#6    a  0.59026463
#....


#[[3]]
#  grp measure
#1   b       1
#2   b       2
#3   b       3
#4   b       4
#5   b       5
#6   a       6
#....

NOTE: recode_factor returns a factor, so a grp column that is already a factor keeps its class.

str(out)
#List of 3
# $ :'data.frame':  10 obs. of  2 variables:
#  ..$ grp    : Factor w/ 2 levels "a","b": 1 1 1 2 2 2 2 2 2 2
#  ..$ measure: num [1:10] -1.633 0.0386 -0.4654 -0.7236 -1.1155 ...
# $ :'data.frame':  26 obs. of  2 variables:
#  ..$ grp    : Factor w/ 2 levels "a","b": 1 1 1 2 1 1 1 1 1 1 ...
#  ..$ measure: num [1:26] 1.265 -0.482 0.23 -1.013 -1.416 ...
# $ :'data.frame':  9 obs. of  2 variables:
#  ..$ grp    : Factor w/ 2 levels "a","b": 2 2 2 2 2 1 1 1 1
#  ..$ measure: int [1:9] 1 2 3 4 5 6 7 8 9
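
The key/value lookup is the core of this approach, so it may help to see it built on its own. The sketch below uses only base R and assumes sample inputs: the bad_values and good_values shown here are hypothetical stand-ins, since their original definitions are not reproduced in this answer.

```r
# Hypothetical inputs (assumed for illustration; not from the original post)
bad_values  <- list(c("as", "a-"), c("b l", "b-"))  # misspellings, grouped by target
good_values <- list("a", "b")                       # one correct value per group

# Repeat each good value once per bad value in its group,
# then name every entry with the bad value it should replace
keyval <- setNames(rep(good_values, lengths(bad_values)), unlist(bad_values))

str(keyval)
```

Each bad value becomes a name and its replacement becomes the element, so under these assumed inputs keyval[["b l"]] is "b".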

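Once we have the keyval lookup, it can also be used with base R functions. Index it with as.character(grp) rather than grp itself, because a factor index uses the integer codes, and add identity entries for the good values so that rows that are already correct map to themselves instead of dropping out. Values found in neither set become NA.

lookup <- c(unlist(keyval), setNames(nm = unlist(good_values)))
out1 <- lapply(df_list, transform, grp = unname(lookup[as.character(grp)]))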

Any reason mapping a case_when statement wouldn't work?

library(tidyverse)
df_list %>% 
  map(~ mutate_if(.x, is.factor, as.character)) %>% # convert factor to character
  map(~ mutate(.x, grp = case_when(grp %in% bad_values[[1]] ~ good_values[[1]],
                                   grp %in% bad_values[[2]] ~ good_values[[2]],
                                   TRUE ~ grp)))

I could see it working for your reprex but possibly not the greater problem.

A base R option if you have a lot of good_values and bad_values and it is not possible to check each one individually.

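lapply(df_list, function(x) {
  # work on the labels: assigning a new value to a factor would give NA
  vec <- as.character(x[['grp']])
  # replace each group of bad values; <<- updates vec in the enclosing function
  mapply(function(p, q) vec[vec %in% p] <<- q, bad_values, good_values)
  transform(x, grp = vec)
})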


#[[1]]
#   grp      measure
#1    a -0.648146527
#2    a -0.004722549
#3    a -0.943451194
#4    b -0.709509396
#5    b -0.719434286
#....

#[[2]]
#   grp     measure
#1    a  1.03131291
#2    a -0.85558910
#3    a -0.05933911
#4    b  0.67812934
#5    a  3.23854093
#6    a  1.31688645
#7    a  1.87464048
#8    a  0.90100179
#....

#[[3]]
#  grp measure
#1   b       1
#2   b       2
#3   b       3
#4   b       4
#5   b       5
#....

Here, for every list element, we extract its grp column, replace any bad_values found with the corresponding good_values, and return the corrected data frame.
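
The same replacement can be written without <<-. A minimal sketch, assuming hypothetical sample data and a helper named fix_grp (neither appears in the original answer), and converting grp to character first so that new values are not lost on factor columns:

```r
# Hypothetical inputs (assumed for illustration)
bad_values  <- list(c("as", "a-"), c("b l", "b-"))
good_values <- list("a", "b")
df_list <- list(
  data.frame(grp = c("a", "as", "b l"), measure = 1:3),
  data.frame(grp = c("b-", "a-", "b"),  measure = 4:6)
)

# fix_grp is a hypothetical helper: replace every bad value in `grp`
# with the good value at the same position in `good`
fix_grp <- function(x, bad, good) {
  vec <- as.character(x[["grp"]])   # labels, not factor codes
  for (i in seq_along(bad)) {
    vec[vec %in% bad[[i]]] <- good[[i]]
  }
  x[["grp"]] <- vec
  x
}

out2 <- lapply(df_list, fix_grp, bad = bad_values, good = good_values)
out2[[1]]$grp
# [1] "a" "a" "b"
```

A plain for loop keeps the assignment local to the helper, which may be easier to follow than modifying vec from inside mapply.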
