How to apply a function to a column of data frames with different sizes in a nested list?

Question

In R, to apply some function to a column, you can do:

df$col <- someFunction(df$col)

Now my question is: how do you do a similar task when the data frames are in a nested list? Say I have a list like the following, where the data frames sit at the second level below the root.

                                           +------+------+
                                  type1    | id   | name |
                              +----------->|------|------|
                              |            |      |      |
                              |            |      |      |
                year1         |            +------+------+
           +------------------+
           |                  |
           |                  |            +------+------+-----+
           |                  |  type2     | meta1|meta2 | name|
           |                  +----------> |------|------|-----|
           |                               |      |      |     |
           +                               +------+------+-----+
           |                     type1    +------+------+
           |                  +---------> | id   |name  |
           |                  |           |------|------|
           |     year2        |           |      |      |
   list    +----------------->+           |      |      |
           +                  |           +------+------+
           |                  |  type2     +------+------+-----+
           |                  +--------->  | meta1|meta2 |name |
           |                               |------|------|-----|
           |                               |      |      |     |
           |                    type1      +------+------+-----+
           |                 +---------->  +------+------+
           |                 |             | id   |name  |
           |     year3       |             |------|------|
           +-----------------+             |      |      |
                             |             |      |      |
                             |  type2      +------+------+
                             +---------->  +------+------+-----+
                                           |meta1 | meta2|name |
                                           |------|------|-----|
                                           |      |      |     |
                                           +------+------+-----+

I want to modify the "name" column in each of the data frames at the leaves with some function and store the results back in place. How do you do that?

Here is the example data:

data<-list()

data$yr2001$type1 <- df_2001_1 <- data.frame(index=1:3,name=c("jack","king","larry"))
data$yr2001$type2 <- df_2001_2 <- data.frame(index=1:5,name=c("man","women","oliver","jack","jill"))
data$yr2002$type1 <- df_2002_1 <- data.frame(index=1:3,name=c("janet","king","larry"))
data$yr2002$type2 <- df_2002_2 <- data.frame(index=1:5,name=c("alboyr","king","larry","rachel","sam"))
data$yr2003$type1 <- df_2003_1 <- data.frame(index=1:3,name=c("dan","jay","zang"))
data$yr2003$type2 <- df_2003_2 <- data.frame(index=1:5,name=c("zang","king","larry","kim","fran"))

Say I want to uppercase all the values in the name column of each data frame stored in the list.

Answer 1:

To illustrate (using your simplified example):

library(reshape2)
dat1 <- melt(data, id.vars = c("index", "name"))
dat1$NAME <- toupper(dat1$name)
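Answer 1 consolidates the list with reshape2's melt. As a hedged base-R sketch of the same idea (the "add type as a column" restructuring the other answers also suggest), the loop below stacks every leaf into one data frame with year and type columns. It assumes every leaf shares the same columns, as in the example data; the names pieces and flat are illustrative only.

```r
# Rebuild the example data from the question
data <- list()
data$yr2001$type1 <- data.frame(index = 1:3, name = c("jack", "king", "larry"))
data$yr2001$type2 <- data.frame(index = 1:5, name = c("man", "women", "oliver", "jack", "jill"))
data$yr2002$type1 <- data.frame(index = 1:3, name = c("janet", "king", "larry"))
data$yr2002$type2 <- data.frame(index = 1:5, name = c("alboyr", "king", "larry", "rachel", "sam"))
data$yr2003$type1 <- data.frame(index = 1:3, name = c("dan", "jay", "zang"))
data$yr2003$type2 <- data.frame(index = 1:5, name = c("zang", "king", "larry", "kim", "fran"))

# Stack every leaf into one long data frame, recording where each row came from
pieces <- list()
for (yr in names(data)) {
  for (tp in names(data[[yr]])) {
    pieces[[length(pieces) + 1]] <- data.frame(year = yr, type = tp, data[[yr]][[tp]],
                                               stringsAsFactors = FALSE)
  }
}
flat <- do.call(rbind, pieces)

# A single vectorised call now updates every name
flat$name <- toupper(flat$name)
```

With everything in one table, split(flat, flat$year) returns a named list of per-year data frames if you still need them separately.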

Answer 2:

I agree with @joran's comment above: this is begging to be consolidated into a single data frame by adding type as a column. But here is one way with rapply. This assumes that the name column is the only factor column in each nested data.frame. As in @josilber's answer, my function of choice is toupper.

rapply(data, function(x) toupper(as.character(x)), classes='factor', how='replace')

This will drop the data.frame class, but the essential structure is preserved. If your name columns are already character, use this instead:

rapply(data, toupper, classes='character', how='replace')
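Whether name is a factor depends on the stringsAsFactors default (TRUE before R 4.0.0, FALSE since). A hedged sketch that covers both cases matches both classes in one call; then, in case the leaves come back as plain lists, it re-wraps each one with as.data.frame, which leaves an existing data frame unchanged. Like the answer, it assumes name is the only factor or character column in each leaf.

```r
# Rebuild the example data from the question
data <- list()
data$yr2001$type1 <- data.frame(index = 1:3, name = c("jack", "king", "larry"))
data$yr2001$type2 <- data.frame(index = 1:5, name = c("man", "women", "oliver", "jack", "jill"))
data$yr2002$type1 <- data.frame(index = 1:3, name = c("janet", "king", "larry"))
data$yr2002$type2 <- data.frame(index = 1:5, name = c("alboyr", "king", "larry", "rachel", "sam"))
data$yr2003$type1 <- data.frame(index = 1:3, name = c("dan", "jay", "zang"))
data$yr2003$type2 <- data.frame(index = 1:5, name = c("zang", "king", "larry", "kim", "fran"))

# Match factor (pre-4.0 default) and character (4.0+ default) columns alike
res <- rapply(data, function(x) toupper(as.character(x)),
              classes = c("factor", "character"), how = "replace")

# Re-wrap each year's leaves as data frames in case rapply returned plain lists
res <- lapply(res, lapply, as.data.frame, stringsAsFactors = FALSE)
```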

Answer 3:

You can nest the lapply function twice to get at the inner data frames. Here, I apply toupper to each name column:

result <- lapply(data, function(x) {
  lapply(x, function(y) {
    y$name = toupper(y$name)
    return(y)
  })
})
result

# $yr2001
# $yr2001$type1
#   index  name
# 1     1  JACK
# 2     2  KING
# 3     3 LARRY
# 
# $yr2001$type2
#   index   name
# 1     1    MAN
# 2     2  WOMEN
# 3     3 OLIVER
# 4     4   JACK
# 5     5   JILL
# 
# 
# $yr2002
# $yr2002$type1
#   index  name
# 1     1 JANET
# 2     2  KING
# 3     3 LARRY
# 
# $yr2002$type2
#   index   name
# 1     1 ALBOYR
# 2     2   KING
# 3     3  LARRY
# 4     4 RACHEL
# 5     5    SAM
# 
# 
# $yr2003
# $yr2003$type1
#   index name
# 1     1  DAN
# 2     2  JAY
# 3     3 ZANG
# 
# $yr2003$type2
#   index  name
# 1     1  ZANG
# 2     2  KING
# 3     3 LARRY
# 4     4   KIM
# 5     5  FRAN
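The same two-level walk can be written more compactly: lapply forwards extra arguments to FUN, so the inner lapply can be passed directly as the outer FUN. The sketch below uses a hypothetical helper, modify_column, that takes the column name and function as arguments; it uses df[[col]] rather than df$col because $ cannot take a column name stored in a variable.

```r
# Rebuild the example data from the question
data <- list()
data$yr2001$type1 <- data.frame(index = 1:3, name = c("jack", "king", "larry"))
data$yr2001$type2 <- data.frame(index = 1:5, name = c("man", "women", "oliver", "jack", "jill"))
data$yr2002$type1 <- data.frame(index = 1:3, name = c("janet", "king", "larry"))
data$yr2002$type2 <- data.frame(index = 1:5, name = c("alboyr", "king", "larry", "rachel", "sam"))
data$yr2003$type1 <- data.frame(index = 1:3, name = c("dan", "jay", "zang"))
data$yr2003$type2 <- data.frame(index = 1:5, name = c("zang", "king", "larry", "kim", "fran"))

# Hypothetical helper: replace one column of a data frame with fn(column)
modify_column <- function(df, col, fn) {
  df[[col]] <- fn(df[[col]])
  df
}

# Outer lapply walks the years; the inner lapply walks the types,
# receiving modify_column, col and fn through lapply's ... argument
result2 <- lapply(data, lapply, modify_column, col = "name", fn = toupper)
```

Names at both levels are preserved, so result2$yr2002$type1 works exactly as it does on data.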

Answer 4:

Here is a truly recursive version based on lapply (i.e. it will work with deeper nesting) that makes no assumptions other than that all terminal leaves are data frames. Unfortunately, rapply won't stop its recursion at data.frames, so you have to use lapply if you want to operate on whole data frames (otherwise Matthew's answer is perfect).

samp.recur <- function(x) 
  lapply(x, 
    function(y) 
      if(is.data.frame(y)) transform(y, name=toupper(name)) else samp.recur(y))

This produces:

samp.recur(data)
# $yr2001
# $yr2001$type1
#   index  name
# 1     1  JACK
# 2     2  KING
# 3     3 LARRY

# $yr2001$type2
#   index   name
# 1     1    MAN
# 2     2  WOMEN
# 3     3 OLIVER
# 4     4   JACK
# 5     5   JILL

# etc...
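A hedged generalisation of samp.recur takes the column and function as arguments and leaves data frames without that column, as well as any non-list leaves, untouched, so it no longer requires every terminal leaf to be a data frame. The names modify_leaves and deep are hypothetical. The is.data.frame check has to come before is.list, because a data frame is itself a list.

```r
modify_leaves <- function(x, col, fn) {
  lapply(x, function(y) {
    if (is.data.frame(y)) {
      # Only modify data frames that actually contain the column
      if (col %in% names(y)) y[[col]] <- fn(y[[col]])
      y
    } else if (is.list(y)) {
      # Plain list: recurse one level deeper
      modify_leaves(y, col, fn)
    } else {
      # Any other leaf (e.g. a character vector) is returned unchanged
      y
    }
  })
}

# A three-level list plus an unrelated data frame and a plain vector
deep <- list(
  a = list(b = list(c = data.frame(index = 1:2, name = c("x", "y"),
                                   stringsAsFactors = FALSE))),
  d = data.frame(other = 1:2),
  e = "note"
)
out <- modify_leaves(deep, "name", toupper)
```

Applied to the question's data, modify_leaves(data, "name", toupper) uppercases every name column regardless of nesting depth.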

That said, I also agree with the others that you may want to consider restructuring your data.

Source: https://stackoverflow.com/questions/22000025/how-to-apply-functions-in-columns-for-data-frames-with-different-sizes-in-nested

Tags

nested

plyr