Question
In R, to apply some function to a column, you can do:
df$col <- someFunction(df$col)
Now my question is: how do you do a similar task when the data frames sit in a nested list? Say I have a list like the following, where the data frames are at the second level from the root.
list
+-- year1
|   +-- type1: data frame (id, name)
|   +-- type2: data frame (meta1, meta2, name)
+-- year2
|   +-- type1: data frame (id, name)
|   +-- type2: data frame (meta1, meta2, name)
+-- year3
    +-- type1: data frame (id, name)
    +-- type2: data frame (meta1, meta2, name)
I want to modify the "name" column in each of the data frames at the leaves with some function and store the results back in place. How do you do that?
Here is the example data:
data<-list()
data$yr2001$type1 <- df_2001_1 <- data.frame(index=1:3,name=c("jack","king","larry"))
data$yr2001$type2 <- df_2001_2 <- data.frame(index=1:5,name=c("man","women","oliver","jack","jill"))
data$yr2002$type1 <- df_2002_1 <- data.frame(index=1:3,name=c("janet","king","larry"))
data$yr2002$type2 <- df_2002_2 <- data.frame(index=1:5,name=c("alboyr","king","larry","rachel","sam"))
data$yr2003$type1 <- df_2003_1 <- data.frame(index=1:3,name=c("dan","jay","zang"))
data$yr2003$type2 <- df_2003_2 <- data.frame(index=1:5,name=c("zang","king","larry","kim","fran"))
Say I want to uppercase all of the names in the name column of each data frame stored in the list.
Answer 1:
To illustrate (using your simplified example):
library(reshape2)
dat1 <- melt(data,id.vars = c("index","name"))
dat1$NAME <- toupper(dat1$name)
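If you would rather not depend on reshape2, the same consolidation can be sketched in base R. This is only an illustration under assumptions: it uses a trimmed copy of the example data built with stringsAsFactors = FALSE, and the column names year and type are hypothetical labels chosen here, not something from the question.

```r
# Trimmed copy of the question's example data (names kept as character)
data <- list(
  yr2001 = list(
    type1 = data.frame(index = 1:3, name = c("jack", "king", "larry"),
                       stringsAsFactors = FALSE),
    type2 = data.frame(index = 1:5, name = c("man", "women", "oliver", "jack", "jill"),
                       stringsAsFactors = FALSE)
  ),
  yr2002 = list(
    type1 = data.frame(index = 1:3, name = c("janet", "king", "larry"),
                       stringsAsFactors = FALSE)
  )
)

# Stack every leaf data frame into one long data frame, tagging each row
# with the year and type it came from (`year` and `type` are hypothetical names).
flat <- do.call(rbind, lapply(names(data), function(yr) {
  do.call(rbind, lapply(names(data[[yr]]), function(tp) {
    data.frame(year = yr, type = tp, data[[yr]][[tp]], stringsAsFactors = FALSE)
  }))
}))
rownames(flat) <- NULL

# With everything in one data frame, the column update is a single line
flat$name <- toupper(flat$name)
```

This only works as written when all leaf data frames share the same columns, as they do in the example data; frames with differing columns would need to be aligned first.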
Answer 2:
I agree with @joran's comment above: this is begging to be consolidated by adding type as a column. But here is one way with rapply. This assumes that the name column is the only factor column in each nested data.frame. As in @josilber's answer, my function of choice is toupper.
rapply(data, function(x) toupper(as.character(x)), classes='factor', how='replace')
This will drop the data.frame class, but the essential structure is preserved. If your name columns are already character, then you would use:
rapply(data, toupper, classes='character', how='replace')
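As a small sketch of the character case, assuming a one-leaf copy of the example data built with stringsAsFactors = FALSE: the integer index column is left alone because it does not match classes. The as.data.frame step is an addition here, meant to restore the data.frame class in case rapply dropped it; the name res_df is hypothetical.

```r
# One-leaf copy of the example data, with names stored as character
data <- list(
  yr2001 = list(
    type1 = data.frame(index = 1:3, name = c("jack", "king", "larry"),
                       stringsAsFactors = FALSE)
  )
)

# Only character vectors are passed to toupper; other leaves are kept as-is
res <- rapply(data, toupper, classes = "character", how = "replace")

# Rebuild each leaf as a data frame in case the class was dropped
res_df <- lapply(res, function(yr) {
  lapply(yr, as.data.frame, stringsAsFactors = FALSE)
})
```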
Answer 3:
You can nest the lapply function twice to get at the inner data frames. Here, I apply toupper to each name variable:
result <- lapply(data, function(x) {
  lapply(x, function(y) {
    y$name <- toupper(y$name)
    y
  })
})
result
# $yr2001
# $yr2001$type1
# index name
# 1 1 JACK
# 2 2 KING
# 3 3 LARRY
#
# $yr2001$type2
# index name
# 1 1 MAN
# 2 2 WOMEN
# 3 3 OLIVER
# 4 4 JACK
# 5 5 JILL
#
#
# $yr2002
# $yr2002$type1
# index name
# 1 1 JANET
# 2 2 KING
# 3 3 LARRY
#
# $yr2002$type2
# index name
# 1 1 ALBOYR
# 2 2 KING
# 3 3 LARRY
# 4 4 RACHEL
# 5 5 SAM
#
#
# $yr2003
# $yr2003$type1
# index name
# 1 1 DAN
# 2 2 JAY
# 3 3 ZANG
#
# $yr2003$type2
# index name
# 1 1 ZANG
# 2 2 KING
# 3 3 LARRY
# 4 4 KIM
# 5 5 FRAN
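Since the question asks to store the results back in the list, an alternative sketch is to loop over the names and assign in place rather than build a new list. This assumes exactly two levels of nesting and a name column in every leaf; the trimmed data below is only for illustration.

```r
# Trimmed copy of the example data
data <- list(
  yr2001 = list(
    type1 = data.frame(index = 1:2, name = c("jack", "king"), stringsAsFactors = FALSE),
    type2 = data.frame(index = 1:2, name = c("man", "women"), stringsAsFactors = FALSE)
  )
)

# Walk both levels by name and overwrite the column in place
for (yr in names(data)) {
  for (tp in names(data[[yr]])) {
    data[[yr]][[tp]]$name <- toupper(data[[yr]][[tp]]$name)
  }
}
```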
Answer 4:
Here is a truly recursive version based on lapply (i.e. it will work with deeper nesting) that makes no assumptions other than that the only terminal leaves you have are data frames. Unfortunately, rapply won't stop the recursion at data frames, so you have to use lapply if you want to operate on the data frames themselves (otherwise Matthew's answer is perfect).
samp.recur <- function(x)
  lapply(x,
    function(y)
      if(is.data.frame(y)) transform(y, name=toupper(name)) else samp.recur(y))
This produces:
samp.recur(data)
# $yr2001
# $yr2001$type1
# index name
# 1 1 JACK
# 2 2 KING
# 3 3 LARRY
# $yr2001$type2
# index name
# 1 1 MAN
# 2 2 WOMEN
# 3 3 OLIVER
# 4 4 JACK
# 5 5 JILL
# etc...
Though I do also agree with the others that you may want to consider restructuring your data.
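The recursive idea can be generalized a little so that the function and the column are arguments. The following is a sketch, not part of the original answer: apply_to_column is a hypothetical helper name, and unlike samp.recur it skips data frames that lack the column and leaves non-list leaves untouched.

```r
# Hypothetical helper: apply `f` to column `col` of every data frame found
# at any depth of a nested list.
apply_to_column <- function(x, f, col = "name") {
  if (is.data.frame(x)) {
    # Only touch data frames that actually have the column
    if (col %in% names(x)) x[[col]] <- f(x[[col]])
    return(x)
  }
  if (is.list(x)) {
    # Recurse into plain lists, keeping their names
    return(lapply(x, function(el) apply_to_column(el, f, col)))
  }
  x  # any other leaf is returned unchanged
}

# Illustrative input with uneven nesting depth
nested <- list(
  a = list(b = list(df = data.frame(index = 1:2, name = c("jack", "jill"),
                                    stringsAsFactors = FALSE))),
  c = data.frame(id = 1L)
)

out <- apply_to_column(nested, toupper)
```

The is.data.frame check has to come before is.list, because a data frame is itself a list and would otherwise be recursed into column by column.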
Source: https://stackoverflow.com/questions/22000025/how-to-apply-functions-in-columns-for-data-frames-with-different-sizes-in-nested