Question
This is a follow-up to an earlier question. Although the example is specific, the application seems generalizable, so I think it's worth a separate thread.
The general question is: How do I take elements in a list that correspond to a value in an original data frame and combine them according to that value in the original data frame, especially when the elements of the list are of different length?
In this example, I have a dataframe that has two groups, each sorted by date. What I ultimately want to do is get a dataframe, organized by date, that has just the relevant metrics for each segment. If a certain segment doesn't have data for a certain date, it gets a 0.
Here's some actual data (the data frame is called test in the code further down):
structure(list(date = structure(c(15706, 15707, 15708, 15709,
15710, 15706, 15707, 15708), class = "Date"), segment = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("abc", "xyz"), class = "factor"),
a = c(76L, 92L, 96L, 76L, 80L, 91L, 54L, 62L), x = c(964L,
505L, 968L, 564L, 725L, 929L, 748L, 932L), k = c(27L, 47L,
36L, 40L, 33L, 46L, 30L, 36L), value = c(6872L, 5993L, 5498L,
5287L, 6835L, 6622L, 5736L, 7218L)), .Names = c("date", "segment",
"a", "x", "k", "value"), row.names = c(NA, -8L), class = "data.frame")
So for the "abc" segment, I only care about (value/a) relative to its benchmark of 75, and for the "xyz" segment, I only care about (k/x) relative to its benchmark of 0.04.
Ultimately I want a dataframe that looks like:
date abc xyz
1 2013-01-01 0.21 0.24
2 2013-01-02 -0.13 0.00
3 2013-01-03 -0.24 -0.03
4 2013-01-04 -0.07 0.00
5 2013-01-05 0.14 0.00
Since "xyz" only has data for 2013-01-01 through 2013-01-03, it gets 0s for every later date.
How I got to this point was:
First, define the arguments to be passed to mapply:
splits <- split(test, test$segment)
metrics <- c("ametric","xmetric")
benchmarks <- c(75,0.04)
Then define the function that computes performance against the benchmark:
performance <- function(splits,metrics,benchmarks){
(splits[,metrics]/benchmarks)-1
}
Pass these to mapply:
temp <- mapply(performance, splits, metrics, benchmarks)
The problem now is that, since the splits were of different length, the output looks like this:
summary(temp)
Length Class Mode
abc 5 -none- numeric
xyz 3 -none- numeric
Is there a way to bring in the dates from the original data frame for each split, and combine according to those dates (with 0's where there's no data)?
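For anyone trying to reproduce the setup: the sample data has no ametric or xmetric columns, so the snippet as posted cannot run against it directly. Here is a minimal sketch that assumes those two columns are simply value/a and k/x, as described in the text. That derivation is an assumption, not something shown in the original code. The data frame is rebuilt with data.frame() instead of the dput output, and SIMPLIFY = FALSE is added so the result is always a list.

```r
# Sample data from the question, rebuilt with data.frame()
test <- data.frame(
  date    = as.Date("2013-01-01") + c(0:4, 0:2),
  segment = factor(rep(c("abc", "xyz"), times = c(5, 3))),
  a       = c(76L, 92L, 96L, 76L, 80L, 91L, 54L, 62L),
  x       = c(964L, 505L, 968L, 564L, 725L, 929L, 748L, 932L),
  k       = c(27L, 47L, 36L, 40L, 33L, 46L, 30L, 36L),
  value   = c(6872L, 5993L, 5498L, 5287L, 6835L, 6622L, 5736L, 7218L)
)

# Assumption: the metric columns are derived from the raw columns like this
test$ametric <- test$value / test$a
test$xmetric <- test$k / test$x

splits     <- split(test, test$segment)
metrics    <- c("ametric", "xmetric")
benchmarks <- c(75, 0.04)

# Relative performance of one metric column against its benchmark
performance <- function(splits, metrics, benchmarks) {
  (splits[, metrics] / benchmarks) - 1
}

# One numeric vector per segment; lengths differ (5 for abc, 3 for xyz)
temp <- mapply(performance, splits, metrics, benchmarks, SIMPLIFY = FALSE)
round(temp$abc, 2)
round(temp$xyz, 2)
```

Under that assumption, the rounded values line up with the abc and xyz columns of the desired output, apart from the missing dates for xyz.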
Answer 1:
You just need to set the SIMPLIFY=FALSE argument to mapply, then you can use do.call with rbind to put everything back into one data frame:
> temp <- mapply(performance, splits, metrics, benchmarks)
> do.call('rbind',mapply(cbind, splits, performance=temp, SIMPLIFY=FALSE))
date segment a x k value performance
abc.1 2013-01-01 abc 76 964 27 6872 1.333333e-02
abc.2 2013-01-02 abc 92 505 47 5993 2.266667e-01
abc.3 2013-01-03 abc 96 968 36 5498 2.800000e-01
abc.4 2013-01-04 abc 76 564 40 5287 1.333333e-02
abc.5 2013-01-05 abc 80 725 33 6835 6.666667e-02
xyz.6 2013-01-01 xyz 91 929 46 6622 2.322400e+04
xyz.7 2013-01-02 xyz 54 748 30 5736 1.869900e+04
xyz.8 2013-01-03 xyz 62 932 36 7218 2.329900e+04
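Two things are worth noting about the output above. The performance values look like they were computed from the raw a and x columns (for example 76/75 - 1 = 0.0133), not from value/a and k/x. The result is also still in long format, so it does not yet have one column per segment with 0s for missing dates. One possible way to finish the job is sketched below with base R's xtabs. The names long and wide are made up for this illustration, and the performance values are recomputed from the question's data using the value/a and k/x definitions.

```r
# Long-format input: one row per (date, segment) with its performance value.
# Values are computed from the question's data as (value / a) / 75 - 1 for
# "abc" and (k / x) / 0.04 - 1 for "xyz".
long <- data.frame(
  date        = as.Date("2013-01-01") + c(0:4, 0:2),
  segment     = rep(c("abc", "xyz"), times = c(5, 3)),
  performance = c(
    c(6872, 5993, 5498, 5287, 6835) / c(76, 92, 96, 76, 80) / 75 - 1,
    c(46, 30, 36) / c(929, 748, 932) / 0.04 - 1
  )
)

# xtabs() sums performance within each date/segment cell; cells with no
# rows come out as 0, which is the fill value the question asks for.
wide <- as.data.frame.matrix(xtabs(performance ~ date + segment, data = long))

# Move the dates from the row names into a proper Date column
wide <- data.frame(date = as.Date(rownames(wide)), wide, row.names = NULL)
round(wide[, -1], 2)
```

Because xtabs sums within each cell, this only behaves as intended when there is at most one row per date and segment. It also makes a genuine 0 indistinguishable from a missing date; if that matters, merge(..., all = TRUE) on date would leave NA in the gaps instead.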
Source: https://stackoverflow.com/questions/22138169/combine-list-elements-according-to-common-dataframe-value