Combine list elements according to common dataframe value

Question

This is a follow-up to an earlier question. The example is specific, but the application seems generalizable, so I think it's worth a separate thread.

The general question is: how do I take elements of a list that correspond to a value in an original data frame and combine them according to that value, especially when the elements of the list have different lengths?

In this example, I have a dataframe that has two groups, each sorted by date. What I ultimately want to do is get a dataframe, organized by date, that has just the relevant metrics for each segment. If a certain segment doesn't have data for a certain date, it gets a 0.

Here's some actual data (the dput output of the data frame called test in the code below):

structure(list(date = structure(c(15706, 15707, 15708, 15709, 
15710, 15706, 15707, 15708), class = "Date"), segment = structure(c(1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("abc", "xyz"), class = "factor"), 
    a = c(76L, 92L, 96L, 76L, 80L, 91L, 54L, 62L), x = c(964L, 
    505L, 968L, 564L, 725L, 929L, 748L, 932L), k = c(27L, 47L, 
    36L, 40L, 33L, 46L, 30L, 36L), value = c(6872L, 5993L, 5498L, 
    5287L, 6835L, 6622L, 5736L, 7218L)), .Names = c("date", "segment", 
"a", "x", "k", "value"), row.names = c(NA, -8L), class = "data.frame")

So for the "abc" segment, I JUST care about (value/a) relative to its benchmark of 75, and for the "xyz" segment, I JUST care about (k/x) relative to its benchmark of 0.04.

Ultimately I want a dataframe that looks like:

        date   abc   xyz
1 2013-01-01  0.21  0.24
2 2013-01-02 -0.13  0.00
3 2013-01-03 -0.24 -0.03
4 2013-01-04 -0.07  0.00
5 2013-01-05  0.14  0.00

Since "xyz" only has data for 2013-01-01 through 2013-01-03, it gets 0's for every date after that.

Here's how I got to this point.

First, define the arguments to be passed to mapply:

splits <- split(test, test$segment)
metrics <- c("ametric","xmetric")
benchmarks <- c(75,0.04)
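The sample data has no ametric or xmetric columns, so the metrics vector above would fail with "undefined columns selected" if run as-is. A minimal sketch of how those columns might be derived, assuming (this is not stated in the question) that ametric is value/a and xmetric is k/x, which matches the ratios named earlier:

```r
# Sample data from the question, rebuilt by hand and assigned to `test`
test <- data.frame(
  date    = as.Date("2013-01-01") + c(0:4, 0:2),
  segment = factor(rep(c("abc", "xyz"), times = c(5, 3))),
  a       = c(76L, 92L, 96L, 76L, 80L, 91L, 54L, 62L),
  x       = c(964L, 505L, 968L, 564L, 725L, 929L, 748L, 932L),
  k       = c(27L, 47L, 36L, 40L, 33L, 46L, 30L, 36L),
  value   = c(6872L, 5993L, 5498L, 5287L, 6835L, 6622L, 5736L, 7218L)
)

# Assumption: the metric columns are the ratios described in the question
test$ametric <- test$value / test$a   # the metric that matters for "abc"
test$xmetric <- test$k / test$x       # the metric that matters for "xyz"

head(test[, c("date", "segment", "ametric", "xmetric")])
```

With these two columns in place, the split, metrics, and benchmarks definitions refer to columns that actually exist.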

Then define the function that computes performance against the benchmark:

performance <- function(splits,metrics,benchmarks){
    (splits[,metrics]/benchmarks)-1
}

Pass these to mapply:

temp <- mapply(performance, splits, metrics, benchmarks)

The problem now is that, since the splits have different lengths, mapply returns a list rather than a matrix, and the output looks like this:

summary(temp)

    Length Class  Mode   
abc 5      -none- numeric
xyz 3      -none- numeric
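For anyone unsure why the result is a list here, a small sketch of how mapply simplifies its output may help. The function add_offset and the toy inputs are hypothetical and are not part of my data:

```r
# Hypothetical helper: add a scalar offset to a vector
add_offset <- function(v, offset) v + offset

# Equal-length results are simplified into a matrix (one column per input)
equal_len <- mapply(add_offset, list(a = 1:3, b = 4:6), c(10, 20))

# Unequal-length results cannot be simplified, so a list comes back
unequal_len <- mapply(add_offset, list(a = 1:5, b = 1:3), c(10, 20))

# SIMPLIFY = FALSE returns a list regardless of the result lengths
forced_list <- mapply(add_offset, list(a = 1:3, b = 4:6), c(10, 20),
                      SIMPLIFY = FALSE)

is.matrix(equal_len)
is.list(unequal_len)
is.list(forced_list)
```

So the shape of the return value depends on the data unless SIMPLIFY is set explicitly.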

Is there a way to bring in the dates from the original data frame for each split, and combine according to those dates (with 0's where there's no data)?

Answer 1:

You just need to pass SIMPLIFY=FALSE to mapply so that it always returns a list; then you can use do.call with rbind to put everything back into one data frame:

> temp <- mapply(performance, splits, metrics, benchmarks, SIMPLIFY=FALSE)
> do.call('rbind',mapply(cbind, splits, performance=temp, SIMPLIFY=FALSE))
            date segment  a   x  k value  performance
abc.1 2013-01-01     abc 76 964 27  6872 1.333333e-02
abc.2 2013-01-02     abc 92 505 47  5993 2.266667e-01
abc.3 2013-01-03     abc 96 968 36  5498 2.800000e-01
abc.4 2013-01-04     abc 76 564 40  5287 1.333333e-02
abc.5 2013-01-05     abc 80 725 33  6835 6.666667e-02
xyz.6 2013-01-01     xyz 91 929 46  6622 2.322400e+04
xyz.7 2013-01-02     xyz 54 748 30  5736 1.869900e+04
xyz.8 2013-01-03     xyz 62 932 36  7218 2.329900e+04
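The performance values printed above appear to come from the raw a and x columns (for example, 76/75 - 1 is about 0.0133), and the result is still in long format. Here is a minimal sketch of one way to get from the list to the date-by-segment layout the question asks for. It assumes the metrics are value/a and k/x, and it swaps in merge with all = TRUE plus a zero fill, which is not something the answer above uses:

```r
# Sample data from the question, rebuilt by hand
test <- data.frame(
  date    = as.Date("2013-01-01") + c(0:4, 0:2),
  segment = factor(rep(c("abc", "xyz"), times = c(5, 3))),
  a       = c(76L, 92L, 96L, 76L, 80L, 91L, 54L, 62L),
  x       = c(964L, 505L, 968L, 564L, 725L, 929L, 748L, 932L),
  k       = c(27L, 47L, 36L, 40L, 33L, 46L, 30L, 36L),
  value   = c(6872L, 5993L, 5498L, 5287L, 6835L, 6622L, 5736L, 7218L)
)
# Assumption: ametric and xmetric are the ratios named in the question
test$ametric <- test$value / test$a
test$xmetric <- test$k / test$x

splits     <- split(test, test$segment)
metrics    <- c("ametric", "xmetric")
benchmarks <- c(75, 0.04)

performance <- function(splits, metrics, benchmarks) {
  (splits[, metrics] / benchmarks) - 1
}

# One numeric vector per segment, always returned as a named list
temp <- mapply(performance, splits, metrics, benchmarks, SIMPLIFY = FALSE)

# Pair each segment's dates with its performance, naming the column after the segment
pieces <- Map(function(d, p, nm) setNames(data.frame(d$date, p), c("date", nm)),
              splits, temp, names(splits))

# Full outer join on date, then replace missing values with 0
result <- Reduce(function(l, r) merge(l, r, by = "date", all = TRUE), pieces)
for (nm in names(splits)) {
  result[[nm]][is.na(result[[nm]])] <- 0
}

result
```

Because Reduce folds merge over the whole list, the same sketch should extend to more than two segments without changes.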

Source: https://stackoverflow.com/questions/22138169/combine-list-elements-according-to-common-dataframe-value

Tags

mapply