Calculate e.g. a mean in a list with multi-column data.frames

Question

I have a list of several data.frames, each with several columns. Using mean(mylist$first_dataframe$a) I can get the mean of a in that one data.frame. However, I do not know how to calculate it over all the data.frames stored in my list, or over specific data.frames only.

I could use a loop, but I was told that apply() and its variations are better. I tried several solutions I found via search, but somehow they just don't work. I assume I need to use unlist().

Could you provide an example of how to calculate e.g. a mean for a data structure like mine: a list with several data.frames, each containing several columns?

Update: I'm sorry for the confusion. I wanted the grand mean for a specific column across all data.frames. Thanks to Thomas for providing a working solution for calculating a grand mean for a specific column across all data.frames, and to psychometriko for providing a useful solution for calculating means over all columns in all data.frames (even when non-numeric data is involved).

Thanks!

Answer 1:

Is this what you are looking for?

set.seed(42)
mylist <- list(a=data.frame(foo=rnorm(10),
                            bar=rnorm(10)),
               b=data.frame(foo=rnorm(10),
                            bar=rnorm(10)),
               c=data.frame(foo=rnorm(10),
                            bar=rnorm(10)))
sapply(do.call("rbind",mylist),mean)

       foo        bar 
 0.1163340 -0.1696556

Note: do.call("rbind", mylist) stacks all the data.frames into a single data.frame, which is similar to what you had in mind with unlist(). sapply(), as mentioned by Roland in his answer, then calls mean on each component (column) of that combined data.frame.
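If only the grand mean of one specific column is needed (as the question's update describes), the same row-binding idea can be narrowed to that column. This is a minimal sketch, assuming every data.frame in the list has a column named foo as in the example above; the names combined and grand_mean_foo are made up for illustration.

```r
set.seed(42)
mylist <- list(a = data.frame(foo = rnorm(10), bar = rnorm(10)),
               b = data.frame(foo = rnorm(10), bar = rnorm(10)),
               c = data.frame(foo = rnorm(10), bar = rnorm(10)))

# Stack all data.frames row-wise into one data.frame.
combined <- do.call("rbind", mylist)

# Average only the column of interest across all stacked rows.
grand_mean_foo <- mean(combined$foo)

# Sanity check: this matches the "foo" entry of sapply(combined, mean).
all.equal(grand_mean_foo, unname(sapply(combined, mean)["foo"]))
```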

Edit: In response to the question of how to deal with non-numeric data.frame components, the below solution admittedly isn't very elegant and I'm sure better ones exist, but here's the first thing I was able to think of:

set.seed(42)
mylist <- list(a=data.frame(rand=rnorm(10),
                            lets=sample(LETTERS,10,replace=TRUE)),
               b=data.frame(rand=rnorm(10),
                            lets=sample(LETTERS,10,replace=TRUE)),
               c=data.frame(rand=rnorm(10),
                            lets=sample(LETTERS,10,replace=TRUE)))
sapply(do.call("rbind",mylist),function(x) {
  if (is.numeric(x)) mean(x)
})

$rand
[1] -0.02470602

$lets
NULL

This basically just creates a custom function that first tests whether each component is numeric and, if it is, returns the mean. If it isn't, the function returns NULL, which is why lets shows NULL in the output above.
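One possible alternative is to drop the non-numeric columns before averaging, so the result is a plain named numeric vector rather than a list containing NULL. This is only a sketch using base R's Filter() and vapply(); the names combined, numeric_cols and col_means are made up for illustration.

```r
set.seed(42)
mylist <- list(a = data.frame(rand = rnorm(10),
                              lets = sample(LETTERS, 10, replace = TRUE)),
               b = data.frame(rand = rnorm(10),
                              lets = sample(LETTERS, 10, replace = TRUE)),
               c = data.frame(rand = rnorm(10),
                              lets = sample(LETTERS, 10, replace = TRUE)))

combined <- do.call("rbind", mylist)

# A data.frame is a list of columns, so Filter() keeps only the numeric ones.
numeric_cols <- Filter(is.numeric, combined)

# vapply() enforces exactly one numeric value per remaining column.
col_means <- vapply(numeric_cols, mean, numeric(1))
col_means
```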

Answer 2:

The whole do.call('rbind', List) thing can be quite slow and prone to mishaps. If there is only one column you need the mean for, the best way is:

mean(sapply(mylist, function(X) X$rand))

It's about 10x faster than the do.call method.
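One caveat: the sapply() call above simplifies to a matrix only when every data.frame has the same number of rows. With unequal row counts it returns a list, which mean() cannot average. A hedged variant that should work in both cases is sketched below; the unequal-length example data and the names all_values and grand_mean are made up for illustration.

```r
set.seed(42)
# Hypothetical list whose data.frames have different numbers of rows.
mylist <- list(a = data.frame(rand = rnorm(10)),
               b = data.frame(rand = rnorm(5)),
               c = data.frame(rand = rnorm(8)))

# lapply() always returns a list; unlist() flattens it into one numeric
# vector regardless of how many rows each data.frame contributes.
all_values <- unlist(lapply(mylist, function(df) df$rand))

# Grand mean over every value of the column across all data.frames.
grand_mean <- mean(all_values)
grand_mean
```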

Source: https://stackoverflow.com/questions/17146523/calculate-e-g-a-mean-in-a-list-with-multi-column-data-frames

Tags: list, dataframe