Subset dataframes in a list based on the content of a vector

Question

I have a list of five dataframes. Each dataframe contains one dimension column and 4 value columns. I would like to subset each dataframe in the list based on the contents of a vector.

df <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
df2 <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
df3 <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
df4 <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
df5 <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
frames <- list(df, df2, df3, df4, df5)

So in this example, my list is "frames". Let's say I have the following vector:

subs <- 50:60

My goal here is to subset the list of dataframes so that each dataframe only contains rows where the value of the first column is in the subs vector.

Any advice?

Thanks, Ben

Answer 1:

It seems to me that almost all of your questions concern a list of data frames with the same columns, which leads you to use lapply loops for every single operation (which seems highly inefficient).

Alternatively, you could vectorize most of your operations by binding all the data frames into a single object while keeping an ID for each data.frame. Once you are finished with all the data manipulations, you can split them back into a list using split.

Here's an example using data.table's development version on GitHub (you could achieve similar results using dplyr::bind_rows with its .id argument)

library(data.table)
# Use %in% for set membership; %between% expects a length-2 lower/upper bound, not a set of values
Res <- rbindlist(frames, idcol = "ID")[x %in% subs]
# Res has 55 rows (11 per data frame) and the columns ID, x, y2, y3, y4, y5

Eventually (once you have finished all the data manipulations) you can just do

split(Res, Res$ID)

to get the data frames back as a list.
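The full round trip (bind, filter, split) can be sketched as follows. This is only an illustration under a few assumptions: the frames are smaller than in the question, the seed is arbitrary, %in% is used for the membership test, and it relies on the by and keep.by arguments of data.table's split method, which need a reasonably recent data.table release.

```r
library(data.table)

set.seed(1)  # arbitrary seed for reproducibility, not part of the original post
# Three frames with one dimension column and one value column (an assumption;
# the question uses five frames with four value columns)
frames <- replicate(3, data.frame(x = 1:100, y2 = runif(100, 0, 100)),
                    simplify = FALSE)
subs <- 50:60

# Bind once, filter once on the combined table
Res <- rbindlist(frames, idcol = "ID")[x %in% subs]

# Split back into one table per original frame, dropping the helper ID column
out <- split(Res, by = "ID", keep.by = FALSE)

# Convert back to plain data.frames if data.table objects are not wanted
out <- lapply(out, as.data.frame)
```

The elements of out are named after the ID values, so the original order of the list can still be recovered after the split.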

Answer 2:

You can try lapply

lapply(frames, function(.dat) .dat[with(.dat, x %in% subs),])

Answer 3:

If your first columns are all named x, you can use lapply on frames:

lapply(frames, function(p) p[p$x %in% subs, ])
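If the first column is not guaranteed to be called x, a variant that indexes it by position may be safer. The following is a minimal base R sketch; the list names a and b, the seed, and the reduced frame layout are hypothetical and only serve the illustration.

```r
set.seed(42)  # arbitrary seed, not part of the original post
# Two named frames; the names "a" and "b" are made up for this example
frames <- list(
  a = data.frame(x = 1:100, y2 = runif(100, 0, 100)),
  b = data.frame(x = 1:100, y2 = runif(100, 0, 100))
)
subs <- 50:60

# d[[1]] picks the first column by position, so its name does not matter;
# drop = FALSE keeps the result a data.frame even if only one column remains
filtered <- lapply(frames, function(d) d[d[[1]] %in% subs, , drop = FALSE])

# Number of rows kept in each frame
sapply(filtered, nrow)
```

lapply keeps the names of the input list, so each filtered frame can still be accessed as filtered$a or filtered$b.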

Source: https://stackoverflow.com/questions/28195976/subset-a-dataframes-in-a-list-based-on-the-content-of-a-vector

Tags

list

subset