Question
I have a list of five dataframes. Each dataframe contains one dimension column and 4 value columns. I would like to subset each dataframe in the list based on the contents of a vector.
df <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
df2 <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
df3 <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
df4 <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
df5 <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
frames <- list(df, df2, df3, df4, df5)
So in this example, my list is "frames". Let's say I have the following vector:
subs <- 50:60
My goal is to subset the list of dataframes so that each dataframe contains only the rows where the value of the first column is in the subs vector.
Any advice?
Thanks, Ben
Answer 1:
It seems that almost all of your questions concern a list of data frames with the same columns, which forces you to use lapply loops for every single operation (which seems highly inefficient).
Alternatively, you could vectorize most of your operations by binding all the data frames into a single object while keeping an ID for each data.frame. Once you have finished all the data manipulations, you can split the result back into a list using split.
Here's an example using the development version of data.table on GitHub (you could achieve similar results with dplyr::bind_rows and its .id argument):
library(data.table)
# %in% tests membership in subs; %between% expects a length-2 range such as range(subs)
Res <- rbindlist(frames, idcol = "ID")[x %in% subs]
#     ID  x        y2       y3        y4       y5
#  1:  1 50 54.692889 58.51886 12.754368 35.61516
#  2:  1 51 21.206308 12.77442 52.440787 93.67734
# ...
# (55 rows in total: x = 50 to 60 for each of the 5 data frames)
Finally, once you have finished all the data manipulations, you can simply run
split(Res, Res$ID)
to get the data.frames back as a list.
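If data.table is not available, the same bind, filter, split workflow can be sketched in base R. This is only an illustration under stated assumptions: the toy data is regenerated with a fixed seed, and the names ids, stacked, res and back are made up for this example.

```r
# Base R sketch of the bind -> filter -> split workflow (no packages needed).
set.seed(1)  # assumption: fixed seed so the toy data is reproducible
frames <- lapply(1:5, function(i) {
  data.frame(x = 1:100,
             y2 = runif(100, 0, 100), y3 = runif(100, 0, 100),
             y4 = runif(100, 0, 100), y5 = runif(100, 0, 100))
})
subs <- 50:60

# Stack the frames and record which frame each row came from.
ids <- rep(seq_along(frames), times = vapply(frames, nrow, integer(1)))
stacked <- cbind(ID = ids, do.call(rbind, frames))

# Filter once on the stacked data instead of once per frame.
res <- stacked[stacked$x %in% subs, ]

# Split back into a list with one data frame per original frame.
back <- split(res, res$ID)
```

Each element of back keeps the ID column, so it can be dropped afterwards if the original layout is needed.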
Answer 2:
You can try lapply:
lapply(frames, function(.dat) .dat[with(.dat, x %in% subs),])
Answer 3:
If your first columns are all named x, you can use lapply on frames:
lapply(frames,function(p){p[p$x %in% subs,]})
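The question asks about the first column rather than a column named x, so a variant that selects by position may be useful when the column names differ between data frames. The following is a minimal sketch, assuming toy data shaped like the question's; subset_first_col is a hypothetical helper name, not part of any package.

```r
# Toy data mirroring the question (assumption: fixed seed for reproducibility).
set.seed(42)
frames <- lapply(1:5, function(i) {
  data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100))
})
subs <- 50:60

# Hypothetical helper: keep rows whose FIRST column value is in `keep`.
# d[[1]] selects by position, so the column does not have to be called x;
# drop = FALSE guarantees a data frame even if d has a single column.
subset_first_col <- function(d, keep) {
  d[d[[1]] %in% keep, , drop = FALSE]
}

subsetted <- lapply(frames, subset_first_col, keep = subs)
```

A quick check such as sapply(subsetted, nrow) should report 11 rows per data frame for this toy data.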
Source: https://stackoverflow.com/questions/28195976/subset-a-dataframes-in-a-list-based-on-the-content-of-a-vector