Subset dataframes in a list based on the content of a vector

*爱你&永不变心* submitted on 2019-12-13 04:16:22

Question


I have a list of five dataframes. Each dataframe contains one dimension column and 4 value columns. I would like to subset each dataframe in the list based on the contents of a vector.

df <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
df2 <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
df3 <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
df4 <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
df5 <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4 = runif(100, 0, 100), y5 = runif(100,0,100))
frames <- list(df, df2, df3, df4, df5)

So in this example, my list is "frames". Let's say I have the following vector:

subs <- 50:60

My goal here would be to subset the list of dataframes such that each dataframe only contains rows where the value of the first column is in the subs vector.

Any advice?

Thanks, Ben


Answer 1:


It seems to me that almost all of your questions are about a list of data frames with the same columns, which forces you into an lapply loop for every single operation (this seems highly inefficient).

Alternatively, you could vectorize most of your operations by binding all the data frames into a single object while keeping an ID for each data.frame. Once you have finished all the data manipulations, you can split the result back into a list using split.

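Here's an example using data.table (the idcol argument of rbindlist was only in the development version on GitHub at the time of writing; dplyr::bind_rows with its .id argument gives a similar result)

library(data.table)
# x %in% subs keeps every row whose x appears in subs.
# The original answer filtered with x %between% subs, which treated only the
# first two elements of subs (50 and 51) as lower and upper bounds. That is why
# the output below, taken from that version, stops at x = 51.
Res <- rbindlist(frames, idcol = "ID")[x %in% subs]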
#     ID  x        y2       y3        y4       y5
#  1:  1 50 54.692889 58.51886 12.754368 35.61516
#  2:  1 51 21.206308 12.77442 52.440787 93.67734
#  3:  2 50 12.655685 84.55044  3.194644 54.46706
#  4:  2 51 83.840276 61.32614 61.139038 92.39402
#  5:  3 50 54.847797 20.68419 19.585931 48.87072
#  6:  3 51 75.510691 68.17955 98.696579 91.48688
#  7:  4 50 63.203071 95.94132 41.835923 60.68250
#  8:  4 51 75.481676 51.67619 80.393557 24.48381
#  9:  5 50 65.744847 50.36983 86.548843 83.31730
# 10:  5 51  4.956835 57.25666 27.106395 32.92020

Eventually (after finishing all the data manipulations) you would just run

split(Res, Res$ID)

to get the data frames back as a list.
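
For readers without data.table, the same bind, filter, and split round trip can be sketched in base R. This is only a minimal sketch under assumptions, not the answerer's code: demo_frames, stacked, kept, and back are names made up for illustration, and the three small frames are a stand-in for the question's data.

```r
# Hypothetical stand-in for the question's list: three small data frames
# that share the same columns.
set.seed(1)
demo_frames <- lapply(1:3, function(i) {
  data.frame(x = 1:100, y2 = runif(100, 0, 100))
})
subs <- 50:60

# Bind all frames into one, adding an ID column that records which
# frame each row came from.
stacked <- do.call(
  rbind,
  Map(function(d, id) cbind(ID = id, d), demo_frames, seq_along(demo_frames))
)

# A single vectorized filter over the combined data.
kept <- stacked[stacked$x %in% subs, ]

# Split back into a list of data frames, one per original frame.
back <- split(kept, kept$ID)
```

Here back is a list with one element per original frame, named by the ID values. The trade-off is the same as in the data.table version: one filter over the stacked data instead of one lapply call per operation.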




Answer 2:


You can try lapply:

lapply(frames, function(.dat) .dat[with(.dat, x %in% subs),])
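
To see what this call returns, here is a small self-contained check. The frames below are a reduced, hypothetical stand-in for the question's data (two frames with two columns each), and result is a name chosen for illustration.

```r
# Reduced stand-in for the question's list (hypothetical data).
frames <- list(
  data.frame(x = 1:100, y2 = runif(100, 0, 100)),
  data.frame(x = 1:100, y2 = runif(100, 0, 100))
)
subs <- 50:60

# Same call as above: keep the rows whose x value appears in subs.
result <- lapply(frames, function(.dat) .dat[with(.dat, x %in% subs), ])

# Count the rows left in each frame.
vapply(result, nrow, integer(1))
```

Each element of result is still a data frame, and each should come back with 11 rows, one for every value in 50:60.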



Answer 3:


If your first columns are all named x, you can use lapply on frames:

lapply(frames, function(p) p[p$x %in% subs, ])
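
If the first column is not named x in every frame, one option is to select it by position instead of by name. A minimal sketch, assuming made-up frames whose first columns are named differently (frames_mixed, by_position, id, and key are hypothetical names, not from the question):

```r
# Hypothetical frames whose first column has a different name in each.
frames_mixed <- list(
  data.frame(id  = 1:100, y2 = runif(100, 0, 100)),
  data.frame(key = 1:100, y2 = runif(100, 0, 100))
)
subs <- 50:60

# p[[1]] extracts the first column whatever it is called.
by_position <- lapply(frames_mixed, function(p) p[p[[1]] %in% subs, ])
```

Indexing by position relies on the dimension column really being first in every frame, so it is worth checking the column order before using this variant.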


Source: https://stackoverflow.com/questions/28195976/subset-a-dataframes-in-a-list-based-on-the-content-of-a-vector
