Question
I have two needs, both connected to a dataset similar to the reproducible one below. I have a list of 18 entities, each composed of a list of 17-19 data.frames. Reproducible dataset follows (there are matrices instead of data.frames, but I do not suppose that makes a difference):
test <- list(list(matrix(10:(50-1), ncol = 10), matrix(60:(100-1), ncol = 10), matrix(110:(150-1), ncol = 10)),
list(matrix(200:(500-1), ncol = 10), matrix(600:(1000-1), ncol = 10), matrix(1100:(1500-1), ncol = 10)))
- I need to subset each dataframe/matrix into two parts (by a given number of rows) and save to a new list of lists
- Secondly, I need to extract and save given column(s) out of every data.frame in a list of lists.
I have no idea how to go about this other than with for() loops, but I am sure it should be possible with the apply() family of functions.
Thank you for reading
EDIT:
My expected output would look as follows:
extractedColumns <- list(list(matrix(10:(50-1), ncol = 10)[, 2], matrix(60:(100-1), ncol = 10)[, 2], matrix(110:(150-1), ncol = 10)[, 2]),
list(matrix(200:(500-1), ncol = 10)[, 2], matrix(600:(1000-1), ncol = 10)[, 2], matrix(1100:(1500-1), ncol = 10)[, 2]))
numToSubset <- 3
subsetFrames <- list(list(list(matrix(10:(50-1), ncol = 10)["first length - numToSubset rows", ], matrix(10:(50-1), ncol = 10)["last numToSubset rows", ]),
list(matrix(60:(100-1), ncol = 10)["first length - numToSubset rows", ], matrix(60:(100-1), ncol = 10)["last numToSubset rows", ]),
list(matrix(110:(150-1), ncol = 10)["first length - numToSubset rows", ], matrix(110:(150-1), ncol = 10)["last numToSubset rows", ])),
etc...)
It looks very messy, but I hope you can follow what I want.
Answer 1:
You can use two nested lapplys:
lapply(test, function(x) lapply(x, '[', c(2, 3)))
Output:
[[1]]
[[1]][[1]]
[1] 11 12
[[1]][[2]]
[1] 61 62
[[1]][[3]]
[1] 111 112
[[2]]
[[2]][[1]]
[1] 201 202
[[2]][[2]]
[1] 601 602
[[2]][[3]]
[1] 1101 1102
Explanation
The outer lapply is applied to the two top-level lists of test. Each of those two lists contains three matrices. The inner lapply iterates over those three matrices and subsets each one (that's the '[' function in the inner lapply) with the index c(2, 3).
Note: In the case of a matrix, [ with a single index subsets elements 2 and 3 (counted in column-major order), but the same call subsets columns 2 and 3 when used on a data.frame.
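To make the note concrete, here is a minimal sketch using the test list from the question. It contrasts single-index [ on a matrix with the same call on a data.frame, and then uses an explicit empty row index to pull column 2 out of every matrix, which appears to be the extractedColumns structure the question asks for. The names m and df are illustrative only.

```r
# Reproducible data from the question
test <- list(
  list(matrix(10:49, ncol = 10), matrix(60:99, ncol = 10), matrix(110:149, ncol = 10)),
  list(matrix(200:499, ncol = 10), matrix(600:999, ncol = 10), matrix(1100:1499, ncol = 10))
)

m  <- test[[1]][[1]]     # a 4 x 10 integer matrix
df <- as.data.frame(m)   # same values, columns named V1..V10

m[c(2, 3)]               # linear indexing: elements 2 and 3, i.e. 11 12
names(df[c(2, 3)])       # list-style indexing: columns "V2" "V3"
m[, 2]                   # empty row index: the whole second column, 14 15 16 17

# Column 2 of every matrix, keeping the list-of-lists shape
extractedColumns <- lapply(test, function(x) lapply(x, function(y) y[, 2]))
```

If the elements are data.frames rather than matrices, y[, 2] and y[[2]] should both return the second column as a vector.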
Subsetting rows and columns
lapply is very flexible when combined with anonymous functions. Change the code to:
# replace rows and columns with the indices you need
lapply(test, function(x) lapply(x, function(y) y[rows, columns]))
You can specify any combination of rows or columns you want.
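For the first need in the question (splitting every matrix into the leading rows and the last numToSubset rows), the same pattern could be applied along these lines. This is only a sketch: splitRows is a hypothetical helper name, not something from the original answer, and drop = FALSE is used so that a one-row piece stays a matrix.

```r
# Reproducible data from the question
test <- list(
  list(matrix(10:49, ncol = 10), matrix(60:99, ncol = 10), matrix(110:149, ncol = 10)),
  list(matrix(200:499, ncol = 10), matrix(600:999, ncol = 10), matrix(1100:1499, ncol = 10))
)

numToSubset <- 3

# Hypothetical helper: split m into all but the last n rows, and the last n rows
splitRows <- function(m, n) {
  stopifnot(n >= 0, n <= nrow(m))
  cut <- nrow(m) - n                       # number of rows in the first part
  list(
    first = m[seq_len(cut), , drop = FALSE],
    last  = m[cut + seq_len(n), , drop = FALSE]
  )
}

# Apply the helper to every matrix, keeping the list-of-lists shape
subsetFrames <- lapply(test, function(x) lapply(x, splitRows, n = numToSubset))

dim(subsetFrames[[1]][[1]]$first)   # 1 10
dim(subsetFrames[[1]][[1]]$last)    # 3 10
```

The result has one more level of nesting than test: each matrix is replaced by a two-element list. The same indexing should also work when the elements are data.frames.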
Answer 2:
To piggyback on @LyzandeR's answer, consider the often-ignored sibling of the apply family, rapply, which can recursively run functions on lists of vectors/matrices and return such nested structures. It is often comparable to nested lapply or its variants, v/sapply:
newtest1 <- lapply(test, function(x) lapply(x, '[', c(2, 3)))
newtest2 <- rapply(test, function(x) `[`(x, c(2, 3)), classes="matrix", how="list")
all.equal(newtest1, newtest2)
# [1] TRUE
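The same recursive approach could also cover the column extraction from the question. The sketch below relies on the default classes = "ANY", so the function is applied to every leaf of the list, and compares how = "list" (nested structure kept) with the default how = "unlist" (everything flattened into one vector). The names nested, viaRapply, and flat are illustrative only.

```r
# Reproducible data from the question
test <- list(
  list(matrix(10:49, ncol = 10), matrix(60:99, ncol = 10), matrix(110:149, ncol = 10)),
  list(matrix(200:499, ncol = 10), matrix(600:999, ncol = 10), matrix(1100:1499, ncol = 10))
)

# Column 2 of every matrix, via nested lapply
nested <- lapply(test, function(x) lapply(x, function(y) y[, 2]))

# Same extraction via rapply; how = "list" keeps the list-of-lists shape
viaRapply <- rapply(test, function(y) y[, 2], how = "list")

all.equal(nested, viaRapply)

# The default how = "unlist" flattens all extracted columns into one vector
flat <- rapply(test, function(y) y[, 2])
length(flat)   # 3 * 4 + 3 * 30 = 102
```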
Interestingly, to my amazement, rapply runs slower in this use case compared to nested lapply! Hmmmm, back to the lab I go...
library(microbenchmark)
microbenchmark(newtest1 <- lapply(test, function(x) lapply(x, '[', c(2, 3))))
# Unit: microseconds
# mean median uq max neval
# 31.92804 31.278 32.241 74.587 100
microbenchmark(newtest2 <- rapply(test, function(x) `[`(x, c(2, 3)),
classes="matrix", how="list"))
# Unit: microseconds
# min lq mean median uq max neval
# 69.293 72.18 79.53353 73.143 74.5865 219.91 100
Even more interesting: when the [ function call is replaced with the equivalent matrix bracketing, nested lapply runs slightly faster and rapply slightly slower!
microbenchmark(newtest3 <- lapply(test, function(x)
lapply(x, function(y) y[c(2, 3), 1])))
# Unit: microseconds
# min lq mean median uq max neval
# 26.947 28.391 32.00987 29.354 30.798 100.09 100
all.equal(newtest1, newtest3)
# [1] TRUE
microbenchmark(newtest4 <- rapply(test, function(x) x[c(2,3), 1],
classes="matrix", how="list"))
# Unit: microseconds
# min lq mean median uq max neval
# 74.105 76.752 80.37076 77.955 78.918 203.549 100
all.equal(newtest2, newtest4)
# [1] TRUE
Source: https://stackoverflow.com/questions/41856001/r-extracting-information-from-list-of-lists-of-data-frames