Question
Suppose I have a data.frame like THIS. Any columns after the column named autoreg are arbitrary columns defined by the user, so I won't know their names or values in advance. For example, in THIS data.frame the columns named "ESL", "prof", "scope" and "type" are defined by the user.
Question:
How can I write a looping structure (in base R) that, at each round, extracts one set of each of these arbitrary columns? My desired output is a list within which the ESL values, prof values, scope values and type values from each study are put next to each other.
I have tried two nested lapply calls (see below), which extract all values for all sets of these arbitrary columns, but how can I extract one set of each of these arbitrary columns at a time?
D <- read.csv("https://raw.githubusercontent.com/izeh/i/master/i.csv", header = TRUE) ## data.frame
L <- split(D, D$study.name) ; L[[1]] <- NULL ## one data.frame per study; drop the group with the empty study name
arb.names <- c("ESL", "prof", "scope", "type") ## arbitrary column names
a <- lapply(seq_along(arb.names), function(j) lapply(seq_along(L), function(i) L[[i]][arb.names[j]]))
Answer 1:
Maybe we need to grep for each of the 'arb.names' to extract the matching set of columns from every element of 'L':
lapply(arb.names, function(nm) lapply(L, function(l1) l1[grep(nm, names(l1))]))
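One caveat with grep here: it matches by regular expression, so a name such as "type" would also pick up any other column whose name merely contains that string. A minimal sketch on made-up data (the data frame toy and its column subtype are hypothetical, not part of the OP's dataset) comparing pattern matching with exact matching:

```r
# Toy data; 'subtype' is a hypothetical column added only to show partial matching
toy <- data.frame(study.name = c("A", "A", "B"),
                  type = c(1, 1, 0),
                  subtype = c(2, 2, 3))

# Pattern matching: "type" matches both 'type' and 'subtype'
by_pattern <- names(toy)[grep("type", names(toy))]

# Exact matching: only the column literally named 'type'
by_exact <- names(toy)[names(toy) %in% "type"]

# Anchored regular expression: behaves like exact matching here
by_anchor <- names(toy)[grep("^type$", names(toy))]
```

With the OP's four column names this makes no difference, but exact matching may be the safer choice when the user-defined names are not known beforehand.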
If we want to group the different names across the list into a single list, use transpose from purrr:
library(purrr)
lapply(arb.names, function(nm) transpose(lapply(L, function(l1) l1[grep(nm, names(l1))])))
Or using base R
m1 <- simplify2array(lapply(arb.names, function(nm)
lapply(L, function(l1) l1[grep(nm, names(l1))])))
split(m1, col(m1))
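To make the shape of such a result easier to see, here is a small base R sketch on made-up data (toy and its values are assumptions for illustration only). It builds one list element per arbitrary column and, inside each, one vector per study, so the values from the different studies sit next to each other:

```r
# Toy stand-in for the OP's data; all values are made up
toy <- data.frame(study.name = c("A", "A", "B", "B"),
                  autoreg = FALSE,
                  ESL = c(1, 1, 0, 0),
                  prof = c(2, 2, 1, 1))

# One data.frame per study
L <- split(toy, toy$study.name)

# Arbitrary column names, assumed known at this point
arb.names <- c("ESL", "prof")

# Outer loop: one element per arbitrary column (named after the column)
# Inner loop: the values of that column in each study
res <- lapply(setNames(arb.names, arb.names),
              function(nm) lapply(L, function(d) d[[nm]]))
```

Here res$ESL holds the ESL values of studies A and B side by side, and res$prof does the same for prof. Using d[[nm]] returns plain vectors; d[nm] would keep one-column data frames instead.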
Answer 2:
Although this question has an accepted answer, I would like to propose a completely different approach.
If I understand correctly, the OP is looking for a way to easily compare the values in the arbitrary columns between the different studies. As an additional complication, the names of the arbitrary columns are not known beforehand.
My suggestion is to reshape the data appropriately:
library(data.table)
library(magrittr)
melt(setDT(D), id.vars = c("study.name", "group.name"),
measure.vars = tail(names(D), -grep("autoreg", names(D))), na.rm = TRUE) %>%
dcast(variable + study.name ~ group.name)
    variable study.name Cont.Long Cont.Long2 Cont.Short DCF.Long DCF.Long2 DCF.Short ME.long ME.long2 ME.short
 1:      ESL  Ellis.sh1         1         NA          1        1        NA         1       1       NA        1
 2:      ESL      Goey1         0         NA          0        0        NA         0       0       NA        0
 3:      ESL      kabla         1          1          1        1         1         1       1        1        1
 4:     prof  Ellis.sh1         2         NA          2        2        NA         2       2       NA        2
 5:     prof      Goey1         1         NA          1        1        NA         1       1       NA        1
 6:     prof      kabla         3          3          3        3         3         3       3        3        3
 7:    scope  Ellis.sh1         0         NA          0        0        NA         0       0       NA        0
 8:    scope      Goey1         1         NA          1        1        NA         1       1       NA        1
 9:    scope      kabla         0          0          0        0         0         0       0        0        0
10:     type  Ellis.sh1         1         NA          1        1        NA         1       1       NA        1
11:     type      Goey1         0         NA          0        0        NA         0       0       NA        0
12:     type      kabla         1          1          1        1         1         1       1        1        1
The arbitrary columns (column variable in the reshaped format) are all columns of D that appear after column autoreg, regardless of their names. They are picked by:
tail(names(D), -grep("autoreg", names(D)))
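The idiom relies on tail() with a negative n, which drops that many elements from the front of the vector. A minimal sketch on a made-up vector of column names (nms is an assumption for illustration), together with an equivalent form based on match():

```r
# Hypothetical column names; only the position of "autoreg" matters
nms <- c("study.name", "group.name", "r", "autoreg", "ESL", "prof")

# Position of the marker column
pos <- grep("autoreg", nms)

# Negative n: drop the first 'pos' names, keep everything after "autoreg"
after <- tail(nms, -pos)

# Equivalent with exact matching instead of a regular expression
after2 <- nms[-seq_len(match("autoreg", nms))]
```

Both forms assume that exactly one column matches "autoreg"; if the pattern matched several names, grep() would return more than one position and tail() would not accept it as n.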
Addendum
Please note that the column names are taken from group.name and have been ordered alphabetically. If it is a requirement to maintain the original order in which the values of group.name appear in D, then the factor levels of group.name need to be adjusted accordingly:
library(data.table)
library(magrittr)
setDT(D) # D must be a data.table before the data.table syntax below is used
lvls <- D[study.name != "", 1:2] %>%
split(drop = TRUE, by = "study.name") %>%
.[order(sapply(., nrow), decreasing = TRUE)] %>% # merge longest first (by row count; lengths() would count columns)
Reduce(function(x, y) merge(x, y, by = "group.name", all = TRUE, sort = FALSE), .) %>%
.[, group.name %>% forcats::fct_drop() %>% forcats::fct_inorder()]
melt(setDT(D), id.vars = c("study.name", "group.name"),
measure.vars = tail(names(D), -grep("autoreg", names(D))), na.rm = TRUE) %>%
.[, group.name := factor(group.name, levels = lvls)] %>%
dcast(variable + study.name ~ group.name)
    variable study.name ME.short ME.long ME.long2 DCF.Short DCF.Long DCF.Long2 Cont.Short Cont.Long Cont.Long2
 1:      ESL  Ellis.sh1        1       1       NA         1        1        NA          1         1         NA
 2:      ESL      Goey1        0       0       NA         0        0        NA          0         0         NA
 3:      ESL      kabla        1       1        1         1        1         1          1         1          1
 4:     prof  Ellis.sh1        2       2       NA         2        2        NA          2         2         NA
 5:     prof      Goey1        1       1       NA         1        1        NA          1         1         NA
 6:     prof      kabla        3       3        3         3        3         3          3         3          3
 7:    scope  Ellis.sh1        0       0       NA         0        0        NA          0         0         NA
 8:    scope      Goey1        1       1       NA         1        1        NA          1         1         NA
 9:    scope      kabla        0       0        0         0        0         0          0         0          0
10:     type  Ellis.sh1        1       1       NA         1        1        NA          1         1         NA
11:     type      Goey1        0       0       NA         0        0        NA          0         0         NA
12:     type      kabla        1       1        1         1        1         1          1         1          1
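Why the factor levels matter can be seen in a small base R sketch (the vector g is made up for illustration): by default factor() sorts its levels, whereas passing levels = unique(g) keeps them in order of first appearance, which is what the reordered column layout needs.

```r
# Hypothetical group names in their original order of appearance
g <- c("ME.short", "ME.long", "DCF.Short", "Cont.Short")

# Default: levels are sorted
alpha <- levels(factor(g))

# Explicit levels: order of first appearance is preserved
asis <- levels(factor(g, levels = unique(g)))
```

The forcats::fct_inorder() call used above serves the same purpose of ordering levels by first appearance.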
Data
As external links may break in the future, here is the OP's dataset from the GitHub link:
D <-
structure(list(study.name = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L), .Label = c("", "Ellis.sh1", "Goey1", "kabla"), class = "factor"),
group.name = structure(c(10L, 8L, 7L, 5L, 4L, 2L, 1L, 10L,
8L, 7L, 5L, 4L, 2L, 1L, 10L, 8L, 9L, 7L, 5L, 6L, 4L, 2L,
3L), .Label = c("", "Cont.Long", "Cont.Long2", "Cont.Short",
"DCF.Long", "DCF.Long2", "DCF.Short", "ME.long", "ME.long2",
"ME.short"), class = "factor"), n = c(13L, 13L, 15L, 15L,
16L, 16L, NA, 13L, 13L, 15L, 15L, 16L, 16L, NA, 13L, 13L,
13L, 15L, 15L, 15L, 16L, 16L, 16L), mpre = c(0.34, 0.34,
0.37, 0.37, 0.32, 0.32, NA, 0.34, 0.34, 0.37, 0.37, 0.32,
0.32, NA, 0.34, 0.34, 0.34, 0.37, 0.37, 0.37, 0.32, 0.32,
0.32), mpos = c(0.72, 0.39, 0.54, 0.49, 0.28, 0.35, NA, 0.72,
0.39, 0.54, 0.49, 0.28, 0.35, NA, 0.72, 0.39, 0.39, 0.54,
0.49, 0.49, 0.28, 0.35, 0.35), sdpre = c(0.37, 0.37, 0.38,
0.38, 0.37, 0.37, NA, 0.37, 0.37, 0.38, 0.38, 0.37, 0.37,
NA, 0.37, 0.37, 0.37, 0.38, 0.38, 0.38, 0.37, 0.37, 0.37),
sdpos = c(0.34, 0.36, 0.36, 0.36, 0.36, 0.32, NA, 0.34, 0.36,
0.36, 0.36, 0.36, 0.32, NA, 0.34, 0.36, 0.36, 0.36, 0.36,
0.36, 0.36, 0.32, 0.32), control = c(FALSE, FALSE, FALSE,
FALSE, TRUE, TRUE, NA, FALSE, FALSE, FALSE, FALSE, TRUE,
TRUE, NA, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
TRUE, TRUE), post = c(1L, 2L, 1L, 2L, 1L, 2L, NA, 1L, 2L,
1L, 2L, 1L, 2L, NA, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
r = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, NA, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, NA, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5
), autoreg = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
NA, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, NA, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
ESL = c(1L, 1L, 1L, 1L, 1L, 1L, NA, 0L, 0L, 0L, 0L, 0L, 0L,
NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), prof = c(2L, 2L,
2L, 2L, 2L, 2L, NA, 1L, 1L, 1L, 1L, 1L, 1L, NA, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), scope = c(0L, 0L, 0L, 0L, 0L, 0L,
NA, 1L, 1L, 1L, 1L, 1L, 1L, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), type = c(1L, 1L, 1L, 1L, 1L, 1L, NA, 0L, 0L, 0L,
0L, 0L, 0L, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-23L))
    study.name group.name  n mpre mpos sdpre sdpos control post   r autoreg ESL prof scope type
 1:  Ellis.sh1   ME.short 13 0.34 0.72  0.37  0.34   FALSE    1 0.5   FALSE   1    2     0    1
 2:  Ellis.sh1    ME.long 13 0.34 0.39  0.37  0.36   FALSE    2 0.5   FALSE   1    2     0    1
 3:  Ellis.sh1  DCF.Short 15 0.37 0.54  0.38  0.36   FALSE    1 0.5   FALSE   1    2     0    1
 4:  Ellis.sh1   DCF.Long 15 0.37 0.49  0.38  0.36   FALSE    2 0.5   FALSE   1    2     0    1
 5:  Ellis.sh1 Cont.Short 16 0.32 0.28  0.37  0.36    TRUE    1 0.5   FALSE   1    2     0    1
 6:  Ellis.sh1  Cont.Long 16 0.32 0.35  0.37  0.32    TRUE    2 0.5   FALSE   1    2     0    1
 7:                       NA   NA   NA    NA    NA      NA   NA  NA      NA  NA   NA    NA   NA
 8:      Goey1   ME.short 13 0.34 0.72  0.37  0.34   FALSE    1 0.5   FALSE   0    1     1    0
 9:      Goey1    ME.long 13 0.34 0.39  0.37  0.36   FALSE    2 0.5   FALSE   0    1     1    0
10:      Goey1  DCF.Short 15 0.37 0.54  0.38  0.36   FALSE    1 0.5   FALSE   0    1     1    0
11:      Goey1   DCF.Long 15 0.37 0.49  0.38  0.36   FALSE    2 0.5   FALSE   0    1     1    0
12:      Goey1 Cont.Short 16 0.32 0.28  0.37  0.36    TRUE    1 0.5   FALSE   0    1     1    0
13:      Goey1  Cont.Long 16 0.32 0.35  0.37  0.32    TRUE    2 0.5   FALSE   0    1     1    0
14:                       NA   NA   NA    NA    NA      NA   NA  NA      NA  NA   NA    NA   NA
15:      kabla   ME.short 13 0.34 0.72  0.37  0.34   FALSE    1 0.5   FALSE   1    3     0    1
16:      kabla    ME.long 13 0.34 0.39  0.37  0.36   FALSE    2 0.5   FALSE   1    3     0    1
17:      kabla   ME.long2 13 0.34 0.39  0.37  0.36   FALSE    3 0.5   FALSE   1    3     0    1
18:      kabla  DCF.Short 15 0.37 0.54  0.38  0.36   FALSE    1 0.5   FALSE   1    3     0    1
19:      kabla   DCF.Long 15 0.37 0.49  0.38  0.36   FALSE    2 0.5   FALSE   1    3     0    1
20:      kabla  DCF.Long2 15 0.37 0.49  0.38  0.36   FALSE    3 0.5   FALSE   1    3     0    1
21:      kabla Cont.Short 16 0.32 0.28  0.37  0.36    TRUE    1 0.5   FALSE   1    3     0    1
22:      kabla  Cont.Long 16 0.32 0.35  0.37  0.32    TRUE    2 0.5   FALSE   1    3     0    1
23:      kabla Cont.Long2 16 0.32 0.35  0.37  0.32    TRUE    3 0.5   FALSE   1    3     0    1
    study.name group.name  n mpre mpos sdpre sdpos control post   r autoreg ESL prof scope type
Source: https://stackoverflow.com/questions/56705711/extracting-one-set-of-multiple-variables-in-a-list-of-data-frames-in-r