Question
In my data.frame DATA, some columns hold a constant value across all rows that share the same value of the first column, study.name. For example, the columns ESL and prof are constant across all rows of Shin.Ellis, constant across all rows of Trus.Hsu, and so on. Including Shin.Ellis and Trus.Hsu, there are 8 unique study.name values.
After my split.default() call below, how can I keep only one data point per unique study.name (e.g., one for Shin.Ellis, one for Trus.Hsu, etc.) for such constant variables, i.e., 8 rows overall?
For example, after split.default(), each variable named ESL should have only 8 rows, one for each unique study.name.
My desired output for ONLY ESL and prof is shown further below.
NOTE: This is toy data, so the constant variables should first be identified programmatically. A functional answer is highly appreciated.
DATA <- read.csv("https://raw.githubusercontent.com/izeh/m/master/irr.csv", header = TRUE)[-(2:3)]
DATA <- setNames(DATA, sub("\\.\\d+$", "", names(DATA)))
tbl <- table(names(DATA))
nm2 <- names(which(tbl==max(tbl)))
L <- split.default(DATA[names(DATA) %in% nm2], names(DATA)[names(DATA) %in% nm2])
## FIRST 8 ROWS of `DATA`:
# study.name ESL prof scope type ESL prof scope type
# 1 Shin.Ellis 1 2 1 1 1 2 1 1
# 2 Shin.Ellis 1 2 1 1 1 2 1 1
# 3 Shin.Ellis 1 2 1 2 1 2 1 1
# 4 Shin.Ellis 1 2 1 2 1 2 1 1
# 5 Shin.Ellis 1 2 NA NA 1 2 NA NA
# 6 Shin.Ellis 1 2 NA NA 1 2 NA NA
# 7 Trus.Hsu 2 2 2 1 2 2 1 1
# 8 Trus.Hsu 2 2 NA NA 2 2 NA NA
# . ... . . . . . . . . # `DATA` has 54 rows overall
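For readers unfamiliar with the split.default() step, here is a minimal sketch of what it does to a data.frame with repeated column names. The toy data frame and the names `toy` and `L_toy` are hypothetical and only stand in for DATA and L.

```r
# Hypothetical toy data with duplicated column names (stand-in for DATA)
toy <- data.frame(
  ESL  = c(1, 2),
  prof = c(2, 3),
  ESL  = c(1, 2),
  prof = c(2, 3),
  check.names = FALSE  # keep the duplicated names as they are
)

# split.default() treats the data.frame as a list of columns and
# groups the columns that share the same name
L_toy <- split.default(toy, names(toy))

names(L_toy)      # one list element per distinct column name
ncol(L_toy$ESL)   # both ESL columns end up in the same element
```

Each element of the resulting list is itself a data.frame containing every column that carried that name.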
Desired output for ESL and prof after split.default() call:
# $ESL ## 8 unique rows for 8 unique `study.name`
# ESL ESL.1
# 1 1 1
# 7 2 2
# 9 1 1
# 17 1 1
# 23 1 1
# 35 1 1
# 37 2 2
# 49 2 2
# $prof ## 8 unique rows for 8 unique `study.name`
# prof prof.1
# 1 2 2
# 7 2 2
# 9 3 3
# 17 2 2
# 23 2 2
# 35 2 2
# 37 NA NA
# 49 2 2
Answer 1:
We can first find the constant columns, then use lapply to loop over them and keep only the first row within each study.name.
is_constant <- function(x) length(unique(x)) == 1L
cols <- names(Filter(all, aggregate(.~study.name, DATA, is_constant)[-1]))
L[cols] <- lapply(L[cols], function(x)
  x[ave(x[[1]], DATA$study.name, FUN = seq_along) == 1, ])
L
#$ESL
# ESL ESL.1
#1 1 1
#7 2 2
#9 1 1
#17 1 1
#23 1 1
#35 1 1
#37 2 2
#49 2 2
#$prof
# prof prof.1
#1 2 2
#7 2 2
#9 3 3
#17 2 2
#23 2 2
#35 2 2
#37 NA NA
#49 2 2
#.....
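The same idea can be tried on a small self-contained example that does not need the remote CSV. This is only a sketch: the data frame `toy` is made up, and it uses tapply() and duplicated() as a variation on the aggregate()/ave() calls above rather than reproducing them.

```r
# Hypothetical toy data: `ESL` is constant within each study, `type` is not
toy <- data.frame(
  study.name = c("A", "A", "A", "B", "B"),
  ESL        = c(1, 1, 1, 2, 2),
  type       = c(1, 2, 1, 1, 1)
)

# TRUE when a vector holds a single distinct value
is_constant <- function(x) length(unique(x)) == 1L

# A column qualifies only if it is constant within every study
const_cols <- names(Filter(
  function(col) all(tapply(col, toy$study.name, is_constant)),
  toy[-1]
))

# Keep the first row of each study for the constant columns
first_rows <- !duplicated(toy$study.name)
result <- toy[first_rows, const_cols, drop = FALSE]
result
```

As in the output above, the original row names are preserved, so the result is indexed by the first row of each study.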
Answer 2:
We can create the expected output with aggregate():
is_constant <- function(x) length(unique(x)) == 1L
nm1 <- names(which(!colSums(!aggregate(.~ study.name, DATA, is_constant)[-1])))
L[nm1] <- lapply(L[nm1], function(x) aggregate(x,
    list(factor(DATA$study.name, levels = unique(DATA$study.name))),
    FUN = head, 1)[-1])
L
#$ESL
# ESL ESL.1
#1 1 1
#2 2 2
#3 1 1
#4 1 1
#5 1 1
#6 1 1
#7 2 2
#8 2 2
#$prof
# prof prof.1
#1 2 2
#2 2 2
#3 3 3
#4 2 2
#5 2 2
#6 2 2
#7 NA NA
#8 2 2
#$scope
#...
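The factor(..., levels = unique(...)) wrapper is what keeps the studies in their order of first appearance. A minimal sketch on made-up data (the names `toy`, `grp` and `first_per_study` are hypothetical) illustrates this:

```r
# Hypothetical toy data in which study "B" appears before study "A"
toy <- data.frame(
  study.name = c("B", "B", "A", "A", "A"),
  ESL        = c(2, 2, 1, 1, 1),
  ESL.1      = c(2, 2, 1, 1, 1)
)

# Set the factor levels to the order of first appearance, so that
# aggregate() does not sort the groups alphabetically
grp <- factor(toy$study.name, levels = unique(toy$study.name))

# Take the first value of each column within each study
first_per_study <- aggregate(toy[c("ESL", "ESL.1")],
                             list(study = grp),
                             FUN = head, 1)
first_per_study
```

Unlike the row-subsetting approach, aggregate() builds a new data.frame, so the rows are numbered consecutively instead of keeping the original row names.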
Source: https://stackoverflow.com/questions/58314023/extracting-columns-with-constant-numbers-in-r-data-frames