Extracting columns with constant numbers in R data.frames

情到浓时终转凉″ submitted on 2020-02-01 08:51:25

Question


In data.frame DATA, some columns hold a constant number within each unique value of the first column, study.name. For example, columns ESL and prof are constant for all rows of Shin.Ellis, constant for all rows of Trus.Hsu, and so on. Including Shin.Ellis and Trus.Hsu, there are 8 unique study.name values.

After my split.default() call below, how can I keep only one row per unique study.name (one for Shin.Ellis, one for Trus.Hsu, etc.) for such constant variables, i.e., 8 rows overall?

For example, after my split.default() call, the element holding the columns named ESL should have only 8 rows, one for each unique study.name.

My desired output for ONLY ESL and prof is shown further below.

NOTE: This is toy data. We should first find the constant variables. A functional answer is highly appreciated.

DATA <- read.csv("https://raw.githubusercontent.com/izeh/m/master/irr.csv", header = TRUE)[-(2:3)]
DATA <- setNames(DATA, sub("\\.\\d+$", "", names(DATA)))

tbl <- table(names(DATA))
nm2 <- names(which(tbl==max(tbl)))

L <- split.default(DATA[names(DATA) %in% nm2], names(DATA)[names(DATA) %in% nm2])
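
To see what the suffix-stripping and split.default() steps do without downloading irr.csv, here is a minimal sketch on a made-up data.frame. The objects `toy` and `L_toy` and all values in them are hypothetical stand-ins, not the real DATA.

```r
# Hypothetical stand-in for DATA: two raters coded ESL and type, and
# read.csv() would have named the second rater's columns ESL.1 and type.1.
toy <- data.frame(
  study.name = c("A", "A", "B", "B", "B"),
  ESL    = c(1, 1, 2, 2, 2),
  type   = c(1, 2, 1, 1, 2),
  ESL.1  = c(1, 1, 2, 2, 2),
  type.1 = c(1, 1, 1, 2, 2)
)

# Strip the numeric suffix so both raters' columns share one name.
names(toy) <- sub("\\.\\d+$", "", names(toy))

# Keep only the names that occur most often (here ESL and type, twice each).
tbl  <- table(names(toy))
nm2  <- names(which(tbl == max(tbl)))
keep <- names(toy) %in% nm2

# split.default() treats the data.frame as a list of columns and groups
# the columns by name, giving one data.frame per variable.
L_toy <- split.default(toy[keep], names(toy)[keep])

names(L_toy)      # one element per variable
names(L_toy$ESL)  # `[.data.frame` makes the duplicated names unique again
nrow(L_toy$ESL)   # still one row per row of `toy`, not one per study
```

Each element of `L_toy` still has as many rows as `toy`, which is the situation the question asks to reduce to one row per study.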


## FIRST 8 ROWS of `DATA`:

#    study.name ESL prof scope type   ESL   prof   scope   type
# 1  Shin.Ellis   1    2     1    1     1      2       1      1
# 2  Shin.Ellis   1    2     1    1     1      2       1      1
# 3  Shin.Ellis   1    2     1    2     1      2       1      1
# 4  Shin.Ellis   1    2     1    2     1      2       1      1
# 5  Shin.Ellis   1    2    NA   NA     1      2      NA     NA
# 6  Shin.Ellis   1    2    NA   NA     1      2      NA     NA
# 7    Trus.Hsu   2    2     2    1     2      2       1      1
# 8    Trus.Hsu   2    2    NA   NA     2      2      NA     NA
# .     ...       .    .     .    .     .      .       .      . # `DATA` has 54 rows overall

Desired output for ESL and prof after split.default() call:

# $ESL            ## 8 unique rows for 8 unique `study.name`
#    ESL ESL.1
# 1    1     1
# 7    2     2
# 9    1     1
# 17   1     1
# 23   1     1
# 35   1     1
# 37   2     2
# 49   2     2


# $prof           ## 8 unique rows for 8 unique `study.name`
#    prof prof.1
# 1     2      2
# 7     2      2
# 9     3      3
# 17    2      2
# 23    2      2
# 35    2      2
# 37   NA     NA
# 49    2      2

Answer 1:


We can first find constant columns and then use lapply to loop over them and select only their first row in each study.name.

is_constant <- function(x) length(unique(x)) == 1L 
cols <- names(Filter(all, aggregate(.~study.name, DATA, is_constant)[-1]))

L[cols] <- lapply(L[cols], function(x) 
                      x[ave(x[[1]], DATA$study.name, FUN = seq_along) == 1, ])
L

#$ESL
#   ESL ESL.1
#1    1     1
#7    2     2
#9    1     1
#17   1     1
#23   1     1
#35   1     1
#37   2     2
#49   2     2

#$prof
#   prof prof.1
#1     2      2
#7     2      2
#9     3      3
#17    2      2
#23    2      2
#35    2      2
#37   NA     NA
#49    2      2
#.....
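
The two steps in this answer can be tried on a small made-up data.frame. The following is a minimal sketch, assuming a hypothetical `toy` with unique column names and no missing values; it is not the real DATA. One point to keep in mind with the real data: the formula method of aggregate() uses na.action = na.omit by default, so rows with an NA in any column are dropped before is_constant is applied.

```r
# Hypothetical toy data (not the real DATA).
toy <- data.frame(
  study.name = c("A", "A", "B", "B", "B"),
  ESL  = c(1, 1, 2, 2, 2),   # constant within each study
  type = c(1, 2, 1, 1, 2)    # varies within each study
)

is_constant <- function(x) length(unique(x)) == 1L

# Step 1: one TRUE/FALSE per study and column, then keep the columns
# that are constant in every study.
flags <- aggregate(. ~ study.name, toy, is_constant)
cols  <- names(Filter(all, flags[-1]))

# Step 2: number the rows within each study and keep the first one.
first_row <- ave(seq_len(nrow(toy)), toy$study.name, FUN = seq_along) == 1
reduced   <- toy[first_row, cols, drop = FALSE]

cols
reduced
```

Because the rows are selected by logical indexing, `reduced` keeps the original row names, which is why the answer's output shows row names such as 1, 7, 9 rather than 1 to 8.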



Answer 2:


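We can create the expected output with aggregate, taking the first value of each column within every study.name. Converting study.name to a factor whose levels follow the order of appearance keeps the studies in their original order. Note that the row names of the result are renumbered from 1 to 8 instead of keeping the original row numbers.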

is_constant <- function(x) length(unique(x)) == 1L 
nm1 <-  names(which(!colSums(!aggregate(.~ study.name, DATA, is_constant)[-1])))
L[nm1] <- lapply(L[nm1], function(x) aggregate(x, 
   list(factor(DATA$study.name, levels = unique(DATA$study.name))), 
          FUN = head, 1)[-1])
L
#$ESL
#  ESL ESL.1
#1   1     1
#2   2     2
#3   1     1
#4   1     1
#5   1     1
#6   1     1
#7   2     2
#8   2     2

#$prof
#  prof prof.1
#1    2      2
#2    2      2
#3    3      3
#4    2      2
#5    2      2
#6    2      2
#7   NA     NA
#8    2      2

#$scope
#...
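
As a small check of this approach, here is a sketch on made-up data (the `toy` object and its values are hypothetical). It also shows a base R alternative that is not part of the answer above: indexing with !duplicated(), which picks the first row of each study and keeps the original row names.

```r
# Hypothetical toy data; study "B" deliberately appears before "A".
toy <- data.frame(
  study.name = c("B", "B", "A", "A", "A"),
  ESL   = c(2, 2, 1, 1, 1),
  ESL.1 = c(2, 2, 1, 1, 1)
)
x <- toy[c("ESL", "ESL.1")]

# Levels in order of appearance, so aggregate() does not sort the studies.
grp <- factor(toy$study.name, levels = unique(toy$study.name))

# The answer's approach: first value per group; row names become 1, 2, ...
by_aggregate <- aggregate(x, list(grp), FUN = head, 1)[-1]

# Alternative (an assumption, not from the answer): keep the first row of
# each study by position; the original row names are preserved.
by_duplicated <- x[!duplicated(toy$study.name), ]

by_aggregate
by_duplicated
```

Both objects hold the same values in the same study order; they differ only in their row names, so the choice depends on whether the original row numbers matter downstream.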


Source: https://stackoverflow.com/questions/58314023/extracting-columns-with-constant-numbers-in-r-data-frames
