Subset data frame with matrix of logical values

Posted by 99封情书 on 2019-12-10 15:41:19

Question


Problem

I have data on two measures for each of four individuals, in wide format. The measures are x and y, and the individuals are A, B, C, and D. The data frame looks like this:

d <- data.frame(matrix(sample(1:100, 40, replace = FALSE), ncol = 8))
colnames(d) <- paste0(rep(c("x.", "y."), each = 4), rep(LETTERS[1:4], 2))
d

  x.A x.B x.C x.D y.A y.B y.C y.D
1  56  65  42  96 100  76  39  26
2  19  93  94  75  63  78   5  44
3  22  57  15  62   2  29  89  79
4  49  13  95  97  85  81  60  37
5  45  38  24  91  23  82  83  72

Now, what I would like to obtain for each row is the value of y for the individual with the lowest value of x.

So in the example above, the lowest value of x in row 1 is for individual C. Hence, for row 1 I would like to obtain y.C which is 39.

In the example, the resulting vector should be 39, 63, 89, 81, 83.

Approach

I tried to get there by first building a logical matrix from the x columns of d that marks the minimum in each row:

t(apply(d[,1:4], 1, function(x) min(x) == x))

       x.A   x.B   x.C   x.D
[1,] FALSE FALSE  TRUE FALSE
[2,]  TRUE FALSE FALSE FALSE
[3,] FALSE FALSE  TRUE FALSE
[4,] FALSE  TRUE FALSE FALSE
[5,] FALSE FALSE  TRUE FALSE

Next I wanted to use this matrix to subset the y columns of the data frame, but I cannot find a way to do this.

Any help is much appreciated. Suggestions for a totally different - more elegant - approach are highly welcome too.

Thanks a lot!


Answer 1:


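We split the dataset into the columns whose names start with 'x' ('dx') and those whose names start with 'y' ('dy'). Applying max.col to the negated 'dx' gives the column index of the minimum value in each row ('first' resolves ties in favour of the leftmost column). We cbind that with the row index and use the resulting matrix to extract the corresponding elements of 'dy'.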

 dx <- d[grep('^x', names(d))]
 dy <- d[grep('^y', names(d))]
 dy[cbind(1:nrow(dx),max.col(-dx, 'first'))]
 #[1] 39 63 89 81 83
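
To make the two steps visible, here is a minimal sketch that prints the intermediate objects. The names min_col, idx and result are introduced here for illustration only, and the data frame is rebuilt from the values shown in the question.

```r
# Data from the question, rebuilt by hand so the sketch is self-contained.
d <- data.frame(
  x.A = c(56L, 19L, 22L, 49L, 45L), x.B = c(65L, 93L, 57L, 13L, 38L),
  x.C = c(42L, 94L, 15L, 95L, 24L), x.D = c(96L, 75L, 62L, 97L, 91L),
  y.A = c(100L, 63L, 2L, 85L, 23L), y.B = c(76L, 78L, 29L, 81L, 82L),
  y.C = c(39L, 5L, 89L, 60L, 83L),  y.D = c(26L, 44L, 79L, 37L, 72L)
)

dx <- d[grep("^x", names(d))]
dy <- d[grep("^y", names(d))]

# Negating dx turns each row-wise minimum into a row-wise maximum, so
# max.col() returns the column position of the smallest x in every row.
min_col <- max.col(-dx, ties.method = "first")
min_col
#[1] 3 1 3 2 3

# A two-column matrix of (row, column) pairs selects exactly one cell per row.
idx <- cbind(seq_len(nrow(dx)), min_col)
result <- dy[idx]
result
#[1] 39 63 89 81 83
```

Because the lookup is purely positional, it assumes the y columns are in the same order as the x columns.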

The above can easily be converted to a function:

get_min <- function(dat){
  dx <- dat[grep('^x', names(dat))]
  dy <- dat[grep('^y', names(dat))]
  dy[cbind(1:nrow(dx), max.col(-dx, 'first'))]
}
get_min(d)
#[1] 39 63 89 81 83

Or, using the OP's apply-based method:

t(d[,5:8])[apply(d[,1:4], 1, function(x) min(x) == x)] 
#[1] 39 63 89 81 83
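
The transpose matters because apply over rows returns its results as columns, so the logical mask is 4 x 5 rather than 5 x 4. The sketch below shows this, and also what happens when a row has a tied minimum. The names mask, yt, picked, d_tie and mask_tie are introduced here for illustration, and the tie is an artificial modification of the question's data.

```r
# Data from the question, rebuilt by hand so the sketch is self-contained.
d <- data.frame(
  x.A = c(56L, 19L, 22L, 49L, 45L), x.B = c(65L, 93L, 57L, 13L, 38L),
  x.C = c(42L, 94L, 15L, 95L, 24L), x.D = c(96L, 75L, 62L, 97L, 91L),
  y.A = c(100L, 63L, 2L, 85L, 23L), y.B = c(76L, 78L, 29L, 81L, 82L),
  y.C = c(39L, 5L, 89L, 60L, 83L),  y.D = c(26L, 44L, 79L, 37L, 72L)
)

# apply() over rows stores each row's result as a COLUMN of the output.
mask <- apply(d[, 1:4], 1, function(x) min(x) == x)
dim(mask)
#[1] 4 5

# Transposing the y columns gives the same 4 x 5 layout, so logical
# indexing (which runs column by column) walks through d row by row.
yt <- t(d[, 5:8])
picked <- yt[mask]
picked
#[1] 39 63 89 81 83

# Artificial tie: make x.A equal to x.C in row 1.
d_tie <- d
d_tie[1, "x.A"] <- 42L
mask_tie <- apply(d_tie[, 1:4], 1, function(x) min(x) == x)
n_tie <- length(t(d_tie[, 5:8])[mask_tie])
n_tie   # one value more than there are rows
#[1] 6
```

With ties, min(x) == x marks more than one column per row, so the result no longer has one value per row. The max.col and which.min variants always return a single position per row.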

data

d <- structure(list(
  x.A = c(56L, 19L, 22L, 49L, 45L),
  x.B = c(65L, 93L, 57L, 13L, 38L),
  x.C = c(42L, 94L, 15L, 95L, 24L),
  x.D = c(96L, 75L, 62L, 97L, 91L),
  y.A = c(100L, 63L, 2L, 85L, 23L),
  y.B = c(76L, 78L, 29L, 81L, 82L),
  y.C = c(39L, 5L, 89L, 60L, 83L),
  y.D = c(26L, 44L, 79L, 37L, 72L)),
  .Names = c("x.A", "x.B", "x.C", "x.D", "y.A", "y.B", "y.C", "y.D"),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5"))



Answer 2:


Here is my solution. The core idea is that the functions which.min and which.max can be applied row-wise to the data frame:

Edit:

Now, what I would like to obtain for each row is the value of y for the individual with the lowest value of x.

ind <- apply(d[, 1:4], 1, which.min)    # column index of the smallest x in each row
res <- d[, 5:8][cbind(1:nrow(d), ind)]  # pick one y per row by (row, column) matrix indexing
names(res) <- colnames(d)[5:8][ind]     # label each value with the y column it came from
res
y.D y.B y.D y.A y.D
 18  46  16  85  80

Caveat: this only works if the individuals appear in the same order in the x. and y. columns and every individual is present in both. Otherwise you can use grep, as in Akrun's solution.

# My d was:

   x.A x.B x.C x.D y.A y.B y.C y.D
1  88  96  65  55  14  99  63  18
2  12  11  27  45  70  46  20  69
3  32  81  21   9  77  44  91  16
4   8  84  42  78  85  94  28  90
5  31  51  83   2  67  25  54  80
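
One way to relax the ordering caveat is to pair each x column with its y column by the individual's label instead of by position. The following is only a sketch under that assumption: the function name y_at_min_x and the shuffled data frame d2 are hypothetical, and the column names are assumed to follow the "x.<id>" / "y.<id>" pattern from the question.

```r
# Hypothetical helper: match x and y columns by the label after the prefix.
y_at_min_x <- function(dat) {
  x_cols <- grep("^x\\.", names(dat), value = TRUE)
  ids    <- sub("^x\\.", "", x_cols)       # individual labels, e.g. "A"
  y_cols <- paste0("y.", ids)              # y column that belongs to each x column
  stopifnot(all(y_cols %in% names(dat)))   # stop if an individual has no y column
  ind <- apply(dat[x_cols], 1, which.min)  # position of the smallest x per row
  res <- as.matrix(dat[y_cols])[cbind(seq_len(nrow(dat)), ind)]
  names(res) <- y_cols[ind]
  res
}

# Same values as the data in this answer, but with the y columns reversed.
d2 <- data.frame(
  x.A = c(88, 12, 32, 8, 31),  x.B = c(96, 11, 81, 84, 51),
  x.C = c(65, 27, 21, 42, 83), x.D = c(55, 45, 9, 78, 2),
  y.D = c(18, 69, 16, 90, 80), y.C = c(63, 20, 91, 28, 54),
  y.B = c(99, 46, 44, 94, 25), y.A = c(14, 70, 77, 85, 67)
)
res2 <- y_at_min_x(d2)
res2
```

Selecting dat[y_cols] reorders the y columns to follow the x columns, so the positional index from which.min lines up again. For the shuffled data this should reproduce the values obtained above (18, 46, 16, 85, 80).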



Answer 3:


We can create a function as follows:

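get_min <- function(x){
  d1 <- x[,1:4]  # x columns
  d2 <- x[,5:8]  # y columns
  # for each row, take the whole y column at that row's min-x position (n x n matrix)
  mtrx <- as.matrix(d2[,apply(d1, 1, which.min)])
  # row index minus column index is 0 exactly on the main diagonal
  a <- row(mtrx) - col(mtrx)
  split(mtrx, a)$"0" 
}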
get_min(d)
#[1] 39 63 89 81 83
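
The split on row(mtrx) - col(mtrx) picks out the main diagonal of the intermediate matrix, so base R's diag() should give the same values. The sketch below illustrates this on the question's data. The names picked_cols, via_split and via_diag are introduced here for illustration.

```r
# Data from the question, rebuilt by hand so the sketch is self-contained.
d <- data.frame(
  x.A = c(56L, 19L, 22L, 49L, 45L), x.B = c(65L, 93L, 57L, 13L, 38L),
  x.C = c(42L, 94L, 15L, 95L, 24L), x.D = c(96L, 75L, 62L, 97L, 91L),
  y.A = c(100L, 63L, 2L, 85L, 23L), y.B = c(76L, 78L, 29L, 81L, 82L),
  y.C = c(39L, 5L, 89L, 60L, 83L),  y.D = c(26L, 44L, 79L, 37L, 72L)
)

# One full y column per row of d, chosen by that row's min-x position.
picked_cols <- as.matrix(d[, 5:8][, apply(d[, 1:4], 1, which.min)])
dim(picked_cols)
#[1] 5 5

# The answer's approach: keep the cells where row index equals column index.
via_split <- split(picked_cols, row(picked_cols) - col(picked_cols))$"0"

# diag() extracts the same main diagonal directly.
via_diag <- unname(diag(picked_cols))
via_diag
#[1] 39 63 89 81 83
```

Note that the intermediate matrix has as many columns as the data has rows, so memory grows quadratically with the number of rows. The (row, column) matrix indexing used in the other answers avoids that.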


Source: https://stackoverflow.com/questions/35887997/subset-data-frame-with-matrix-of-logical-values
