Using dplyr summarise with conditions

问题

I am currently trying to apply the summarise function in order to isolate the relevant observations from a large data set. A simple reproducible example is given here:

df <- data.frame(c(1,1,1,2,2,2,3,3,3), as.logical(c(TRUE,FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE)),
                 as.numeric(c(0,5,0,0,0,0,7,0,7)))
colnames(df) <- c("ID", "Status", "Price")

  ID Status Price
1  1   TRUE     0
2  1  FALSE     5
3  1   TRUE     0
4  2   TRUE     0
5  2   TRUE     0
6  2   TRUE     0
7  3  FALSE     7
8  3   TRUE     0
9  3  FALSE     7

I would like to sort the table by observation and get the status TRUE only if all three observations are TRUE (figured out) and then want to get the price corresponding to the status (i.e. 5 for observation 1 as FALSE, 0 for observation 2 as TRUE and 7 for observation 3 as FALSE).

From Summarize with conditions in dplyr I have figured out that I can - just as usually - specify the conditions in square brackets. My code so far thus looks like this:

library(dplyr)
result <- df %>%
  group_by(ID) %>%
  summarize(Status = all(Status), Test = ifelse(all(Status) == TRUE,
 first(Price[Status == TRUE]), first(Price[Status == FALSE]))) 

# This is what I get: 
# A tibble: 3 x 3
     ID Status  Test
  <dbl> <lgl>  <dbl>
1    1. FALSE     0.
2    2. TRUE      0.
3    3. FALSE     7.

But as you can see, for ID = 1 it gives an incorrect price. I have been trying this forever, so I would appreciate any hint as to where I have been going wrong.

回答1:

We could keep the all(Status) as second argument in summarise (or change the column name) and also, it can be done with if/else as the logic seems to return a single TRUE/FALSE based on whether all of the 'Status' is TRUE or not

df %>%
   group_by(ID) %>% 
   summarise( Test = if(all(Status)) first(Price[Status]) else 
                   first(Price[!Status]), Status = all(Status))
# A tibble: 3 x 3
#     ID  Test Status
#   <dbl> <dbl> <lgl> 
#1     1     5 FALSE 
#2     2     0 TRUE  
#3     3     7 FALSE

NOTE: It is better not to use ifelse with unequal lengths for its arguments

回答2:

Could do:

df %>%
  group_by(ID) %>%
  mutate(status = Status) %>%
  summarise(
    Status = all(Status),
    Test = ifelse(Status == TRUE,
                  first(Price),
                  first(Price[status == FALSE]))
  )

Output:

# A tibble: 3 x 3
     ID Status  Test
  <dbl> <lgl>  <dbl>
1     1 FALSE      5
2     2 TRUE       0
3     3 FALSE      7

The issue is that you want to use Status for Test column while you've already modified it so that it doesn't contain original values anymore.

Make a copy before (I've saved it in status), execute ifelse on it and it'll run fine.

来源：https://stackoverflow.com/questions/54824804/using-dplyr-summarise-with-conditions

标签

dplyr

summarize