Duplicated rows: select rows based on criteria and store duplicated values

有刺的猬 2021-01-23 19:17

I am working on a raw dataset that looks something like this:

df <- data.frame("ID" = c("Alpha", "Alpha", "Alpha", "Alpha", 
        
2 Answers
  •  误落风尘
    2021-01-23 19:32

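    Here is one option with dplyr. After grouping by 'ID' and 'Year', create a logical column ('ind') that is TRUE where 'Val2' equals the group maximum and is not NA. Using that column, store the values of the rows that will be eliminated: mutate_at adds one column per 'Val' column, named with a '_del' suffix, and 'del_treat' holds the 'treatment' of the eliminated rows. Groups with nothing to eliminate get NA in these columns. Finally, filter the rows on 'ind', ungroup, and drop 'ind'.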

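    library(dplyr)

    df %>% 
       group_by(ID, Year) %>% 
       # flag the row(s) holding the group's maximum 'Val2'
       mutate(ind = Val2 == max(Val2) & !is.na(Val2)) %>% 
       # store the values of the non-flagged rows as '<column>_del'
       mutate_at(vars(matches('Val')), 
            list(del = ~ if(any(!ind)) .[!ind] else NA_real_)) %>% 
       # store the 'treatment' of the non-flagged rows
       mutate(del_treat = if(any(!ind)) treatment[!ind] else NA_character_) %>% 
       # keep only the flagged rows, then drop the helper column
       filter(ind) %>%
       ungroup() %>%
       select(-ind)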
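
To see what the pipeline returns, here is a minimal sketch on made-up data. The question's data frame is truncated above, so the `toy` data frame and all of its values are hypothetical; only the column names ('ID', 'Year', 'Val', 'Val2', 'treatment') are taken from the answer's code, and 'Val' is assumed to be numeric.

```r
library(dplyr)

# Hypothetical toy data: invented for illustration, not the asker's dataset.
toy <- data.frame(
  ID        = c("Alpha", "Alpha", "Alpha", "Beta"),
  Year      = c(2000, 2000, 2001, 2000),
  Val       = c(10, 20, 30, 40),
  Val2      = c(1, 5, 2, 7),
  treatment = c("A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

result <- toy %>%
  group_by(ID, Year) %>%
  # TRUE for the row whose 'Val2' is the group maximum
  mutate(ind = Val2 == max(Val2) & !is.na(Val2)) %>%
  # a named list in mutate_at() appends the name as a suffix: Val_del, Val2_del
  mutate_at(vars(matches("Val")),
            list(del = ~ if (any(!ind)) .[!ind] else NA_real_)) %>%
  mutate(del_treat = if (any(!ind)) treatment[!ind] else NA_character_) %>%
  filter(ind) %>%
  ungroup() %>%
  select(-ind)

print(result)
```

Only the (Alpha, 2000) group has a duplicate here, so it is the only group whose 'Val_del', 'Val2_del' and 'del_treat' are filled in; the other two groups get NA. One caveat, as far as I can tell: if a group eliminates more than one row, `.[!ind]` has a length that is neither 1 nor the group size, and `mutate()` will refuse to recycle it, so this sketch assumes at most one eliminated row per group (or that the group keeps exactly one row and eliminates all others is not the case).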
    
