Dplyr applying a calculation on values in a grouping comparing each item to all _other_ items in the group

Submitted by 六眼飞鱼酱① on 2019-12-11 07:38:50

Question


I want to work out whether a value in a group is different enough from the other values in the same group. Specifically, I want to work out whether the end time of a lesson matches the start time of another lesson on the same day for the same student. Using diamonds, this is the equivalent code:

library(ggplot2)
diamonds %>% group_by(color, cut) %>% 
  mutate(clash = sum(
           lapply(
             diamonds %>% 
               filter(color == color, cut == cut, carat != carat) %$% carat,
             function(x) ifelse(x < carat - 0.01 && x > carat + 0.01, 1, 0)))) %>%
  arrange(color, cut, clash)

The plan is if clash is over 1, then I know that another diamond is very close in carat size to the diamond in that grouping. This gives me the following error:

Error in sum(sapply(diamonds %>% filter(color == color, cut == cut, carat !=  : 
  invalid 'type' (list) of argument

This makes the second call to diamonds look dodgy.
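As a side note, both the error and the suspicion about the inner call can be reproduced on a tiny data frame. The sketch below assumes dplyr is installed; `toy`, `grp` and `val` are made-up names standing in for diamonds and its columns. It shows that inside filter() a comparison such as val != val refers to the same column on both sides, so no rows are kept, and that sum() rejects the list that lapply() returns.

```r
library(dplyr)

# Toy data standing in for diamonds (hypothetical names and values)
toy <- tibble(grp = c("a", "a", "b"), val = c(1.00, 1.01, 2.00))

# Inside filter(), both sides of each comparison resolve to the column of
# `toy` itself, so `val != val` is FALSE for every row and nothing is kept.
kept <- toy %>% filter(grp == grp, val != val)
nrow(kept)

# lapply() always returns a list, and sum() does not accept list input.
as_list <- lapply(toy$val, function(x) x > 1)
msg <- tryCatch(sum(as_list), error = function(e) conditionMessage(e))
msg

# vapply() returns an atomic vector, which sum() can add up.
as_vec <- vapply(toy$val, function(x) x > 1, logical(1))
sum(as_vec)
```

So the values of the current row never reach the inner filter(); they have to be passed in explicitly, for example as function arguments.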


Answer 1:


You can use pmap instead of lapply, which fits better within the tidyverse:

library(tidyverse)

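# For one diamond, count the rows with the same color and cut whose carat
# lies outside the +/- 0.01 window around .carat.
myfun <- function(.color, .cut, .carat){
  diamonds %>%
    filter(color == .color, cut == .cut, !between(carat, .carat - 0.01, .carat + 0.01)) %>%
    nrow()
}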

diamonds %>% 
  mutate(clash = pmap_int(list(color, cut, carat), myfun)) %>%
  arrange(color, cut, clash)

# A tibble: 53,940 x 11
   carat cut   color clarity depth table price     x     y     z clash
   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <int>
 1  1.01 Fair  D     SI2      64.6    56  3003  6.31  6.24  4.05   124
 2  1.01 Fair  D     SI2      64.7    57  3871  6.31  6.27  4.07   124
 3  1.01 Fair  D     SI1      66.3    55  4118  6.22  6.17  4.11   124
 4  1.01 Fair  D     SI2      65.3    55  4205  6.33  6.19  4.09   124
 5  1.01 Fair  D     SI1      65.9    60  4276  6.32  6.18  4.12   124
 6  1.01 Fair  D     SI2      64.6    62  4538  6.26  6.21  4.03   124
 7  1.01 Fair  D     SI1      63.5    58  4751  6.35  6.25  4      124
 8  1.01 Fair  D     SI1      64.6    60  4751  6.12  6.08  3.94   124
 9  1.01 Fair  D     SI1      66.9    54  4751  6.25  6.21  4.17   124
10  1.01 Fair  D     SI1      66.2    56  5122  6.05  6.1   4.02   124

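Note that this solution works but is not very efficient, because it filters the full diamonds table once per row. You can easily modify the code to work on one row per distinct combination of color, cut and carat instead: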

diamonds2 <- diamonds %>%
  count(color, carat, cut)

myfun2 <- function(.color, .cut, .carat){
  diamonds2 %>%
    filter(color == .color, cut == .cut, !between(carat, .carat - 0.01, .carat + 0.01)) %>%
    pull(n) %>% sum
}

diamonds2 %>% 
  mutate(clash = pmap_int(list(color, cut, carat), myfun2)) %>%
  left_join(diamonds, ., by = c("color", "carat", "cut")) %>%
  arrange(color, cut, clash)

The result is the same, but the second version (using myfun2) is way faster.

EDIT

Here is an example that also filters on clarity:

diamonds3 <- diamonds %>%
  count(color, carat, cut, clarity)


myfun3 <- function(.color, .cut, .carat, .clarity){
  diamonds3 %>%
    filter(color == .color, cut == .cut, clarity == .clarity, 
           !between(carat, .carat - 0.01, .carat + 0.01)) %>%
    pull(n) %>% sum
}

myfun3(.color = "D", .cut = "Fair", .clarity = "I1", .carat = 1.5)
[1] 3
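If the aim is to count how many other items in the same group are close to each item, as the question describes, one possible alternative is to compare the values inside each group directly with group_by() and mutate(). This is only a sketch, not part of the answer above: `count_close`, `tol` and the toy data are hypothetical, and unlike myfun it counts the values inside the tolerance window (not those outside it).

```r
library(dplyr)

# Hypothetical helper: for each element of x, count the OTHER elements
# of x that lie within `tol` of it (the element itself is excluded).
count_close <- function(x, tol = 0.01) {
  vapply(
    seq_along(x),
    function(i) sum(abs(x[-i] - x[i]) <= tol),
    integer(1)
  )
}

# Toy data standing in for diamonds (made-up values)
toy <- tibble(
  color = c("D", "D", "D", "E", "E"),
  carat = c(1.000, 1.005, 1.200, 0.500, 0.500)
)

# mutate() on grouped data calls count_close() once per group, so each
# value is only compared with the other values of its own group.
result <- toy %>%
  group_by(color) %>%
  mutate(n_close = count_close(carat)) %>%
  ungroup()

result
```

On the real data the same pattern would be diamonds %>% group_by(color, cut) %>% mutate(n_close = count_close(carat)). Differences that land exactly on the tolerance can fall on either side because of floating-point rounding, and the pairwise comparison grows quadratically with group size.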


Source: https://stackoverflow.com/questions/58878544/dplyr-applying-a-calculation-on-values-in-a-grouping-comparing-each-item-to-all
