Summarize with mathematical conditions in dplyr

偶尔善良 submitted on 2020-01-24 21:21:13

Question


Building on this question: Summarize with conditions in dplyr, I would like to use dplyr to summarize a column based on a mathematical condition rather than on string matching (as in the linked post). For each sample, I need the measurement at the time point where the ratio measurement/time is highest, and I want that ratio stored in a new column, ratio. I'd also like to carry the entire row through, and I'm unsure how to do that with dplyr's summarize function.


Example Data Frame

print(df)

   sample     type time measurement
1       a bacteria   24     0.57561
2       a bacteria   44     1.67236
3       a bacteria   67     4.17100
4       a bacteria   88    11.51661
5       b bacteria   24     0.53269
6       b bacteria   44     1.24942
7       b bacteria   67     5.72147
8       b bacteria   88    11.04017
9       c bacteria    0     0.00000
10      c bacteria   24     0.47418
11      c bacteria   39     1.06286
12      c bacteria   64     3.59649
13      c bacteria   78     7.05190
14      c bacteria  108     7.27060

Desired Output

  sample     type time measurement      ratio
1      a bacteria   88    11.51661 0.13087057
2      b bacteria   88    11.04017 0.12545648
3      c bacteria   78     7.05190 0.09040897
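The ratio column above is just measurement/time evaluated on the winning row. A quick base-R check for sample c (values copied from the example data) shows why the row at time 78 is chosen even though the row at time 108 has the larger measurement: selection is by the largest ratio, not the largest measurement.

```r
# Sample c, copied from the example data
time_c <- c(0, 24, 39, 64, 78, 108)
meas_c <- c(0, 0.47418, 1.06286, 3.59649, 7.0519, 7.2706)

ratio_c <- meas_c / time_c   # element 1 is 0/0, i.e. NaN
print(round(ratio_c, 5))

best <- which.max(ratio_c)   # which.max() skips NaN/NA; here it picks the 5th element
print(c(time = time_c[best], measurement = meas_c[best], ratio = ratio_c[best]))
```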

Failed Attempt

This returns only the two columns defined by group_by and summarize. I would like the information from the entire row to carry through:

library(dplyr)
df %>% 
    group_by(sample) %>%
    summarize(ratio = max(measurement/time, na.rm = TRUE))

  sample  ratio
  <fct>   <dbl>
1 a      0.131 
2 b      0.125 
3 c      0.0904
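summarize() collapses each group to a single row that holds only the grouping columns and the new summary columns, which is why type, time and measurement disappear. To keep whole rows, select rows instead of summarizing. A minimal sketch, assuming dplyr 1.0.0 or later (the release that introduced slice_max()), with the question's data rebuilt inline:

```r
library(dplyr)

# The question's data, rebuilt with data.frame() so the example runs on its own
df <- data.frame(
  sample = factor(rep(c("a", "b", "c"), times = c(4, 4, 6))),
  type = factor("bacteria"),
  time = c(24, 44, 67, 88, 24, 44, 67, 88, 0, 24, 39, 64, 78, 108),
  measurement = c(0.57561, 1.67236, 4.171, 11.51661,
                  0.53269, 1.24942, 5.72147, 11.04017,
                  0, 0.47418, 1.06286, 3.59649, 7.0519, 7.2706)
)

best <- df %>%
  mutate(ratio = measurement / time) %>%           # new column; 0/0 at time 0 gives NaN
  group_by(sample) %>%
  slice_max(ratio, n = 1, with_ties = FALSE) %>%   # keep the full row with the largest ratio
  ungroup()

print(best)
```

with_ties = FALSE guarantees exactly one row per sample. Leaving it at its default (TRUE) keeps every row tied for the maximum, which is what the filter()-based answers do.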

Reproducible Data

df <- structure(list(sample = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"), 
    type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L), .Label = "bacteria", class = "factor"), 
    time = c(24, 44, 67, 88, 24, 44, 67, 88, 0, 24, 39, 64, 78, 
    108), measurement = c(0.57561, 1.67236, 4.171, 11.51661, 
    0.53269, 1.24942, 5.72147, 11.04017, 0, 0.47418, 1.06286, 
    3.59649, 7.0519, 7.2706)), class = "data.frame", row.names = c(NA, 
-14L))

Answer 1:


df %>%
  mutate(ratio = measurement/time) %>%
  group_by(sample) %>%
  filter(ratio == max(ratio, na.rm=TRUE))



Answer 2:


This should do the trick.

df %>%
   group_by(sample) %>%
   mutate(ratio = measurement/time) %>%
   filter(ratio == max(ratio, na.rm = TRUE))  # na.rm: sample c has a 0/0 = NaN ratio
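Sample c contains a row with time 0, so measurement/time there is 0/0 = NaN. max() returns NaN when any element is NaN unless na.rm = TRUE is given, and because any comparison with NaN yields NA, filter() would then drop every row of sample c. A small base-R check using the sample c values from the question:

```r
# Ratios for sample c: the time-0 row gives 0/0 = NaN
ratio_c <- c(0, 0.47418, 1.06286, 3.59649, 7.0519, 7.2706) /
           c(0, 24, 39, 64, 78, 108)

max(ratio_c)                 # NaN: a single NaN makes max() return NaN
max(ratio_c, na.rm = TRUE)   # about 0.0904, the ratio at time 78

# filter() keeps only rows where the condition is TRUE; comparisons with NaN give NA,
# so without na.rm no row of the group survives
sum(ratio_c == max(ratio_c), na.rm = TRUE)                 # 0 rows kept
sum(ratio_c == max(ratio_c, na.rm = TRUE), na.rm = TRUE)   # 1 row kept
```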



Answer 3:


One option: after grouping by 'sample', find the position of the largest measurement/time ratio with which.max(), take the 'measurement' at that position, and keep the rows whose 'measurement' equals it (==).

library(dplyr)
df %>%
   group_by(sample) %>% 
   mutate(ratio = measurement/time) %>%
   filter(measurement == measurement[which.max(ratio)])
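Comparing measurement values can return more than one row if the same measurement happens to appear twice within a sample. A variant of the same which.max() idea compares row positions instead, so exactly one row per sample survives. This is a sketch, not part of the original answer, with the question's data rebuilt inline:

```r
library(dplyr)

# The question's data, rebuilt with data.frame() so the example runs on its own
df <- data.frame(
  sample = factor(rep(c("a", "b", "c"), times = c(4, 4, 6))),
  type = factor("bacteria"),
  time = c(24, 44, 67, 88, 24, 44, 67, 88, 0, 24, 39, 64, 78, 108),
  measurement = c(0.57561, 1.67236, 4.171, 11.51661,
                  0.53269, 1.24942, 5.72147, 11.04017,
                  0, 0.47418, 1.06286, 3.59649, 7.0519, 7.2706)
)

best <- df %>%
  group_by(sample) %>%
  mutate(ratio = measurement / time) %>%
  # which.max() gives the within-group position of the largest ratio (NaN is skipped,
  # the first maximum wins on ties); row_number() numbers the rows within each group
  filter(row_number() == which.max(ratio)) %>%
  ungroup()

print(best)
```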


Source: https://stackoverflow.com/questions/59199273/summarize-with-mathematical-conditions-in-dplyr
