Applying functions to nested dataframes with map

Submitted by 旧时模样 on 2019-12-25 08:13:08

Question


I am having an issue with nesting and mapping that I am not sure how to get around. I have a tibble with nested dataframes, as follows:

> x
# A tibble: 18 × 3
   event.no               data dr.dur
      <dbl>             <list>  <int>
1         1   <tibble [7 × 4]>      7
2         4 <tibble [123 × 4]>    123
3         5   <tibble [9 × 4]>      9
4         7  <tibble [14 × 4]>     14
5        10  <tibble [19 × 4]>     19
6        11 <tibble [220 × 4]>    220
7        12 <tibble [253 × 4]>    253
8        14 <tibble [153 × 4]>    153
9        15  <tibble [28 × 4]>     28
10       17 <tibble [169 × 4]>    169
11       18   <tibble [7 × 4]>      7
12       19 <tibble [115 × 4]>    115
13       21 <tibble [109 × 4]>    109
14       25  <tibble [13 × 4]>     13
15       26 <tibble [249 × 4]>    249
16       28   <tibble [7 × 4]>      7
17       30  <tibble [26 × 4]>     26
18       31  <tibble [12 × 4]>     12
>
> x$data[[1]]
# A tibble: 7 × 4
  discharge threshold def.increase event.orig
      <dbl>     <dbl>        <dbl>      <dbl>
1     0.348     0.373       2160.0          1
2     0.348     0.373       2160.0          1
3     0.379     0.373       -518.4          0
4     0.379     0.373       -518.4          0
5     0.379     0.373       -518.4          0
6     0.379     0.373       -518.4          0
7     0.348     0.373       2160.0          2
> 

I need to find the sum of the def.increase column in each of the nested dataframes. I'm not sure of the best way to do this. Here is what I've been trying:

> x %>%
+   mutate(dr.def = map(data, colSums)) %>%
+   unnest(dr.def)
# A tibble: 72 × 3
   event.no dr.dur    dr.def
      <dbl>  <int>     <dbl>
1         1      7     2.560
2         1      7     2.611
3         1      7  4406.400
4         1      7     4.000
5         4    123    45.739
6         4    123    45.879
7         4    123 12096.000
8         4    123   530.000
9         5      9     3.269
10        5      9     3.357
# ... with 62 more rows

The problem is that I end up with the sum of every column, so each event gets four rows instead of one. That would be okay, but selecting only the rows I want afterwards gets quite messy. Is there a better way to get the sum of just the def.increase column for each nested dataframe? Thanks for your help :)

Edit: Not sure if I can copy/paste an object like my x so here is a link to the rds on wetransfer (if that's allowed): https://wetransfer.com/downloads/9697fff593f51c02136bc704adccbcc220170112161115/5be1fc


Answer 1:


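You just need to extract the def.increase column from each nested tibble first and then sum it. Passing a string to map() pulls out the element with that name: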

library(tidyverse)

x %>% 
  mutate(dr.def = map(data, "def.increase") %>% map_dbl(sum))

Or do both steps in a single map_dbl() call:

x %>% 
  mutate(dr.def = map_dbl(data, ~ sum(.x[["def.increase"]])))
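
Since the original x is only available as a download, here is a minimal self-contained sketch of the same idea. The tibble x_toy and its values are made up for illustration; only the column names follow the question. It also recomputes dr.dur with map_int(), on the assumption that dr.dur is simply the row count of each nested tibble, as the printed x suggests.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical stand-in for `x`: two events, each with a small nested tibble.
# The numbers are invented; only the column names match the question.
x_toy <- tibble(
  event.no = c(1, 4),
  data = list(
    tibble(discharge = c(0.348, 0.379),
           def.increase = c(2160, -518.4)),
    tibble(discharge = c(0.350, 0.360, 0.370),
           def.increase = c(100, 200, 300))
  )
)

res <- x_toy %>%
  mutate(
    # Row count of each nested tibble (assumed meaning of dr.dur)
    dr.dur = map_int(data, nrow),
    # One sum per nested tibble, returned as a plain double column
    dr.def = map_dbl(data, ~ sum(.x[["def.increase"]]))
  )

res
```

Because map_dbl() returns an atomic double vector rather than a list, dr.def is an ordinary numeric column with one value per event, so no unnest() step is needed afterwards.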


Source: https://stackoverflow.com/questions/41617917/applying-functions-to-nested-dataframes-with-map
