purrr: joining tibbles nested in different list columns

Question

This is essentially a follow-up question to a previous one which @keqiang-li kindly answered.

I have a dataframe which includes a list column (nested data frames) holding the parties in each government and their respective number of seats. This dataframe is split per country (note I used the new dplyr 0.8 group_nest and group_split).

What I am trying to get is another list column which contains, for each government, one list element per previous government, each holding a dataframe with the overlap of parties and seats.

library(tidyverse)


df <- tibble::tribble(
  ~period, ~party, ~seats,
  1,    "A",      2,
  1,    "B",      3,
  1,    "C",      3,
  2,    "A",      2,
  2,    "C",      3,
  3,    "C",      4,
  3,    "E",      1,
  3,    "F",      3
)

df <- bind_rows(AA=df, BB=df, .id="country")

df <- df %>% 
  group_by(country, period) %>% 
  group_nest() %>% 
  #mutate(gov=map(data, "party") %>% map(.,list)) %>% 
  mutate(prev.govs=map(data, "party") %>% 
           map(., list) %>%
           accumulate(.,union))

df <- df %>% 
  group_split(country) %>% 
  map(., ~mutate(., prev.govs.df=map_depth(prev.govs, 2, enframe, value="party")))

df is my point of departure. Below are my unsuccessful attempts.

##attempts
df %>% 
  map(., ~mutate(., df.overlap=map_depth(prev.govs.df, 3, ~map2(., data, inner_join))))
#> Error in UseMethod("inner_join"): no applicable method for 'inner_join' applied to an object of class "c('integer', 'numeric')"

df %>% 
  map(., ~mutate(., df.overlap=map_depth(prev.govs.df, 2, ~map2(., data, inner_join))))
#> Error: Mapped vectors must have consistent lengths:
#> * `.x` has length 2
#> * `.y` has length 3

df %>% 
  map(., ~mutate(., df.overlap=map2(data, prev.govs.df, ~map2(.x, .y, ~map2(.x, .y, inner_join)))))
#> Error: Mapped vectors must have consistent lengths:
#> * `.x` has length 3
#> * `.y` has length 2

On a more specific level, the solution for country AA in period 3 would be 3 lists, each with a tibble containing the rows from data which overlap with those in prev.govs.df on the party column (key):

df[[1]][["prev.govs.df"]][[3]] 
#> [[1]]
#> # A tibble: 3 x 2
#>    name party
#>   <int> <chr>
#> 1     1 A    
#> 2     2 B    
#> 3     3 C    
#> 
#> [[2]]
#> # A tibble: 2 x 2
#>    name party
#>   <int> <chr>
#> 1     1 A    
#> 2     2 C    
#> 
#> [[3]]
#> # A tibble: 3 x 2
#>    name party
#>   <int> <chr>
#> 1     1 C    
#> 2     2 E    
#> 3     3 F
df[[1]][["data"]][[3]]
#> # A tibble: 3 x 2
#>   party seats
#>   <chr> <dbl>
#> 1 C         4
#> 2 E         1
#> 3 F         3

The answer to the previous question solved the riddle of how to intersect two lists. Unfortunately, I couldn't figure out the next step, which entails splitting the dataframe and merging the nested tibbles.

Grateful for any hint!

Answer 1:

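OP's third attempt is actually really close. We just need to replace the nested map2 calls with a single map over the list of previous governments (.y), passing the current data (.x) as the second argument to inner_join: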

library(tidyverse)

output <- df %>%
  map(~mutate(., df.overlap = map2(data, prev.govs.df, ~map(.y, inner_join, .x))))

Output:

[[1]]
# A tibble: 3 x 6
  country period data             prev.govs  prev.govs.df df.overlap
  <chr>    <dbl> <list>           <list>     <list>       <list>    
1 AA           1 <tibble [3 x 2]> <list [1]> <list [1]>   <list [1]>
2 AA           2 <tibble [2 x 2]> <list [2]> <list [2]>   <list [2]>
3 AA           3 <tibble [3 x 2]> <list [3]> <list [3]>   <list [3]>

[[2]]
# A tibble: 3 x 6
  country period data             prev.govs  prev.govs.df df.overlap
  <chr>    <dbl> <list>           <list>     <list>       <list>    
1 BB           1 <tibble [3 x 2]> <list [3]> <list [3]>   <list [3]>
2 BB           2 <tibble [2 x 2]> <list [3]> <list [3]>   <list [3]>
3 BB           3 <tibble [3 x 2]> <list [3]> <list [3]>   <list [3]>

> output[[1]]$df.overlap[[3]]
[[1]]
# A tibble: 1 x 3
   name party seats
  <int> <chr> <dbl>
1     3 C         4

[[2]]
# A tibble: 1 x 3
   name party seats
  <int> <chr> <dbl>
1     2 C         4

[[3]]
# A tibble: 3 x 3
   name party seats
  <int> <chr> <dbl>
1     1 C         4
2     2 E         1
3     3 F         3
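
To see why the single map call is enough, here is a minimal sketch on toy data. The objects `current` and `previous` are hypothetical stand-ins for one row's `data` and `prev.govs.df` (they are not part of the original code), and `by = "party"` is spelled out only to silence the join message:

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-ins for one row of the nested data frame (AA, period 3)
current <- tibble(party = c("C", "E", "F"), seats = c(4, 1, 3))
previous <- list(
  tibble(name = 1:3, party = c("A", "B", "C")),
  tibble(name = 1:2, party = c("A", "C")),
  tibble(name = 1:3, party = c("C", "E", "F"))
)

# map() forwards extra arguments to the function, so each call is
# inner_join(previous[[i]], current, by = "party")
overlap <- map(previous, inner_join, current, by = "party")

# Number of overlapping parties per previous government
map_int(overlap, nrow)
#> [1] 1 1 3
```

Because `current` is passed as an extra argument rather than mapped over, it is never split into its columns, which is what the nested map2 calls in the question ended up doing.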

Answer 2:

One reason for the errors is that the list elements differ in length. We could replicate one of the list elements so that the lengths match and then do the inner_join:

out <- df %>%
         map(., ~ .x %>% 
             mutate(df.overlap = map2(prev.govs.df, data, ~ 
                map2(rep(list(.y), length(.x)), .x, inner_join))))

Output:

out[[1]]
# A tibble: 3 x 6
#  country period data             prev.govs  prev.govs.df df.overlap
#  <chr>    <dbl> <list>           <list>     <list>       <list>    
#1 AA           1 <tibble [3 × 2]> <list [1]> <list [1]>   <list [1]>
#2 AA           2 <tibble [2 × 2]> <list [2]> <list [2]>   <list [2]>
#3 AA           3 <tibble [3 × 2]> <list [3]> <list [3]>   <list [3]>

# overlap column element
out[[1]]$df.overlap[[3]][[1]]
# A tibble: 1 x 3
#  party seats  name
#  <chr> <dbl> <int>
#1 C         4     3


# input dataset elements used for joining
out[[1]]$data[[3]]
# A tibble: 3 x 2
#  party seats
#  <chr> <dbl>
#1 C         4
#2 E         1
#3 F         3

out[[1]]$prev.govs.df[[3]][[1]]
# A tibble: 3 x 2
#   name party
#  <int> <chr>
#1     1 A    
#2     2 B    
#3     3 C
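
The length mismatch described at the start of this answer can be reproduced on toy data. The sketch below uses hypothetical objects `current` and `previous` (not from the original code) and relies on the documented purrr behaviour that map2 recycles length-1 inputs, so wrapping the tibble in list() should work as an alternative to the explicit rep():

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-ins for one row's `data` and `prev.govs.df`
current <- tibble(party = c("C", "E", "F"), seats = c(4, 1, 3))
previous <- list(
  tibble(name = 1:3, party = c("A", "B", "C")),
  tibble(name = 1:2, party = c("A", "C"))
)

# A tibble is a list of columns, so its length is the column count;
# map2() would therefore iterate over columns, not over the tibble itself
length(current)
#> [1] 2

# rep() builds one copy of the tibble per previous government
copies <- rep(list(current), length(previous))
length(copies)
#> [1] 2

# Explicit copies, as in the answer above
res_rep <- map2(copies, previous, inner_join, by = "party")

# Alternative: a length-1 list is recycled by map2()
res_recycled <- map2(list(current), previous, inner_join, by = "party")
```

Both calls join `current` against each element of `previous`, so the results have the columns party, seats and name, in that order.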

Source: https://stackoverflow.com/questions/54752196/purrr-joining-tibbles-nested-in-different-list-columns

Tags

purrr