How to merge two different groupings if they are not disjoint with dplyr

Submitted by 青春壹個敷衍的年華 on 2021-02-04 06:28:47

Question


Suppose that I have two sets of identifiers id1 and id2 in a data frame. How can I create a new identifier id3 that works as follows:

I consider id1 the stricter key, so observations are grouped first by id1 and then by id2. If two sets of rows have different values of id2 but some of their rows share the same id1, these two sets should get the same value of id3 (the exact value of id3 doesn't matter much).

df <- data.frame(id1 = c(1, 1, 2, 2, 5, 6),
                 id2 = c(4, 3, 1, 2, 2, 7),
                 id3 = c(1, 1, 2, 2, 2, 3))

Rows 1 and 2 are grouped together because they have the same id1. Rows 3, 4 and 5 are grouped together because 3 and 4 have the same id1 and 4 and 5 have the same id2.

Can someone help? I would rather have a solution with dplyr that encompasses a general case in which there is an arbitrary number of possible values in the id columns.


Answer 1:


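This is a graph theory problem. Treat each distinct id1 value and each distinct id2 value as a separate node, and each row of df as an edge linking its id1 node to its id2 node. You are looking for the connected component that each id belongs to; rows whose nodes fall in the same component get the same id3.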

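library(dplyr)
library(igraph)
# Prefix the ids so that equal values in id1 and id2 become distinct nodes
df <- df %>% mutate(from = paste0('id1', '_', id1), to = paste0('id2', '_', id2))
# Each row is an undirected edge between an id1 node and an id2 node
dg <- graph_from_data_frame(df %>% select(from, to), directed = FALSE)
# membership is named by node, so look up each row's component via its id1 node
df <- df %>% mutate(id3 = components(dg)$membership[from])
df %>% select(id1, id2, id3)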

#>   id1 id2 id3
#> 1   1   4   1
#> 2   1   3   1
#> 3   2   1   2
#> 4   2   2   2
#> 5   5   2   2
#> 6   6   7   3
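
Since the question asks for a dplyr solution, the same grouping can also be sketched without igraph. This is not part of the answer above; it is a minimal alternative that assumes repeated minimum-label propagation is acceptable: every row starts with its own label, and each pass replaces a row's label with the smallest label in its id1 group and then in its id2 group, until nothing changes. The function name merge_groups is hypothetical.

```r
library(dplyr)

df <- data.frame(id1 = c(1, 1, 2, 2, 5, 6),
                 id2 = c(4, 3, 1, 2, 2, 7))

# Hypothetical helper: propagate the smallest row label through id1 groups
# and id2 groups until the labels stop changing.
merge_groups <- function(data) {
  data$id3 <- seq_len(nrow(data))  # one provisional label per row
  repeat {
    updated <- data %>%
      group_by(id1) %>% mutate(id3 = min(id3)) %>%
      group_by(id2) %>% mutate(id3 = min(id3)) %>%
      ungroup()
    if (identical(updated$id3, data$id3)) break  # converged
    data <- updated
  }
  # Renumber the labels as 1, 2, 3, ... in order of first appearance
  data$id3 <- match(data$id3, unique(data$id3))
  data
}

result <- merge_groups(df)
result$id3
#> [1] 1 1 2 2 2 3
```

Each pass scans the whole data frame, and long chains of linked ids can need many passes, so for large inputs the igraph approach is likely the better choice.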


Source: https://stackoverflow.com/questions/63908856/how-to-merge-two-different-groupings-if-they-are-not-disjoint-with-dplyr
