Question
I am stuck on a problem that seems trivial, but I can't figure it out right now. I don't even know how to formulate it properly, so suggestions are welcome. I have a data.frame that I want to group (index) based on two columns. The catch is that the rows I want to group do not all share the same values in those columns. Instead, some rows share a value in one column, and some of those rows in turn share a value in the second column with other rows, which I also want to include in the group. Here is a minimal example that I hope makes this clearer:
id V1 V2 group_id
1 a c 1
2 a d 1
3 b d 1
4 w y 2
5 w z 2
6 x z 2
Rows 1 and 2 share the value a in column V1. I want to group not only those two rows but also row 3, which is "connected" to row 2 through the value d in column V2. Right now, I can only group rows 1,2 and rows 2,3 separately.
The same holds for the second group: here I want to group the rows that have either w in V1 or z in V2. The values x and y are irrelevant.
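To make the example reproducible, the table above can be entered as a data frame. This is a sketch: the object name df is an assumption chosen to match the answer's code, and stringsAsFactors = FALSE keeps V1 and V2 as character vectors in R versions before 4.0.

```r
# Example data from the table above (the name df is assumed, not from the question)
df <- data.frame(id       = 1:6,
                 V1       = c("a", "a", "b", "w", "w", "x"),
                 V2       = c("c", "d", "d", "y", "z", "z"),
                 group_id = c(1, 1, 1, 2, 2, 2),  # the desired grouping
                 stringsAsFactors = FALSE)
str(df)
```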
Any help is highly appreciated.
Answer 1:
Here's how you could do that by treating the values as vertices of a graph and taking its connected components with the igraph package (components(), formerly called clusters()):
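library(igraph)
library(dplyr)  # provides left_join()
# each row becomes an edge between its V1 value and its V2 value
relations <- data.frame(from = df$V1, to = df$V2)
g <- graph_from_data_frame(relations)
# components() (formerly clusters()) numbers each connected component
comp <- components(g)
group_id <- data.frame(V = names(comp$membership),
                       cluster = comp$membership, stringsAsFactors = FALSE)
left_join(df, group_id, by = c("V1" = "V"))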
id V1 V2 group_id cluster
1 1 a c 1 1
2 2 a d 1 1
3 3 b d 1 1
4 4 w y 2 2
5 5 w z 2 2
6 6 x z 2 2
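If you would rather avoid the igraph and dplyr dependencies, the same grouping can be sketched in base R with a different technique, iterative minimum-label propagation: every row starts with its own label, and on each pass a row takes the smallest label among rows sharing its V1 value, then among rows sharing its V2 value, until nothing changes. The helper name group_rows is hypothetical, and this is a minimal sketch rather than a tuned implementation.

```r
# Hypothetical helper: igraph-free grouping by iterative minimum-label propagation.
# Rows linked by a shared V1 value or a shared V2 value (directly or through a
# chain of such rows) end up with the same group number.
group_rows <- function(v1, v2) {
  grp <- seq_along(v1)  # start with every row in its own group
  repeat {
    # smallest label among rows with the same V1 value,
    # then smallest label among rows with the same V2 value
    new <- ave(grp, v1, FUN = min)
    new <- ave(new, v2, FUN = min)
    if (identical(new, grp)) break  # labels are stable: groups are final
    grp <- new
  }
  match(grp, unique(grp))  # relabel as 1, 2, ... in order of first appearance
}

df <- data.frame(id = 1:6,
                 V1 = c("a", "a", "b", "w", "w", "x"),
                 V2 = c("c", "d", "d", "y", "z", "z"),
                 stringsAsFactors = FALSE)
df$group_id <- group_rows(df$V1, df$V2)
df
```

One design difference: graph_from_data_frame() puts V1 and V2 values into a single vertex namespace, so a value that appears in both columns links those rows in the igraph version, whereas this sketch compares each column only with itself. Labels can only decrease, so the loop terminates, but the number of passes grows with the longest chain of linked rows; for large data the igraph route is likely faster.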
Source: https://stackoverflow.com/questions/43482086/group-by-two-columns-and-union-of-levels-in-r