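Question
Group the row labels (the index) according to these rules:
- two rows belong to the same group if both have a "1" in the same column
- grouping is transitive: rows that are linked through another row they have in common belong to the same group (see example)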
For example:
c0 c1 c2 c3
A 1 0 0 1
B 0 0 1 0
C 0 0 0 1
D 0 1 1 0
E 0 1 0 0
Expected output:
[[A, C], [B, D, E]]
As you can see, B and E do not share a "1" in any column, but each shares one with D (B in c2, E in c1), so all three belong to the same group.
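Because the rule is transitive, it maps directly onto a union-find (disjoint-set) structure. The following is a minimal pure-Python sketch of that idea, not taken from either answer below; it assumes the table is given as a plain dict mapping each label to its 0/1 values, all of the same length and with at least one row (`rows` and `group_rows` are hypothetical names).

```python
# Minimal union-find (disjoint-set) sketch; not the method used in the answers.
def group_rows(rows):
    """rows: dict mapping a label to its list of 0/1 values (one per column).

    Assumes at least one row and that all rows have the same length.
    """
    parent = {label: label for label in rows}

    def find(x):
        # follow parent links to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    n_cols = len(next(iter(rows.values())))
    for col in range(n_cols):
        # merge every row with a 1 in this column into the same set as the first such row
        members = [label for label, values in rows.items() if values[col] == 1]
        for other in members[1:]:
            union(members[0], other)

    # collect labels by their root, preserving the original row order
    groups = {}
    for label in rows:
        groups.setdefault(find(label), []).append(label)
    return list(groups.values())


rows = {
    "A": [1, 0, 0, 1],
    "B": [0, 0, 1, 0],
    "C": [0, 0, 0, 1],
    "D": [0, 1, 1, 0],
    "E": [0, 1, 0, 0],
}
print(group_rows(rows))  # [['A', 'C'], ['B', 'D', 'E']]
```

Each column merges the sets of all rows that have a 1 in it, so B and E end up in the same set through D, and a row that shares no column with any other row is returned as its own single-element group.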
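Answer 1:
Here is a solution with networkx: each row label becomes a node, rows that have a 1 in the same column are connected, and the groups are the connected components of the resulting graph.
import pandas as pd
import networkx as nx

df = pd.DataFrame([[1, 0, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 0]],
                  index=list('ABCDE'), columns=['c0', 'c1', 'c2', 'c3'])
G = nx.Graph()
G.add_nodes_from(df.index)  # rows that share no column still form their own group
for col in df.columns:
    # chain every row with a 1 in this column; a path is enough to connect them all
    nx.add_path(G, df.index[df[col] == 1])
list(nx.connected_components(G))
[{'A', 'C'}, {'B', 'D', 'E'}]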
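Answer 2:
This achieves what you want: compare every pair of rows, keep the pairs that have a 1 in a common column, and take the connected components of the graph formed by those pairs.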
import numpy as np
from itertools import combinations
import networkx as nx
df
"""output:
1 2 3 4
0
A 1 0 0 1
B 0 0 1 0
C 0 0 0 1
D 0 1 1 0
E 0 1 0 0
"""
df.index.tolist()
"""output:
['A', 'B', 'C', 'D', 'E']
"""
list(combinations(df.index.tolist(),2))
"""output :
[('A', 'B'),
('A', 'C'),
('A', 'D'),
('A', 'E'),
('B', 'C'),
('B', 'D'),
('B', 'E'),
('C', 'D'),
('C', 'E'),
('D', 'E')]
"""
# keep the pairs of rows whose element-wise product is nonzero, i.e. that share a 1 in some column
results = [x for x in combinations(df.index, 2) if np.sum(df.loc[x[0], :].multiply(df.loc[x[1], :])) > 0]
results
"""output:
[('A', 'C'), ('B', 'D'), ('D', 'E')]
"""
G = nx.Graph(results)
G.add_nodes_from(df.index)  # include rows that share no column with any other row
list(nx.connected_components(G))
"""output:
[{'A', 'C'}, {'B', 'D', 'E'}]
"""
Source: https://stackoverflow.com/questions/46200969/how-to-group-all-labels-index-which-shares-at-least-one-1-in-the-same-column