I have thousands of lines of 1 to 100 numbers; every line defines a group of numbers and a relationship among them. I need to get the sets of related numbers.
Once you have built the data structure, exactly what queries do you want to run against it? Show us your existing code. What is a T(x)? You talk about "groups of numbers" but your sample data shows T1, T2, etc; please explain.
Have you read this: http://en.wikipedia.org/wiki/Disjoint-set_data_structure
Try looking at this Python implementation: http://code.activestate.com/recipes/215912-union-find-data-structure/
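For reference, the structure described on that Wikipedia page can be sketched in a few lines of Python. This is only a minimal illustration with path compression and union by size, not the code from the recipe linked above; the names `UnionFind`, `find`, `union` and `groups` are my own choices.

```python
class UnionFind:
    """Minimal disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}  # maps each item to its parent; a root is its own parent
        self.size = {}    # maps each root to the number of items in its tree

    def find(self, x):
        """Return the root of x's set, registering x if it is new."""
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        # First pass: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path straight at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets containing a and b."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return  # already in the same set
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # make ra the root of the larger tree
        self.parent[rb] = ra
        self.size[ra] += self.size.pop(rb)

    def groups(self):
        """Return the current sets as a list of sets."""
        result = {}
        for x in self.parent:
            result.setdefault(self.find(x), set()).add(x)
        return list(result.values())


uf = UnionFind()
for a, b in [("T1", "T2"), ("T3", "T4"), ("T5", "T1")]:
    uf.union(a, b)
print(uf.groups())
```

The trade-off is that the sets are only materialised when you call `groups()`, whereas each `union` stays very cheap.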
OR you can lash up something rather simple and understandable yourself, e.g.
[Update: totally new code]
class DisjointSet(object):

    def __init__(self):
        self.leader = {} # maps a member to the group's leader
        self.group = {} # maps a group leader to the group (which is a set)

    def add(self, a, b):
        leadera = self.leader.get(a)
        leaderb = self.leader.get(b)
        if leadera is not None:
            if leaderb is not None:
                if leadera == leaderb: return # nothing to do
                groupa = self.group[leadera]
                groupb = self.group[leaderb]
                if len(groupa) < len(groupb):
                    a, leadera, groupa, b, leaderb, groupb = b, leaderb, groupb, a, leadera, groupa
                groupa |= groupb
                del self.group[leaderb]
                for k in groupb:
                    self.leader[k] = leadera
            else:
                self.group[leadera].add(b)
                self.leader[b] = leadera
        else:
            if leaderb is not None:
                self.group[leaderb].add(a)
                self.leader[a] = leaderb
            else:
                self.leader[a] = self.leader[b] = a
                self.group[a] = set([a, b])

data = """T1 T2
T3 T4
T5 T1
T3 T6
T7 T8
T3 T7
T9 TA
T1 T9"""
# data is chosen to demonstrate each of 5 paths in the code
from pprint import pprint as pp
ds = DisjointSet()
for line in data.splitlines():
    x, y = line.split()
    ds.add(x, y)
    print
    print x, y
    pp(ds.leader)
    pp(ds.group)
and here is the output from the last step:
T1 T9
{'T1': 'T1',
 'T2': 'T1',
 'T3': 'T3',
 'T4': 'T3',
 'T5': 'T1',
 'T6': 'T3',
 'T7': 'T3',
 'T8': 'T3',
 'T9': 'T1',
 'TA': 'T1'}
{'T1': set(['T1', 'T2', 'T5', 'T9', 'TA']),
 'T3': set(['T3', 'T4', 'T6', 'T7', 'T8'])}
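If, as the question suggests, a line can hold more than two items, one way to handle it is to tie every item on the line to the first item of that line. The sketch below does that in Python 3 under my own assumptions: items are whitespace-separated, `merge` is a compact restatement of the `add` method above working on plain `leader`/`group` dicts, and both `merge` and `related_sets` are names made up for this example.

```python
def merge(leader, group, a, b):
    """Record that a and b are related, using leader/group dicts as above."""
    la, lb = leader.get(a), leader.get(b)
    if la is None and lb is None:
        # Neither item seen before: start a new group led by a.
        leader[a] = leader[b] = a
        group[a] = {a, b}
    elif lb is None:
        # Only a is known: b joins a's group.
        group[la].add(b)
        leader[b] = la
    elif la is None:
        # Only b is known: a joins b's group.
        group[lb].add(a)
        leader[a] = lb
    elif la != lb:
        # Both known, different groups: fold the smaller into the larger.
        if len(group[la]) < len(group[lb]):
            la, lb = lb, la
        smaller = group.pop(lb)
        group[la] |= smaller
        for k in smaller:
            leader[k] = la


def related_sets(lines):
    """Return the sets of related items, one input line per relationship."""
    leader, group = {}, {}
    for line in lines:
        items = line.split()
        if not items:
            continue  # skip blank lines
        first = items[0]
        merge(leader, group, first, first)  # registers a line with a single item
        for other in items[1:]:
            merge(leader, group, first, other)
    return list(group.values())


for s in related_sets(["1 2 3", "4 5", "3 6", "7"]):
    print(sorted(s))
```

The answer to "which numbers are related" is then simply the values of the `group` dict, which is also how you would read the result off `ds.group` in the code above.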