A set union find algorithm

我寻月下人不归 2020-12-03 02:00

I have thousands of lines of 1 to 100 numbers. Every line defines a group of numbers and a relationship among them. I need to get the sets of related numbers.
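Put another way, assuming relatedness is transitive (if a is related to b and b to c, then a is related to c), the sets wanted are the connected components of a graph whose nodes are the numbers and whose edges join numbers that share a line. A minimal sketch of that reading, assuming each line holds whitespace-separated integers (the function `related_sets` and the sample lines are hypothetical). It uses breadth-first search rather than union-find, which makes it a handy baseline for checking a union-find version:

```python
from collections import deque


def related_sets(lines):
    """Return the sets of related numbers as connected components.

    Assumes each line is whitespace-separated integers and that numbers
    sharing a line are directly related.
    """
    neighbours = {}  # number -> set of numbers it is directly linked to
    for line in lines:
        nums = [int(tok) for tok in line.split()]
        for n in nums:
            neighbours.setdefault(n, set())  # single-number lines still count
        # linking every number to the first one on its line is enough
        # to make the whole line one connected piece
        for n in nums[1:]:
            neighbours[nums[0]].add(n)
            neighbours[n].add(nums[0])

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:  # breadth-first walk over everything reachable
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


sample = ["1 2 3", "4 5", "3 6", "7", "5 8 9"]
print(sorted(sorted(c) for c in related_sets(sample)))  # [[1, 2, 3, 6], [4, 5, 8, 9], [7]]
```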

Little

5 answers
  •  予麋鹿 (OP)
    2020-12-03 02:11

    Once you have built the data structure, exactly what queries do you want to run against it? Show us your existing code. What is a T(x)? You talk about "groups of numbers", but your sample data shows T1, T2, etc.; please explain.

    Have you read this: http://en.wikipedia.org/wiki/Disjoint-set_data_structure
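The Wikipedia article describes the classic disjoint-set forest. A minimal sketch of that structure, assuming hashable items and using union by size plus path compression (the class `UnionFind` and its method names are hypothetical, not from any library), run on the T1/T2 pairs used elsewhere in this thread:

```python
class UnionFind:
    """Disjoint-set forest with union by size and path compression."""

    def __init__(self):
        self.parent = {}  # item -> parent item; a root is its own parent
        self.size = {}    # root -> number of items in its tree

    def find(self, x):
        """Return the root of x's tree, adding x as a singleton if unseen."""
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # path compression: repoint every node on the path at the root
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the groups containing a and b."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller tree under the larger
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        del self.size[rb]  # rb is no longer a root

    def connected(self, a, b):
        """Answer 'are a and b in the same group?' (unseen items become singletons)."""
        return self.find(a) == self.find(b)

    def groups(self):
        """Return the current groups as a dict mapping root -> set of members."""
        out = {}
        for x in list(self.parent):
            out.setdefault(self.find(x), set()).add(x)
        return out


pairs = [("T1", "T2"), ("T3", "T4"), ("T5", "T1"), ("T3", "T6"),
         ("T7", "T8"), ("T3", "T7"), ("T9", "TA"), ("T1", "T9")]
uf = UnionFind()
for a, b in pairs:
    uf.union(a, b)
print(uf.connected("T2", "TA"))  # True
print(uf.connected("T2", "T4"))  # False
print(sorted(sorted(g) for g in uf.groups().values()))
# [['T1', 'T2', 'T5', 'T9', 'TA'], ['T3', 'T4', 'T6', 'T7', 'T8']]
```

Unlike a structure that keeps every group as an explicit set, the forest stores only parent links, so listing all groups costs a pass over every item; in exchange, a merge relinks a single root instead of rewriting one entry per moved member.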

    Try looking at this Python implementation: http://code.activestate.com/recipes/215912-union-find-data-structure/

    Or you can lash up something simple and understandable yourself, e.g.:

    [Update: totally new code]

    class DisjointSet:
        """Groups related items, keeping each group as an explicit set."""

        def __init__(self):
            self.leader = {}  # maps a member to the group's leader
            self.group = {}   # maps a group leader to the group (which is a set)

        def add(self, a, b):
            """Record that a and b are related, merging their groups if needed."""
            leadera = self.leader.get(a)
            leaderb = self.leader.get(b)
            if leadera is not None:
                if leaderb is not None:
                    if leadera == leaderb:
                        return  # already in the same group: nothing to do
                    groupa = self.group[leadera]
                    groupb = self.group[leaderb]
                    # merge the smaller group into the larger one, so fewer
                    # leader entries have to be rewritten
                    if len(groupa) < len(groupb):
                        leadera, groupa, leaderb, groupb = leaderb, groupb, leadera, groupa
                    groupa |= groupb  # in-place union also updates self.group[leadera]
                    del self.group[leaderb]
                    for k in groupb:
                        self.leader[k] = leadera
                else:
                    # only a is already known: b joins a's group
                    self.group[leadera].add(b)
                    self.leader[b] = leadera
            else:
                if leaderb is not None:
                    # only b is already known: a joins b's group
                    self.group[leaderb].add(a)
                    self.leader[a] = leaderb
                else:
                    # neither is known yet: start a new group led by a
                    self.leader[a] = self.leader[b] = a
                    self.group[a] = {a, b}

    from pprint import pprint as pp

    data = """T1 T2
    T3 T4
    T5 T1
    T3 T6
    T7 T8
    T3 T7
    T9 TA
    T1 T9"""
    # data is chosen to exercise the new-group, join-a's-group,
    # join-b's-group and merge paths in add()
    ds = DisjointSet()
    for line in data.splitlines():
        x, y = line.split()
        ds.add(x, y)
        print()
        print(x, y)
        pp(ds.leader)
        # sort each group's members so the printed output is deterministic
        pp({leader: sorted(members) for leader, members in ds.group.items()})


    And here is the output from the last step:

    T1 T9
    {'T1': 'T1',
     'T2': 'T1',
     'T3': 'T3',
     'T4': 'T3',
     'T5': 'T1',
     'T6': 'T3',
     'T7': 'T3',
     'T8': 'T3',
     'T9': 'T1',
     'TA': 'T1'}
    {'T1': ['T1', 'T2', 'T5', 'T9', 'TA'], 'T3': ['T3', 'T4', 'T6', 'T7', 'T8']}
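The answer's `add()` takes one pair at a time, so a line of n numbers needs n - 1 calls (for example, `add(first, other)` for every other number on the line). As an alternative sketch, a whole line can be merged in one step with the same leader/group bookkeeping. The helper `group_lines` is hypothetical and assumes whitespace-separated integers; a single-number line simply becomes a one-member group:

```python
def group_lines(lines):
    """Merge every number on a line into one group, line by line.

    Hypothetical helper using leader/group bookkeeping like the answer's
    DisjointSet, but consuming a whole line at a time.
    Assumes whitespace-separated integers on each line.
    """
    leader = {}  # number -> leader of its group
    group = {}   # leader -> set of members
    for line in lines:
        nums = [int(tok) for tok in line.split()]
        if not nums:
            continue  # skip blank lines
        # distinct groups that already contain a number from this line
        known = {leader[n] for n in nums if n in leader}
        if known:
            # the largest existing group absorbs the others
            target = max(known, key=lambda ld: len(group[ld]))
        else:
            # nothing on this line seen yet: new group led by the first number
            target = nums[0]
            group[target] = set()
        members = group[target]
        for other in known - {target}:
            for k in group.pop(other):  # move the smaller group's members over
                leader[k] = target
                members.add(k)
        for n in nums:
            leader[n] = target
            members.add(n)
    return leader, group


leader, group = group_lines(["1 2 3", "4 5", "3 6", "7", "5 8 9", "9 2"])
print(sorted(sorted(g) for g in group.values()))  # [[1, 2, 3, 4, 5, 6, 8, 9], [7]]
```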
    
