Group Python lists based on repeated items

Posted by 故事扮演 on 2021-02-11 15:37:46

Question


This question is very similar to "Group Python list of lists into groups based on overlapping items"; in fact, it could be called a duplicate.

Basically, I have a list of sub-lists where each sub-list contains some number of integers (the count varies between sub-lists). I need to group all sub-lists that share at least one integer.

I'm asking a separate question because I've been trying to adapt Martijn Pieters' great answer, with no luck.

Here's the MWE:

def grouper(sequence):
    result = []  # will hold (members, group) tuples

    for item in sequence:
        for members, group in result:
            if members.intersection(item):  # overlap
                members.update(item)
                group.append(item)
                break
        else:  # no group found, add new
            result.append((set(item), [item]))

    return [group for members, group in result]


gr = [[29, 27, 26, 28], [31, 11, 10, 3, 30], [71, 51, 52, 69],
      [78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81],
      [86, 68, 67, 84]]

for i, group in enumerate(grouper(gr)):
    print('g{}:'.format(i), group)

and the output I get is:

g0: [[29, 27, 26, 28]]
g1: [[31, 11, 10, 3, 30]]
g2: [[71, 51, 52, 69]]
g3: [[78, 67, 68, 39, 75], [84, 67, 78, 77, 81], [86, 68, 67, 84]]
g4: [[86, 84, 81, 82, 83, 85]]

The last group, g4, should have been merged with g3, since the lists inside them share the items 81, 84 and 86, and even a single repeated element should be enough for them to be merged.

I'm not sure if I'm applying the code wrong, or if there's something wrong with the code.


Answer 1:


You can describe the merge you want to do as a set consolidation or as a connected-components problem. I tend to use an off-the-shelf set consolidation algorithm and then adapt it to the particular situation. For example, IIUC, you could use something like

def consolidate(sets):
    # http://rosettacode.org/wiki/Set_consolidation#Python:_Iterative
    setlist = [s for s in sets if s]  # drop empty sets
    for i, s1 in enumerate(setlist):
        if s1:  # skip sets already emptied by an earlier merge
            for s2 in setlist[i+1:]:
                if s1.intersection(s2):
                    # move everything from s1 into s2, then keep merging from s2
                    s2.update(s1)
                    s1.clear()
                    s1 = s2
    return [s for s in setlist if s]

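def wrapper(seqs):
    consolidated = consolidate(map(set, seqs))
    # map every item to the index of the consolidated set that contains it
    groupmap = {x: i for i, seq in enumerate(consolidated) for x in seq}
    output = {}
    for seq in seqs:
        # all items of a sub-list share a group, so the first one is enough
        # (assumes no sub-list is empty)
        target = output.setdefault(groupmap[seq[0]], [])
        target.append(seq)
    return list(output.values())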

which gives

>>> for i, group in enumerate(wrapper(gr)):
...     print('g{}:'.format(i), group)
...     
g0: [[29, 27, 26, 28]]
g1: [[31, 11, 10, 3, 30]]
g2: [[71, 51, 52, 69]]
g3: [[78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81], [86, 68, 67, 84]]

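(Before Python 3.7 the order of the groups is not guaranteed because of the dictionaries; from 3.7 on, dicts preserve insertion order, so the groups appear in the order of their first sub-list.)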
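
The connected-components framing mentioned at the top of this answer can also be sketched directly with a small union-find over the sub-list indices. This is only an illustration under stated assumptions, not the Rosetta Code algorithm: the names group_by_shared_items, find and first_seen are made up for this example, and it relies on Python 3.7+ dict ordering for the order of the groups.

```python
def group_by_shared_items(seqs):
    """Group sub-lists that are linked, directly or indirectly, by shared items."""
    parent = list(range(len(seqs)))  # parent[i] == i means i is a root

    def find(i):
        # Walk up to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # item -> index of the first sub-list that contained it
    for idx, seq in enumerate(seqs):
        for item in seq:
            if item in first_seen:
                a, b = find(idx), find(first_seen[item])
                if a != b:
                    # Union: keep the smaller index as the root.
                    parent[max(a, b)] = min(a, b)
            else:
                first_seen[item] = idx

    groups = {}
    for idx, seq in enumerate(seqs):
        groups.setdefault(find(idx), []).append(seq)
    return list(groups.values())


gr = [[29, 27, 26, 28], [31, 11, 10, 3, 30], [71, 51, 52, 69],
      [78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81],
      [86, 68, 67, 84]]

groups = group_by_shared_items(gr)
for i, group in enumerate(groups):
    print('g{}:'.format(i), group)
```

On the sample data this yields the same four groups as wrapper above. Unlike wrapper, it does not index seq[0], so an empty sub-list simply ends up in a group of its own.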




Answer 2:


This sounds like set consolidation. Turn each sub-list into a set first: you are interested in the contents, not the order, so sets are the best choice of data structure. See http://rosettacode.org/wiki/Set_consolidation
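
To connect this back to the question: the original grouper stops at the first group that overlaps the current sub-list, so a sub-list that bridges two existing groups (here [84, 67, 78, 77, 81]) is added to one of them and the other is left separate. Below is a minimal sketch of one way to adapt it so that every overlapping group is absorbed. The name grouper_merge_all is hypothetical, and this is an illustration rather than the code from the linked answer.

```python
def grouper_merge_all(sequence):
    result = []  # (members, group) pairs; the member sets stay pairwise disjoint

    for item in sequence:
        members = set(item)
        merged_group = []
        remaining = []
        for other_members, other_group in result:
            if other_members & members:
                # Overlap: absorb this existing group instead of stopping here.
                members |= other_members
                merged_group.extend(other_group)
            else:
                remaining.append((other_members, other_group))
        merged_group.append(item)
        remaining.append((members, merged_group))
        result = remaining

    return [group for _, group in result]


gr = [[29, 27, 26, 28], [31, 11, 10, 3, 30], [71, 51, 52, 69],
      [78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81],
      [86, 68, 67, 84]]

merged = grouper_merge_all(gr)
for i, group in enumerate(merged):
    print('g{}:'.format(i), group)
```

Because the existing groups never share items with each other, a single pass over them per sub-list is enough. A merged group is moved to the end of the result, so the order of the groups can differ from the input order.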



Source: https://stackoverflow.com/questions/32057777/group-python-lists-based-on-repeated-items
