Create combinations from a list and remove those in which the substring before the delimiter appears in more than one sub-element

Question

I have a list from which I create all combinations using itertools.combinations. Each element is a one-item list whose string can be split on the delimiter ": ". I need to remove every combination in which the same substring before ": " occurs in more than one element. In other words, the characters up to ": " (perhaps the delimiter to use for a regex match?) need to be compared across the sub-elements of each combination. Or is there a better way?

from itertools import combinations

inList = [['TEST1: sub1'],
['TEST1: sub2'],
['TEST1: sub3'],
['TESTING FOR FUN: randomtext'],
['TESTING FOR FUN: random text x2'],
['ABC123: dog']]
outputList = list(combinations(inList, 2))
outputList

I get this as a result:

[(['TEST1: sub1'], ['TEST1: sub2']),
 (['TEST1: sub1'], ['TEST1: sub3']),
 (['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['TEST1: sub3']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['TESTING FOR FUN: random text x2']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]

But I'd like to remove the combinations in which the substring up to the delimiter ": " is the same in more than one sub-element.

Desired output, after dropping every combination whose sub-elements share the same substring before ": ":

[(['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]

Notice that the first two items of the list are removed in the desired output. The same applies to any other combination in which the substring before ": " occurs more than once, regardless of string length.

Answer 1:

If the desired output is correct, then you can break this down into three separate steps:

First, the delimiter represents a key-value relationship, so you can use a dictionary to group the data by key before doing anything else.

Second, take every combination of n groups, so that each combination draws on n different keys.

Lastly, for each of those combinations of groups, use itertools.product to pick one item from each group in every possible way. With n = 2 this gives all pairs whose keys differ.

from itertools import combinations, product
from collections import defaultdict

inList = [['TEST1: sub1'],
['TEST1: sub2'],
['TEST1: sub3'],
['TESTING FOR FUN: randomtext'],
['TESTING FOR FUN: random text x2'],
['ABC123: dog']]


inDict = defaultdict(list)
for lst in inList:
    key = lst[0].partition(':')[0]
    inDict[key].append(lst)

print(inDict)
# Output (shown formatted for readability):
# defaultdict(<class 'list'>,
#             {'TEST1': [['TEST1: sub1'], ['TEST1: sub2'], ['TEST1: sub3']],
#              'TESTING FOR FUN': [['TESTING FOR FUN: randomtext'],
#               ['TESTING FOR FUN: random text x2']],
#              'ABC123': [['ABC123: dog']]})


# Pairs of groups taken from the dict values; change 2 to the combination length you need.
temp = combinations(inDict.values(), 2)
result = []
for group in temp:
    # Pick one item from each group in the pair, in every possible way.
    result.extend(product(*group))

print(result)
# Output:
# [(['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
#  (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
#  (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
#  (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
#  (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
#  (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
#  (['TEST1: sub1'], ['ABC123: dog']),
#  (['TEST1: sub2'], ['ABC123: dog']),
#  (['TEST1: sub3'], ['ABC123: dog']),
#  (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
#  (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]
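
If you would rather stay close to the approach in the question, another option is to generate all combinations first and then drop those whose keys repeat. The following is only a sketch under a few assumptions: every item is a one-element list, the key is whatever precedes the first ": ", and the helper names key_of and combos_with_distinct_keys are made up for illustration. It generates every combination before filtering, so on large inputs the grouping approach above should do less work.

```python
from itertools import combinations

def key_of(item):
    # item is assumed to be a one-element list such as ['TEST1: sub1'];
    # the key is the text before the first ': '.
    return item[0].partition(': ')[0]

def combos_with_distinct_keys(items, n):
    # Keep only the n-length combinations in which no key occurs twice.
    result = []
    for combo in combinations(items, n):
        keys = [key_of(item) for item in combo]
        if len(set(keys)) == len(keys):
            result.append(combo)
    return result

inList = [['TEST1: sub1'],
          ['TEST1: sub2'],
          ['TEST1: sub3'],
          ['TESTING FOR FUN: randomtext'],
          ['TESTING FOR FUN: random text x2'],
          ['ABC123: dog']]

filtered = combos_with_distinct_keys(inList, 2)
print(len(filtered))  # 11 of the 15 pairs remain
```

Because the filter preserves the order produced by itertools.combinations, the result comes out in the same order as the desired output in the question, whereas the grouping approach orders its results by pair of groups.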

Source: https://stackoverflow.com/questions/56372013/create-combinations-from-list-and-remove-if-substring-to-delimiter-characters-is

Tags

python

list

itertools