Create combinations from a list and remove those where the substring before the delimiter appears in more than one sub-element

Submitted by 巧了我就是萌 on 2019-12-11 06:47:38

Question


I have a list, and I use itertools.combinations to create all combinations of its items. Each element can be split on the delimiter string ": ". I need to remove every combination in which the same substring before ": " appears in more than one element. In other words, the characters up to ": " (perhaps the delimiter to use for a regex match?) have to be compared across each sub-element of a combination. Or is there a better way?

from itertools import combinations

inList = [['TEST1: sub1'],
['TEST1: sub2'],
['TEST1: sub3'],
['TESTING FOR FUN: randomtext'],
['TESTING FOR FUN: random text x2'],
['ABC123: dog']]
outputList = list(combinations(inList, 2))  # length 2, to match the pairs in the result
outputList

I get this as a result:

[(['TEST1: sub1'], ['TEST1: sub2']),
 (['TEST1: sub1'], ['TEST1: sub3']),
 (['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['TEST1: sub3']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['TESTING FOR FUN: random text x2']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]

But I'd like to remove the combinations whose sub-elements share the same substring before the delimiter ": ".

Desired output, after dropping every combination in which the same prefix occurs in more than one sub-element:

[(['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]

*Notice that the first 2 items in the list are removed in the desired output. (The same applies to every other combination in which the substring before ": " is repeated, regardless of string length.)


Answer 1:


If the desired output is correct, then you can break this down into three separate steps:

First, the delimiter represents a key-value relationship, so you can use a dictionary to group the data by key before doing anything else.

Second, take all n-length combinations of those groups, so that each combination draws from n different keys.

Lastly, for each of those combinations, use itertools.product to get every way of picking one item from each group in the combination.

from itertools import combinations, product
from collections import defaultdict

inList = [['TEST1: sub1'],
['TEST1: sub2'],
['TEST1: sub3'],
['TESTING FOR FUN: randomtext'],
['TESTING FOR FUN: random text x2'],
['ABC123: dog']]


inDict = defaultdict(list)
for lst in inList:
    key = lst[0].partition(':')[0]
    inDict[key].append(lst)

print(inDict)
#Output:
defaultdict(list,
            {'TEST1': [['TEST1: sub1'], ['TEST1: sub2'], ['TEST1: sub3']],
             'TESTING FOR FUN': [['TESTING FOR FUN: randomtext'],
              ['TESTING FOR FUN: random text x2']],
             'ABC123': [['ABC123: dog']]})


temp = combinations(inDict.values(), 2) # 2-length combinations of the groups; change the number here as needed
result = []
for group in temp:
    result.extend(product(*group)) # one item from each group in the combination

print(result)
#Output:
[(['TEST1: sub1'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub1'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub2'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub2'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub3'], ['TESTING FOR FUN: randomtext']),
 (['TEST1: sub3'], ['TESTING FOR FUN: random text x2']),
 (['TEST1: sub1'], ['ABC123: dog']),
 (['TEST1: sub2'], ['ABC123: dog']),
 (['TEST1: sub3'], ['ABC123: dog']),
 (['TESTING FOR FUN: randomtext'], ['ABC123: dog']),
 (['TESTING FOR FUN: random text x2'], ['ABC123: dog'])]
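
The three steps can also be wrapped in a function that takes the combination length as a parameter. The following is only a sketch under some assumptions: the names key_of, distinct_key_combinations and filtered_combinations are made up for illustration, every item is assumed to be a one-element list containing ": ", and the second function is a plain filter over itertools.combinations that is included only as a cross-check, not as part of the answer above.

```python
from collections import defaultdict
from itertools import combinations, product


def key_of(item, delimiter=": "):
    """Return the text before the first delimiter (hypothetical helper)."""
    return item[0].partition(delimiter)[0]


def distinct_key_combinations(items, n, delimiter=": "):
    """Group by key, then pick one item from each of n different groups."""
    groups = defaultdict(list)
    for item in items:
        groups[key_of(item, delimiter)].append(item)
    result = []
    for chosen in combinations(groups.values(), n):
        result.extend(product(*chosen))
    return result


def filtered_combinations(items, n, delimiter=": "):
    """Brute-force alternative: build every combination, drop repeated keys."""
    return [
        combo
        for combo in combinations(items, n)
        if len({key_of(item, delimiter) for item in combo}) == n
    ]


inList = [['TEST1: sub1'],
          ['TEST1: sub2'],
          ['TEST1: sub3'],
          ['TESTING FOR FUN: randomtext'],
          ['TESTING FOR FUN: random text x2'],
          ['ABC123: dog']]

pairs = distinct_key_combinations(inList, 2)
triples = distinct_key_combinations(inList, 3)
print(len(pairs), len(triples))
# 11 6
```

On this input both functions return the same tuples, only in a different order. The grouping version never builds the combinations it would later discard, which may matter when many items share a key; the filter version is shorter and keeps the original ordering of itertools.combinations.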


Source: https://stackoverflow.com/questions/56372013/create-combinations-from-list-and-remove-if-substring-to-delimiter-characters-is