Python3 find how many differences are in 2 lists in order to be equal

倾然丶 夕夏残阳落幕 submitted on 2019-12-10 09:40:55

Question


Suppose we have two lists, always of the same length and always containing strings.

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']

we need to find:

How many items of list2 have to change for it to be equal to list1.

So for the example above, it should return 2.

For this example:

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']

it should return 1

and finally for this example:

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['ts', 'ts', 'ts', 'ts', 'ts', 'ts']

it should return 5.

We do not care about which elements should change to what. Nor do we care about the order, which means that

['gg', 'gg', 'gg', 'gg', 'gg', 'sot'] 
and
['gg', 'gg', 'sot', 'gg', 'gg', 'gg']

are considered equal, and the result for them should be 0.

The length of the lists could be 6, 8, 20, or anything else, and sometimes there are more distinct elements involved.

I tried a lot of things, like set(list1) - set(list2), list(set(list1).difference(list2)), and set(list1).symmetric_difference(set(list2)), but without any success.


Answer 1:


You could leverage the many possibilities Counter offers:

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']

from collections import Counter

sum((Counter(list1) - Counter(list2)).values())
# 2

Let's check with the other examples:

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']

sum((Counter(list1) - Counter(list2)).values())
# 1

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['ts', 'ts', 'ts', 'ts', 'ts', 'ts']

sum((Counter(list1) - Counter(list2)).values())
# 5

list1 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot'] 
list2 = ['gg', 'gg', 'sot', 'gg', 'gg', 'gg']

sum((Counter(list1) - Counter(list2)).values())
# 0

Details

By using Counter, you get a count of all elements from each list in the form of a dictionary. Let's go back to the first example:

c1 = Counter(list1)
# Counter({'gg': 3, 'sot': 2, 'ts': 1})

c2 = Counter(list2)
# Counter({'gg': 5, 'sot': 1})

Now we would like to find out:

  • Which items are present in list1 but not in list2

  • Across both the items that are shared and those that are not, how many more of each are needed in list2 so that the counts match

We can take advantage of the fact that counters support mathematical operations whose results are multisets, i.e. counters that keep only counts greater than zero. Since we are looking for the difference between the two counters, we can subtract them and see which elements, and how many of each, are needed in list2.

So how does subtraction between Counters work? Let's check with a simple example:

Counter({1: 4, 2: 1}) - Counter({1: 1, 3: 1})
# Counter({1: 3, 2: 1})

What this does is subtract the counts of corresponding elements, keeping only the elements of the first counter whose resulting count is positive, so the order of the operands matters. Going back to the proposed example, subtracting the two counters yields:

sub = Counter(list1) - Counter(list2)
# Counter({'sot': 1, 'ts': 1})

Now we simply need to add up the values of all the keys, which can be done with:

sum(sub.values())
# 2



Answer 2:


You can use collections.Counter for this: count how many times each item occurs in each list, then take the difference between the two counters.

from collections import Counter
def func(list1, list2):
    #Convert both list to counters, and subtract them
    c = Counter(list1) - Counter(list2)

    #Sum up all values in the new counter
    return sum(c.values())

The outputs are

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']
print(func(list1, list2))
#2

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']
print(func(list1, list2))
#1

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['ts', 'ts', 'ts', 'ts', 'ts', 'ts']
print(func(list1, list2))
#5



Answer 3:


You are not talking about lists here. Your problem is a multiset problem, because order doesn't matter, but you do need to know how many values you have of each type. Multisets are sometimes called bags or msets.

The Python standard library has a multiset implementation: collections.Counter(), which maps unique elements to a count. Use those here:

from collections import Counter

mset1 = Counter(list1)
mset2 = Counter(list2)

# sum the total number of elements that are different between
# the two multisets
sum((mset1 - mset2).values())

Subtracting one counter from another gives you a multiset of all elements that were in the first multiset but not in the other, and sum(mset.values()) adds up to the total number of elements.

Because the inputs are always the same length and you only need to know how many elements are different, it doesn't matter in which order you subtract the multisets. You will always get the right answer: sum((mset1 - mset2).values()) and sum((mset2 - mset1).values()) always produce the exact same number.

That's because both multisets have N elements, of which K are different. So each multiset has exactly K extra elements that are not in the other multiset, and is missing K elements that are present in the other. Subtraction with - gives you the K extra elements of the first multiset that are missing from the other.
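As a quick check of the symmetry argument, here is a small sketch using the first example from the question. The variable names extra_in_1, extra_in_2, forward and backward are only illustrative.

```python
from collections import Counter

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']

# Elements list1 has more of than list2, and the other way around.
extra_in_1 = Counter(list1) - Counter(list2)
extra_in_2 = Counter(list2) - Counter(list1)

# Both directions add up to the same K for equal-length lists.
forward = sum(extra_in_1.values())
backward = sum(extra_in_2.values())
print(forward, backward)  # 2 2
```

The surplus itself differs ('sot' and 'ts' in one direction, two 'gg' in the other); only the totals agree.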

Putting this into a function:

def mset_diff(iterable1, iterable2):
    return sum((Counter(iterable1) - Counter(iterable2)).values())

and applied to your inputs:

>>> mset_diff(['sot', 'sot', 'ts', 'gg', 'gg', 'gg'], ['gg', 'gg', 'gg', 'gg', 'gg', 'sot'])
2
>>> mset_diff(['sot', 'sot', 'ts', 'gg', 'gg', 'gg'], ['gg', 'gg', 'gg', 'gg', 'sot', 'sot'])
1
>>> mset_diff(['sot', 'sot', 'ts', 'gg', 'gg', 'gg'], ['ts', 'ts', 'ts', 'ts', 'ts', 'ts'])
5

The Counter() class is a subclass of dict, so counting elements is fast and efficient, and calculating the difference between two counters takes O(N) linear time.
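To make the linear-time claim concrete, here is a rough hand-written equivalent using a plain dict. This is only a sketch of the idea, not how Counter is actually implemented, and mset_diff_manual is a hypothetical name.

```python
def mset_diff_manual(iterable1, iterable2):
    # Hypothetical helper: one pass over each input, then one pass over the counts.
    counts = {}
    for item in iterable1:
        counts[item] = counts.get(item, 0) + 1
    for item in iterable2:
        counts[item] = counts.get(item, 0) - 1
    # Positive entries are the elements iterable1 has more of than iterable2.
    return sum(n for n in counts.values() if n > 0)


print(mset_diff_manual(['sot', 'sot', 'ts', 'gg', 'gg', 'gg'],
                       ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']))  # 2
```

Each element is touched a constant number of times, so the work grows linearly with the combined length of the inputs.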




Answer 4:


Using set will cause problems if the difference is in how many of a certain item are present. Instead, use collections.Counter. As explained in other answers, you can create a Counter for both lists and then use - to get the difference of those and get the sum of the values. Note, however, that this will only work if the lists have the same size. If the lists do not have the same number of elements, you will get a different number of diverging elements depending on which list is subtracted from which.
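Both caveats can be seen in a small sketch. The lists a and b below are made-up inputs of different lengths, not taken from the question.

```python
from collections import Counter

a = ['gg', 'gg', 'gg', 'sot']
b = ['gg', 'sot']

# Sets forget how often each value occurs, so the two lists look identical.
set_diff = set(a) - set(b)
print(set_diff)  # set()

# Counters keep the counts, but with unequal lengths the direction matters.
a_minus_b = sum((Counter(a) - Counter(b)).values())
b_minus_a = sum((Counter(b) - Counter(a)).values())
print(a_minus_b, b_minus_a)  # 2 0
```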

With subtract, on the other hand, you get the difference in both directions: positive numbers for items of which there are "too many" and negative numbers for "too few". This means that you may have to divide the result by 2, i.e. sum(...) / 2, but it should work better for differently sized lists.

>>> list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
>>> list2 = ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']
>>> c = Counter(list1)
>>> c.subtract(Counter(list2))
>>> c
Counter({'ts': 1, 'sot': 0, 'gg': -1})
>>> sum(map(abs, c.values()))
2
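The halving step can be wrapped up as follows. This is a minimal sketch that assumes both lists have the same length (as in the question), and changes_needed is a hypothetical name.

```python
from collections import Counter


def changes_needed(list1, list2):
    # Hypothetical helper; assumes both lists have the same length.
    c = Counter(list1)
    c.subtract(list2)  # in place; zero and negative counts are kept
    # Surpluses and deficits each add up to the number of changes, so halve the total.
    return sum(abs(n) for n in c.values()) // 2


print(changes_needed(['sot', 'sot', 'ts', 'gg', 'gg', 'gg'],
                     ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']))  # 1
```

For lists of different lengths the absolute sum can be odd, so you would first have to decide what a "change" means in that case.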

Another possibility that also works reliably with differently sized lists is to use & to get the common elements and then compare their number to the total number of elements in the larger list:

>>> list1 = [1,1,1,1,2]
>>> list2 = [2]
>>> Counter(list1) & Counter(list2)
Counter({2: 1})
>>> max(len(list1), len(list2)) - sum((Counter(list1) & Counter(list2)).values())
4
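The intersection approach could be packaged like this. It is a sketch only, and diff_by_overlap is a hypothetical name.

```python
from collections import Counter


def diff_by_overlap(list1, list2):
    # Hypothetical helper: size of the longer list minus the multiset intersection.
    common = Counter(list1) & Counter(list2)  # per-element minimum of the counts
    return max(len(list1), len(list2)) - sum(common.values())


print(diff_by_overlap(['sot', 'sot', 'ts', 'gg', 'gg', 'gg'],
                      ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']))  # 2
print(diff_by_overlap([1, 1, 1, 1, 2], [2]))  # 4
```

For equal-length lists this gives the same numbers as the subtraction-based answers on the question's examples.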


Source: https://stackoverflow.com/questions/56128423/python3-find-how-many-differences-are-in-2-lists-in-order-to-be-equal
