Question
Assume we have two lists, always of the same length and always containing strings.
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']
We need to find how many items of list2 must change for it to be equal to list1.
So for the example above, it should return 2.
For this example:
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']
it should return 1
And finally, for this example:
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['ts', 'ts', 'ts', 'ts', 'ts', 'ts']
it should return 5.
We do not care which elements should change to what. Nor do we care about order, which means that
['gg', 'gg', 'gg', 'gg', 'gg', 'sot']
and
['gg', 'gg', 'sot', 'gg', 'gg', 'gg']
are considered equal and the result for them should be 0.
The lists could have length 6, 8, 20 or anything else, and sometimes more distinct elements are involved.
I tried many things, like set(list1) - set(list2), list(set(list1).difference(list2)) and set(list1).symmetric_difference(set(list2)), but without any success.
Answer 1:
You could leverage the many possibilities Counter offers:
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']
from collections import Counter
sum((Counter(list1) - Counter(list2)).values())
# 2
Let's check with the other examples:
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']
sum((Counter(list1) - Counter(list2)).values())
# 1
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['ts', 'ts', 'ts', 'ts', 'ts', 'ts']
sum((Counter(list1) - Counter(list2)).values())
# 5
list1 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']
list2 = ['gg', 'gg', 'sot', 'gg', 'gg', 'gg']
sum((Counter(list1) - Counter(list2)).values())
# 0
Details
By using Counter, you get a count of all elements from each list in the form of a dictionary. Let's go back to the first example:
c1 = Counter(list1)
# Counter({'sot': 2, 'ts': 1, 'gg': 3})
c2 = Counter(list2)
# Counter({'gg': 5, 'sot': 1})
Now we would like to understand:
Which items are present in list1 but not in list2.
Of those items, whether or not they appear in list2, how many more are needed in list2 so that both lists contain the same count of each.
We can take advantage of the fact that counters support mathematical operations whose results are multisets, i.e. counters whose counts are all greater than zero. Since we're looking for the difference between both counters, we can subtract them and see which elements, and how many of each, are needed in list2.
So how does subtraction between Counters work? Let's check with a simple example:
Counter({1:4, 2: 1}) - Counter({1:1, 3:1})
# Counter({1: 3, 2: 1})
What this does is subtract the counts of corresponding elements, keeping only elements of the first counter whose resulting count is positive, so the order of the operands matters. Going back to the proposed example, subtracting the two counters yields:
sub = Counter(list1) - Counter(list2)
# Counter({'sot': 1, 'ts': 1})
Now we simply need to sum the values of all the keys, which can be done with:
sum(sub.values())
# 2
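As a small check of the claim that operand order matters, here is a sketch that reuses the toy counters from the subtraction example above and simply swaps them. It also shows that counts which would fall to zero or below are dropped from the result rather than kept as negatives.

```python
from collections import Counter

a = Counter({1: 4, 2: 1})
b = Counter({1: 1, 3: 1})

# Keys that appear only in b (like 3) never show up in a - b.
print(a - b)  # Counter({1: 3, 2: 1})

# Reversed: key 1 would go to 1 - 4 < 0 and is dropped, key 2 exists only in a,
# so only key 3 survives.
print(b - a)  # Counter({3: 1})
```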
Answer 2:
You can use collections.Counter for this: count how many of each item both lists contain, and take the difference between the counts.
from collections import Counter
def func(list1, list2):
    # Convert both lists to counters, and subtract them
    c = Counter(list1) - Counter(list2)
    # Sum up all values in the new counter
    return sum(c.values())
The outputs are:
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']
print(func(list1, list2))
#2
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']
print(func(list1, list2))
#1
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['ts', 'ts', 'ts', 'ts', 'ts', 'ts']
print(func(list1, list2))
#5
Answer 3:
You are not talking about lists here. Your problem is a multiset problem, because order doesn't matter, but you do need to know how many values you have of each type. Multisets are sometimes called bags or msets.
The Python standard library has a multiset implementation: collections.Counter(), which maps unique elements to a count. Use it here:
from collections import Counter
mset1 = Counter(list1)
mset2 = Counter(list2)
# sum the total number of elements that are different between
# the two multisets
sum((mset1 - mset2).values())
Subtracting one counter from another gives you a multiset of all elements that were in the first multiset but not in the other, and sum(mset.values()) adds up to the total number of elements.
Because the inputs are always the same length and you only need to know how many elements are different, it doesn't matter in which order you subtract the multisets. You will always get the right answer: both sum((mset1 - mset2).values()) and sum((mset2 - mset1).values()) will always produce the exact same number.
That's because both multisets have N elements, of which K are different. So both multisets will have exactly K extra elements that are not in the other multiset, and K missing elements that are present in the other multiset. The - subtraction gives you the K extra elements in the first multiset that are missing from the other.
Putting this into a function:
def mset_diff(iterable1, iterable2):
    return sum((Counter(iterable1) - Counter(iterable2)).values())
and applied to your inputs:
>>> mset_diff(['sot', 'sot', 'ts', 'gg', 'gg', 'gg'], ['gg', 'gg', 'gg', 'gg', 'gg', 'sot'])
2
>>> mset_diff(['sot', 'sot', 'ts', 'gg', 'gg', 'gg'], ['gg', 'gg', 'gg', 'gg', 'sot', 'sot'])
1
>>> mset_diff(['sot', 'sot', 'ts', 'gg', 'gg', 'gg'], ['ts', 'ts', 'ts', 'ts', 'ts', 'ts'])
5
The Counter() class is a subclass of dict, so counting elements is fast and efficient, and calculating the difference between two counters is done in O(N) linear time.
Answer 4:
Using set will cause problems if the difference lies in how many of a certain item are present. Instead, use collections.Counter. As explained in other answers, you can create a Counter for both lists, use - to get their difference, and then take the sum of the values.
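To illustrate the problem with set, here is a sketch with two made-up lists (not taken from the question) that contain the same distinct values in different quantities. The set-based attempts report no difference at all, while the Counter difference finds the one item that has to change.

```python
from collections import Counter

# Hypothetical inputs: same distinct values, different multiplicities.
list1 = ['gg', 'gg', 'sot', 'sot']
list2 = ['gg', 'gg', 'gg', 'sot']

# Sets collapse duplicates, so both set differences come out empty.
print(set(list1) - set(list2))                      # set()
print(set(list1).symmetric_difference(set(list2)))  # set()

# Counter keeps multiplicities: one 'sot' in list1 has no match in list2.
print(sum((Counter(list1) - Counter(list2)).values()))  # 1
```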
Note, however, that the - approach only works reliably if the lists have the same size. If the lists do not have the same number of elements, you will get a different number of diverging elements depending on which list is subtracted from which.
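A sketch of that asymmetry, using the same differently sized pair that appears in the & example of this answer: subtracting in one direction counts the four unmatched 1s, while the other direction finds nothing missing.

```python
from collections import Counter

list1 = [1, 1, 1, 1, 2]
list2 = [2]

# Elements of list1 with no match in list2: the four 1s.
print(sum((Counter(list1) - Counter(list2)).values()))  # 4
# Elements of list2 with no match in list1: none.
print(sum((Counter(list2) - Counter(list1)).values()))  # 0
```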
With subtract, on the other hand, you get the difference in both directions, using positive numbers for items that are "too many" and negative numbers for items that are "too few". This means that you may have to divide the result by 2, i.e. sum(...) / 2, but it should work better for differently sized lists.
>>> from collections import Counter
>>> list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
>>> list2 = ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']
>>> c = Counter(list1)
>>> c.subtract(Counter(list2))
# Counter({'gg': -1, 'sot': 0, 'ts': 1})
>>> sum(map(abs, c.values()))
2
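In the session above the sum of absolute values is 2 for a pair whose expected answer is 1. For equal-length lists every surplus is matched by an equal deficit, so the absolute values add up to 2K. A minimal sketch of the divide-by-2 step, assuming the lists have the same length; `diff_via_subtract` is a hypothetical name, and integer division keeps the result an int.

```python
from collections import Counter

def diff_via_subtract(a, b):
    # Assumes len(a) == len(b). Positive counts are surplus in a, negative are deficit.
    c = Counter(a)
    c.subtract(Counter(b))
    # Surpluses and deficits have equal totals, so halve the sum of absolute values.
    return sum(map(abs, c.values())) // 2

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
print(diff_via_subtract(list1, ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']))  # 1
print(diff_via_subtract(list1, ['ts'] * 6))  # 5
```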
Another possibility, which also works reliably with differently sized lists, is using & to get the common elements and then comparing them to the total number of elements in the larger list:
>>> list1 = [1,1,1,1,2]
>>> list2 = [2]
>>> Counter(list1) & Counter(list2)
Counter({2: 1})
>>> max(len(list1), len(list2)) - sum((Counter(list1) & Counter(list2)).values())
4
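The & idea can be wrapped in a small function; `diff_via_common` is a hypothetical name used only for this sketch. On the question's equal-length examples it agrees with the - approach, and it reproduces the 4 for the differently sized pair above.

```python
from collections import Counter

def diff_via_common(a, b):
    # & keeps the minimum count of each shared element, i.e. the matched items.
    common = sum((Counter(a) & Counter(b)).values())
    # Items of the longer list that have no match are counted as differences.
    return max(len(a), len(b)) - common

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
print(diff_via_common(list1, ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']))  # 2
print(diff_via_common(list1, ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']))  # 1
print(diff_via_common(list1, ['ts'] * 6))  # 5
print(diff_via_common([1, 1, 1, 1, 2], [2]))  # 4
```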
Source: https://stackoverflow.com/questions/56128423/python3-find-how-many-differences-are-in-2-lists-in-order-to-be-equal