Counting the number of duplicates in a list [duplicate]

Submitted by て烟熏妆下的殇ゞ on 2020-04-18 15:54:31

Question


I am trying to construct this function but I can't work out how to stop the function counting the same duplicate more than once. Can someone help me please?

def count_duplicates(seq): 

    '''takes as argument a sequence and
    returns the number of duplicate elements'''

    fir = 0
    sec = 1
    count = 0
    while fir < len(seq):
        while sec < len(seq):
            if seq[fir] == seq[sec]:
                count = count + 1
            sec = sec + 1
        fir = fir + 1
        sec = fir + 1
    return count 

In: count_duplicates([-1,2,4,2,0,4,4])

Out: 4

It fails here because the output should be 3.


Answer 1:


You can create a set from your list, which automatically removes the duplicates, and then subtract the length of the set from the length of the original list. Like so:

def count_duplicates(seq): 

    '''takes as argument a sequence and
    returns the number of duplicate elements'''

    return len(seq) - len(set(seq))

res = count_duplicates([-1,2,4,2,0,4,4])
print(res)  # -> 3
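
To see why the length difference gives the right number, here is a minimal sketch that breaks the count down per value. The helper name duplicate_breakdown is made up for illustration and is not part of the answer above; it assumes the elements are hashable, just like the set version does.

```python
from collections import Counter

def duplicate_breakdown(seq):
    """Map each repeated value to its number of extra occurrences."""
    counts = Counter(seq)
    # A value that occurs n times contributes n - 1 duplicates.
    return {value: n - 1 for value, n in counts.items() if n > 1}

seq = [-1, 2, 4, 2, 0, 4, 4]
extras = duplicate_breakdown(seq)
print(extras)  # -> {2: 1, 4: 2}
# The per-value extras add up to the length difference.
print(sum(extras.values()) == len(seq) - len(set(seq)))  # -> True
```

Each distinct value keeps exactly one slot in the set, so everything beyond that first occurrence is what the subtraction counts.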

If you are not allowed or don't want to use any built-in shortcuts (for whatever reason), you can take the long(er) way around:

def count_duplicates2(seq): 

    '''takes as argument a sequence and
    returns the number of duplicate elements'''

    counter = 0
    seen = set()
    for elm in seq:
        if elm in seen:
            counter += 1
        else:
            seen.add(elm)
    return counter

res = count_duplicates2([-1,2,4,2,0,4,4])
print(res)  # -> 3
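
Both versions above rely on set, so they only work when the elements are hashable. If your sequence might contain unhashable items such as lists, a rough sketch of a fallback could look like this. The name count_duplicates_unhashable is hypothetical, and the approach trades speed for generality because it only uses == comparisons.

```python
def count_duplicates_unhashable(seq):
    """Count duplicates using only equality checks (quadratic time)."""
    seen = []      # a list instead of a set, so unhashable items are fine
    counter = 0
    for elm in seq:
        if elm in seen:        # linear scan using ==
            counter += 1
        else:
            seen.append(elm)
    return counter

print(count_duplicates_unhashable([[1], [2], [1], [1]]))  # -> 2
```

For hashable data the set-based versions are the better choice, since membership tests on a set do not scan every stored element.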

Finally, as far as your code is concerned, the problems with it are outlined very nicely by @AlanB in his answer. I chose not to correct your code because, in my mind, this is an XY Problem. It is obvious that you have some kind of programming background, but convoluted while loops are just not the way things are done in Python.




Answer 2:


The solution of Ev. Kounis is the simplest and what you should use in my humble opinion. However, if you want to stick to your code, here is why it doesn't work:

With your nested while loops you are effectively saying "for every item in my list, increment count for each later item that is equal to it". That counts pairs of equal items, which is close to what you want but not the same thing. Since 4 occurs three times, it forms three pairs rather than two, so count is incremented one time too many.

seq=[-1,2,4,2,0,4,4]
count = 0
print("Pairs of duplicates: ")
for fir, item1 in enumerate(seq):
    for sec, item2 in enumerate(seq):
        # compare each item only with the items that come after it
        if fir < sec and item1 == item2:
            count+=1
            print((fir, sec))

print("Number of duplicates: ", count)

Which outputs :

Pairs of duplicates: 
(1, 3)
(2, 5)
(2, 6)
(5, 6)
Number of duplicates:  4

The (5, 6) pair is the extra one: the 4 at index 6 has already been counted as a duplicate of the 4 at index 2.
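
The gap between the two numbers can be worked out directly: a value that occurs n times forms n*(n-1)/2 pairs but only has n-1 duplicates. A small sketch of that arithmetic, where pairs_and_duplicates is a name invented for this illustration:

```python
from collections import Counter

def pairs_and_duplicates(seq):
    """Return (number of equal pairs, number of duplicates) for seq."""
    counts = Counter(seq).values()
    pairs = sum(n * (n - 1) // 2 for n in counts)   # what the nested loops count
    duplicates = sum(n - 1 for n in counts)         # what the question asks for
    return pairs, duplicates

print(pairs_and_duplicates([-1, 2, 4, 2, 0, 4, 4]))  # -> (4, 3)
```

The two numbers only agree while no value occurs more than twice, which is why the original function looks correct on simpler inputs.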

To fix this, add a condition to your if statement that prevents a value from being counted again once its duplicates have been counted:

seq=[-1,2,4,2,0,4,4]
count = 0
duplicates=[]  # values whose duplicates have already been counted
print("Pairs of duplicates: ")
for fir, item1 in enumerate(seq):
    for sec, item2 in enumerate(seq):
        if fir < sec and item1 == item2 and item1 not in duplicates:
            count+=1
            print((fir, sec))

    # from now on, skip every later occurrence of this value
    duplicates.append(item1)

print("Number of duplicates: ", count)

Which outputs the desired result:

Pairs of duplicates: 
(1, 3)
(2, 5)
(2, 6)
Number of duplicates:  3

But again, doing

len(seq)-len(set(seq))

is a lot simpler and works just as well.

EDIT:

I realized I didn't use while loops in my example. Here is the same fix written with while loops:

def count_duplicates(seq): 

    fir = 0
    sec = 0
    count = 0
    duplicates=[]  # values whose duplicates have already been counted
    print("Pairs of duplicates: ")
    while fir < len(seq):
        while sec < len(seq):
            if fir < sec and seq[fir] == seq[sec] and seq[fir] not in duplicates:
                count += 1
                print((fir, sec))
            sec += 1
        duplicates.append(seq[fir])
        fir += 1
        sec = 0
    return count 


c=count_duplicates([-1,2,4,2,0,4,4])
print("Number of duplicates: ", c)
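
If you want to stay close to the original fir/sec structure, a slightly tighter variant is possible. This is only a sketch under my own assumptions: the name count_duplicates_while and the handled set are not from the question, the elements are assumed to be hashable, and the debug printing is left out.

```python
def count_duplicates_while(seq):
    """While-loop version that counts each value's duplicates once."""
    fir = 0
    count = 0
    handled = set()  # values whose duplicates were already counted
    while fir < len(seq):
        if seq[fir] not in handled:
            sec = fir + 1          # only look at later positions
            while sec < len(seq):
                if seq[fir] == seq[sec]:
                    count += 1
                sec += 1
            handled.add(seq[fir])
        fir += 1
    return count

print(count_duplicates_while([-1, 2, 4, 2, 0, 4, 4]))  # -> 3
```

Starting sec at fir + 1 removes the need for the fir < sec test, and skipping handled values avoids rescanning the rest of the list for them.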



Answer 3:


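Approach using pandas. This approach is suitable for large lists with duplicates.

import pandas as pd
data = [-1,2,4,2,0,4,4]
df = pd.DataFrame({'data':data}) #Loading the data as a DataFrame
df1 = df.duplicated() #True for every row that repeats an earlier row
print(df[df1==False]) #Printing non-duplicated values
   data
0    -1
1     2
2     4
4     0
print(df[df1==False].count()) #Taking count of non-duplicate values
data    4
dtype: int64

Note that 4 is the number of distinct values. The number of duplicates is the number of True entries in df1, i.e. df1.sum(), which is 3 here.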


Source: https://stackoverflow.com/questions/52090212/counting-the-number-of-duplicates-in-a-list
