Question

I am trying to construct this function but I can't work out how to stop the function counting the same duplicate more than once. Can someone help me please?

def count_duplicates(seq): 

    '''takes as argument a sequence and
    returns the number of duplicate elements'''

    fir = 0
    sec = 1
    count = 0
    while fir < len(seq):
        while sec < len(seq):
            if seq[fir] == seq[sec]:
                count = count + 1
            sec = sec + 1
        fir = fir + 1
        sec = fir + 1
    return count

In: count_duplicates([-1,2,4,2,0,4,4])

Out: 4

It fails here because the output should be 3.

Answer 1:

You can create a set from your list, which automatically removes the duplicates, and then take the difference between the length of the original list and the length of the set. Like so:

def count_duplicates(seq): 

    '''takes as argument a sequence and
    returns the number of duplicate elements'''

    return len(seq) - len(set(seq))

res = count_duplicates([-1,2,4,2,0,4,4])
print(res)  # -> 3

If you are not allowed or don't want to use any built-in shortcuts (for whatever reason), you can take the long(er) way around:

def count_duplicates2(seq): 

    '''takes as argument a sequence and
    returns the number of duplicate elements'''

    counter = 0
    seen = set()
    for elm in seq:
        if elm in seen:
            counter += 1
        else:
            seen.add(elm)
    return counter

res = count_duplicates2([-1,2,4,2,0,4,4])
print(res)  # -> 3
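
If you also want to know which values are repeated, and how often, something along these lines might help. This is only a minimal sketch using collections.Counter from the standard library; the name count_duplicates_by_value is made up for illustration, and like the set-based versions it assumes the elements are hashable:

```python
from collections import Counter

def count_duplicates_by_value(seq):
    '''takes as argument a sequence and returns a dict mapping each
    repeated element to the number of its extra occurrences'''
    counts = Counter(seq)
    # an element seen n times contributes n - 1 duplicates
    return {value: n - 1 for value, n in counts.items() if n > 1}

extras = count_duplicates_by_value([-1, 2, 4, 2, 0, 4, 4])
print(extras)                # -> {2: 1, 4: 2}
print(sum(extras.values()))  # -> 3
```

Summing the values gives the same total as len(seq) - len(set(seq)).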

Finally, as far as your code is concerned, the problems with it are outlined very nicely by @AlanB in his answer. I chose not to correct your code because, in my mind, this is an XY problem. It is obvious that you have some kind of programming background, but convoluted while loops are just not the way things are done in Python.

Answer 2:

Ev. Kounis's solution is the simplest and, in my humble opinion, what you should use. However, if you want to stick to your code, here is why it doesn't work:

With your nested while loops you are basically saying "for every item in my list, increment count each time a later item equals it", which is almost what you want. But the value 4 appears three times, so it forms three matching index pairs, (2, 5), (2, 6) and (5, 6), while only two of those occurrences are duplicates. count is therefore incremented one time too many.

seq = [-1,2,4,2,0,4,4]
count = 0
print("Pairs of duplicates: ")
for fir, item1 in enumerate(seq):
    for sec, item2 in enumerate(seq):
        # compare each item only with the items that come after it
        if fir < sec and item1 == item2:
            count += 1
            print((fir, sec))

print("Number of duplicates: ", count)

Which outputs:

Pairs of duplicates: 
(1, 3)
(2, 5)
(2, 6)
(5, 6)
Number of duplicates:  4

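The (5, 6) pair should not be counted: the 4 at index 5 was already counted as a duplicate of the 4 at index 2.

To fix this, add a condition to your if statement so that a value which has already been handled is not counted again: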

seq = [-1,2,4,2,0,4,4]
count = 0
duplicates = []  # values that have already been the first item of a comparison
print("Pairs of duplicates: ")
for fir, item1 in enumerate(seq):
    for sec, item2 in enumerate(seq):
        if fir < sec and item1 == item2 and item1 not in duplicates:
            count += 1
            print((fir, sec))

    # from now on, later copies of this value are skipped as first item
    duplicates.append(item1)

print("Number of duplicates: ", count)

Which outputs the desired result:

Pairs of duplicates: 
(1, 3)
(2, 5)
(2, 6)
Number of duplicates:  3

But again, doing

len(seq)-len(set(seq))

is a lot simpler and works just as well.

EDIT:

I realized I didn't use while loops in my example. Here is the same fix written with while loops:

def count_duplicates(seq):

    fir = 0
    sec = 0
    count = 0
    duplicates = []  # values that have already been the first item of a comparison
    print("Pairs of duplicates: ")
    while fir < len(seq):
        while sec < len(seq):
            if fir < sec and seq[fir] == seq[sec] and seq[fir] not in duplicates:
                count += 1
                print((fir, sec))
            sec += 1
        duplicates.append(seq[fir])
        fir += 1
        sec = 0  # restart the inner index for the next outer item
    return count


c = count_duplicates([-1,2,4,2,0,4,4])
print("Number of duplicates: ", c)
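
If you would rather stay closer to the original loops from the question (inner index starting at fir + 1), one possible variation is sketched below. The name count_duplicates_while is only illustrative. Instead of keeping a separate list, it skips any item that already appeared earlier in the sequence, so only the first occurrence of a value counts its later copies:

```python
def count_duplicates_while(seq):
    '''takes as argument a sequence and
    returns the number of duplicate elements'''
    fir = 0
    count = 0
    while fir < len(seq):
        # skip values that already occurred before index fir
        if seq[fir] not in seq[:fir]:
            sec = fir + 1
            while sec < len(seq):
                if seq[fir] == seq[sec]:
                    count += 1
                sec += 1
        fir += 1
    return count

print(count_duplicates_while([-1, 2, 4, 2, 0, 4, 4]))  # -> 3
```

Because it relies only on == comparisons, this version also works for elements that cannot be put in a set, such as nested lists, at the cost of quadratic running time.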

Answer 3:

An approach using pandas. This approach is suitable for large lists with duplicates.

import pandas as pd
data = [-1,2,4,2,0,4,4]
df = pd.DataFrame({'data':data}) #Loading the data as a DataFrame
df1 = df.duplicated() #True for every row that repeats an earlier row
print(df[df1==False]) #Printing Non-Duplicated Values
   data
0    -1
1     2
2     4
4     0
print(df[df1==False].count()) #Taking count of Non-Duplicate Values
data    4
dtype: int64
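
The count above is the number of non-duplicate values. To get the number of duplicates that the question asks for, one option is to sum the boolean mask returned by duplicated(). A minimal sketch, assuming pandas is installed (the variable names are only illustrative):

```python
import pandas as pd

data = [-1, 2, 4, 2, 0, 4, 4]
# duplicated() marks every occurrence after the first one as True
mask = pd.Series(data).duplicated()
num_duplicates = int(mask.sum())
print(num_duplicates)  # -> 3
```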

Source: https://stackoverflow.com/questions/52090212/counting-the-number-of-duplicates-in-a-list

Tags

python