Python: Rename duplicates in list with progressive numbers without sorting list

本秂侑毒 提交于 2019-12-03 07:46:08

问题


Given a list like this:

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]

I would like to rename the duplicates by appending a number to get the following result:

mylist = ["name1", "state", "name2", "city", "name3", "zip1", "zip2"]

I do not want to change the order of the original list. The solutions suggested for this related Stack Overflow question sorts the list, which I do not want to do.


回答1:


This is how I would do it. EDIT: I wrote this into a more generalized utility function since people seem to like this answer.

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
check = ["name1", "state", "name2", "city", "name3", "zip1", "zip2"]
copy = mylist[:]  # so we will only mutate the copy in case of failure

from collections import Counter # Counter counts the number of occurrences of each item
from itertools import tee, count

def uniquify(seq, suffs = count(1)):
    """Make all the items unique by adding a suffix (1, 2, etc).

    `seq` is mutable sequence of strings.
    `suffs` is an optional alternative suffix iterable.
    """
    not_unique = [k for k,v in Counter(seq).items() if v>1] # so we have: ['name', 'zip']
    # suffix generator dict - e.g., {'name': <my_gen>, 'zip': <my_gen>}
    suff_gens = dict(zip(not_unique, tee(suffs, len(not_unique))))  
    for idx,s in enumerate(seq):
        try:
            suffix = str(next(suff_gens[s]))
        except KeyError:
            # s was unique
            continue
        else:
            seq[idx] += suffix

uniquify(copy)
assert copy==check  # raise an error if we failed
mylist = copy  # success

If you wanted to append an underscore before each count, you could do something like this:

>>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
>>> uniquify(mylist, (f'_{x!s}' for x in range(1, 100)))
>>> mylist
['name_1', 'state', 'name_2', 'city', 'name_3', 'zip_1', 'zip_2']

...or if you wanted to use letters instead:

>>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
>>> import string
>>> uniquify(mylist, (f'_{x!s}' for x in string.ascii_lowercase))
>>> mylist
['name_a', 'state', 'name_b', 'city', 'name_c', 'zip_a', 'zip_b']

NOTE: this is not the fastest possible algorithm; for that, refer to the answer by ronakg. The advantage of the function above is it is easy to understand and read, and you're not going to see much of a performance difference unless you have an extremely large list.

EDIT: Here is my original answer in a one-liner, however the order is not preserved and it uses the .index method, which is extremely suboptimal (as explained in the answer by DTing). See the answer by queezz for a nice 'two-liner' that preserves order.

[s + str(suffix) if num>1 else s for s,num in Counter(mylist).items() for suffix in range(1, num+1)]
# Produces: ['zip1', 'zip2', 'city', 'state', 'name1', 'name2', 'name3']



回答2:


My solution with map and lambda:

print map(lambda x: x[1] + str(mylist[:x[0]].count(x[1]) + 1) if mylist.count(x[1]) > 1 else x[1], enumerate(mylist))

More traditional form

newlist = []
for i, v in enumerate(mylist):
    totalcount = mylist.count(v)
    count = mylist[:i].count(v)
    newlist.append(v + str(count + 1) if totalcount > 1 else v)

And last one

[v + str(mylist[:i].count(v) + 1) if mylist.count(v) > 1 else v for i, v in enumerate(mylist)]



回答3:


Here's a very simple O(n) solution. Simply walk the list storing the index of element in the list. If we've seen this element before, use the stored data earlier to append the occurrence value.

This approach solves the problem with just creating one more dictionary for look-back. Avoids doing look-ahead so that we don't create temporary list slices.

mylist = ["name", "state", "name", "city", "city", "name", "zip", "zip", "name"]

dups = {}

for i, val in enumerate(mylist):
    if val not in dups:
        # Store index of first occurrence and occurrence value
        dups[val] = [i, 1]
    else:
        # Special case for first occurrence
        if dups[val][1] == 1:
            mylist[dups[val][0]] += str(dups[val][1])

        # Increment occurrence value, index value doesn't matter anymore
        dups[val][1] += 1

        # Use stored occurrence value
        mylist[i] += str(dups[val][1])

print mylist

# ['name1', 'state', 'name2', 'city1', 'city2', 'name3', 'zip1', 'zip2', 'name4']



回答4:


Any method where count is called on each element is going to result in O(n^2) since count is O(n). You can do something like this:

# not modifying original list
from collections import Counter

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
counts = {k:v for k,v in Counter(mylist).items() if v > 1}
newlist = mylist[:]

for i in reversed(range(len(mylist))):
    item = mylist[i]
    if item in counts and counts[item]:
        newlist[i] += str(counts[item])
        counts[item]-=1
print(newlist)

# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

# modifying original list
from collections import Counter

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
counts = {k:v for k,v in Counter(mylist).items() if v > 1}      

for i in reversed(range(len(mylist))):
    item = mylist[i]
    if item in counts and counts[item]:
        mylist[i] += str(counts[item])
        counts[item]-=1
print(mylist)

# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

This should be O(n).

Other provided answers:

mylist.index(s) per element causes O(n^2)

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]

from collections import Counter
counts = Counter(mylist)
for s,num in counts.items():
    if num > 1:
        for suffix in range(1, num + 1):
            mylist[mylist.index(s)] = s + str(suffix) 

count(x[1]) per element causes O(n^2)
It is also used multiple times per element along with list slicing.

print map(lambda x: x[1] + str(mylist[:x[0]].count(x[1]) + 1) if mylist.count(x[1]) > 1 else x[1], enumerate(mylist))

Benchmarks:

http://nbviewer.ipython.org/gist/dting/c28fb161de7b6287491b




回答5:


A list comprehension version of the Rick Teachey answer, "two-liner":

from collections import Counter

m = ["name", "state", "name", "city", "name", "zip", "zip"]

d = {a:list(range(1, b+1)) if b>1 else '' for a,b in Counter(m).items()}
[i+str(d[i].pop(0)) if len(d[i]) else i for i in m]
#['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']



回答6:


You can use hashtable to solve this problem. Define a dictionary d. key is the string and value is (first_time_index_in_the_list, times_of_appearance). Everytime when you see a word, just check the dictionary, and if the value is 2, use first_time_index_in_the_list to append '1' to the first element, and append times_of_appearance to current element. If greater than 2, just append times_of_appearance to current element.




回答7:


Less fancy stuff.

from collections import defaultdict
mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
finalList = []
dictCount = defaultdict(int)
anotherDict = defaultdict(int)
for t in mylist:
   anotherDict[t] += 1
for m in mylist:
   dictCount[m] += 1
   if anotherDict[m] > 1:
       finalList.append(str(m)+str(dictCount[m]))
   else:
       finalList.append(m)
print finalList


来源:https://stackoverflow.com/questions/30650474/python-rename-duplicates-in-list-with-progressive-numbers-without-sorting-list

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!