Python: Rename duplicates in list with progressive numbers without sorting list

情书的邮戳 2020-12-13 18:41

Given a list like this:

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]

I would like to rename the duplicates by appending progressive numbers, without sorting the list (i.e., keeping the original order).

7 Answers
  •  天涯浪人
    2020-12-13 19:44

    This is how I would do it. EDIT: I turned this into a more general utility function, since people seem to like this answer.

    mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
    check = ["name1", "state", "name2", "city", "name3", "zip1", "zip2"]
    copy = mylist[:]  # so we will only mutate the copy in case of failure
    
    from collections import Counter # Counter counts the number of occurrences of each item
    from itertools import tee, count
    
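    def uniquify(seq, suffs=None):
        """Make all the items unique by adding a suffix (1, 2, etc).
    
        `seq` is a mutable sequence of strings.
        `suffs` is an optional alternative suffix iterable (default: 1, 2, 3, ...).
        """
        if suffs is None:
            # build a fresh counter on every call; a `count(1)` default argument
            # is evaluated only once, so it would be shared (and already advanced) between calls
            suffs = count(1)
        not_unique = [k for k,v in Counter(seq).items() if v>1] # so we have: ['name', 'zip']
        # suffix generator dict - e.g., {'name': <iterator>, 'zip': <iterator>}
        # tee() gives every duplicated item its own independent copy of `suffs`
        suff_gens = dict(zip(not_unique, tee(suffs, len(not_unique))))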
        for idx,s in enumerate(seq):
            try:
                suffix = str(next(suff_gens[s]))
            except KeyError:
                # s was unique
                continue
            else:
                seq[idx] += suffix
    
    uniquify(copy)
    assert copy==check  # raise an error if we failed
    mylist = copy  # success
    

    If you wanted to append an underscore before each count, you could do something like this:

    >>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
    >>> uniquify(mylist, (f'_{x!s}' for x in range(1, 100)))
    >>> mylist
    ['name_1', 'state', 'name_2', 'city', 'name_3', 'zip_1', 'zip_2']
    

    ...or if you wanted to use letters instead:

    >>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
    >>> import string
    >>> uniquify(mylist, (f'_{x!s}' for x in string.ascii_lowercase))
    >>> mylist
    ['name_a', 'state', 'name_b', 'city', 'name_c', 'zip_a', 'zip_b']
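`string.ascii_lowercase` has only 26 letters, so with that suffix iterable an item occurring more than 26 times exhausts its iterator; `next()` then raises `StopIteration`, which `uniquify` does not catch. Below is a minimal sketch of an unbounded alternative, assuming spreadsheet-style suffixes (`_a` ... `_z`, `_aa`, `_ab`, ...) are acceptable. The generator name `letter_suffixes` is hypothetical, not part of the answer above.

```python
from itertools import count, islice, product
import string


def letter_suffixes(prefix="_"):
    """Hypothetical helper: yield prefix+'a', ..., prefix+'z', prefix+'aa', prefix+'ab', ... forever."""
    for length in count(1):
        # every combination of `length` lowercase letters, in alphabetical order
        for letters in product(string.ascii_lowercase, repeat=length):
            yield prefix + "".join(letters)


print(list(islice(letter_suffixes(), 28)))
# the first 26 values are '_a' .. '_z', followed by '_aa', '_ab'
```

Since it never runs out, it can be passed wherever the answer passes a suffix iterable, e.g. `uniquify(mylist, letter_suffixes())`.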
    

    NOTE: this is not the fastest possible algorithm; for that, refer to the answer by ronakg. The advantage of the function above is it is easy to understand and read, and you're not going to see much of a performance difference unless you have an extremely large list.
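For comparison, here is a minimal sketch of a plain two-pass approach that avoids `tee` entirely: one `Counter` pass to get the totals, then a second pass that numbers each duplicate as it is met. It returns a new list instead of mutating in place. The name `uniquify_copy` and its `sep` parameter are hypothetical; this is not taken from ronakg's answer.

```python
from collections import Counter


def uniquify_copy(seq, sep=""):
    """Hypothetical helper: return a new list where every repeated string gets a running suffix.

    Items that occur only once are left unchanged, and the original order is kept.
    `sep` goes between the item and its number (e.g. "_" gives "name_1").
    """
    totals = Counter(seq)  # first pass: how often does each item occur?
    seen = Counter()       # running count per item during the second pass
    result = []
    for s in seq:
        if totals[s] > 1:
            seen[s] += 1
            result.append(f"{s}{sep}{seen[s]}")
        else:
            result.append(s)
    return result


mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
print(uniquify_copy(mylist))       # ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']
print(uniquify_copy(mylist, "_"))  # ['name_1', 'state', 'name_2', 'city', 'name_3', 'zip_1', 'zip_2']
```

Both passes are linear in the length of the list. Like `uniquify`, it does not check whether a generated name (e.g. `name1`) already exists in the input.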

    EDIT: Here is my original answer as a one-liner; however, the order is not preserved and it uses the .index method, which is extremely suboptimal (as explained in the answer by DTing). See the answer by queezz for a nice 'two-liner' that preserves order.

    [s + str(suffix) if num>1 else s for s,num in Counter(mylist).items() for suffix in range(1, num+1)]
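    # On Python 3.7+ (insertion-ordered dicts) this produces:
    # ['name1', 'name2', 'name3', 'state', 'city', 'zip1', 'zip2']  (duplicates grouped, original order lost)
    # On older versions the order was arbitrary, e.g.:
    # ['zip1', 'zip2', 'city', 'state', 'name1', 'name2', 'name3']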
    
