How to eliminate duplicate list entries in Python while preserving case-sensitivity?

 ̄綄美尐妖づ 提交于 2019-12-05 00:09:54

问题


I'm looking for a way to remove duplicate entries from a Python list but with a twist; The final list has to be case sensitive with a preference of uppercase words.

For example, between cup and Cup I only need to keep Cup and not cup. Unlike other common solutions which suggest using lower() first, I'd prefer to maintain the string's case here and in particular I'd prefer keeping the one with the uppercase letter over the one which is lowercase..

Again, I am trying to turn this list: [Hello, hello, world, world, poland, Poland]

into this:

[Hello, world, Poland]

How should I do that?

Thanks in advance.


回答1:


This does not preserve the order of words, but it does produce a list of "unique" words with a preference for capitalized ones.

In [34]: words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland', ]

In [35]: wordset = set(words)

In [36]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[36]: ['world', 'Poland', 'Hello']

If you wish to preserve the order as they appear in words, then you could use a collections.OrderedDict:

In [43]: wordset = collections.OrderedDict()

In [44]: wordset = collections.OrderedDict.fromkeys(words)

In [46]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[46]: ['Hello', 'world', 'Poland']



回答2:


Using set to track seen words:

def uniq(words):
    seen = set()
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible. (3.3+)
        if l in seen:
            continue
        seen.add(l)
        yield word

Usage:

>>> list(uniq(['Hello', 'hello', 'world', 'world', 'Poland', 'poland']))
['Hello', 'world', 'Poland']

UPDATE

Previous version does not take care of preference of uppercase over lowercase. In the updated version I used the min as @TheSoundDefense did.

import collections

def uniq(words):
    seen = collections.OrderedDict()  # Use {} if the order is not important.
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible (3.3+)
        seen[l] = min(word, seen.get(l, word))
    return seen.values()



回答3:


Since an uppercase letter is "smaller" than a lowercase letter in a comparison, I think you can do this:

orig_list = ["Hello", "hello", "world", "world", "Poland", "poland"]
unique_list = []
for word in orig_list:
  for i in range(len(unique_list)):
    if unique_list[i].lower() == word.lower():
      unique_list[i] = min(word, unique_list[i])
      break
  else:
    unique_list.append(word)

The min will have a preference for words with uppercase letters earlier on.




回答4:


Some better answers here, but hopefully something simple, different and useful. This code satisfies the conditions of your test, sequential pairs of matching words, but would fail on anything more complicated; such as non-sequential pairs, non-pairs or non-strings. Anything more complicated and I'd take a different approach.

p1 = ['Hello', 'hello', 'world', 'world', 'Poland', 'poland']
p2 = ['hello', 'Hello', 'world', 'world', 'Poland', 'Poland']

def pref_upper(p):
    q = []
    a = 0
    b = 1

    for x in range(len(p) /2):
            if p[a][0].isupper() and p[b][0].isupper():
                    q.append(p[a])
            if p[a][0].isupper() and p[b][0].islower():
                    q.append(p[a])
            if p[a][0].islower() and p[b][0].isupper():
                    q.append(p[b])
            if p[a][0].islower() and p[b][0].islower():
                    q.append(p[b])
            a +=2
            b +=2
    return q

print pref_upper(p1)
print pref_upper(p2)


来源:https://stackoverflow.com/questions/24983172/how-to-eliminate-duplicate-list-entries-in-python-while-preserving-case-sensitiv

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!