How to eliminate duplicate list entries in Python while preserving case-sensitivity?

自作多情 submitted on 2019-12-03 14:38:42

This does not preserve the order of words, but it does produce a list of "unique" words with a preference for capitalized ones.

In [34]: words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland', ]

In [35]: wordset = set(words)

In [36]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[36]: ['world', 'Poland', 'Hello']

If you wish to preserve the order as they appear in words, then you could use a collections.OrderedDict:

In [43]: import collections

In [44]: wordset = collections.OrderedDict.fromkeys(words)

In [46]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[46]: ['Hello', 'world', 'Poland']
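The ordered variant above can be wrapped in a plain function for use outside an interactive session. This is only a sketch: the name `prefer_title` is made up for illustration and is not part of the original answer. The second call shows a limitation of the `istitle()` filter: it only recognizes title-cased forms, so an all-caps duplicate is not merged with its lowercase twin.

```python
from collections import OrderedDict


def prefer_title(words):
    """Hypothetical helper: drop exact duplicates, and drop a word
    when its title-cased form is also present in the input."""
    # fromkeys() removes exact duplicates while keeping first-seen order.
    wordset = OrderedDict.fromkeys(words)
    return [w for w in wordset if w.istitle() or w.title() not in wordset]


print(prefer_title(['Hello', 'hello', 'world', 'world', 'poland', 'Poland']))
# ['Hello', 'world', 'Poland']

# 'HELLO' is not title-cased and 'Hello' is absent, so both entries survive.
print(prefer_title(['HELLO', 'hello']))
# ['HELLO', 'hello']
```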

Using set to track seen words:

def uniq(words):
    seen = set()
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible. (3.3+)
        if l in seen:
            continue
        seen.add(l)
        yield word

Usage:

>>> list(uniq(['Hello', 'hello', 'world', 'world', 'Poland', 'poland']))
['Hello', 'world', 'Poland']
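The comment in the code suggests `word.casefold()` on Python 3.3+. A minimal sketch of that variant follows; the name `uniq_casefold` is made up here. It matters for text where `lower()` is not enough, for example the German 'ß', which `casefold()` maps to 'ss' while `lower()` leaves it unchanged.

```python
def uniq_casefold(words):
    """Yield the first occurrence of each word, comparing case-insensitively
    with str.casefold() (Python 3.3+)."""
    seen = set()
    for word in words:
        key = word.casefold()  # more aggressive than lower(): 'ß' -> 'ss'
        if key in seen:
            continue
        seen.add(key)
        yield word


print(list(uniq_casefold(['Straße', 'STRASSE', 'Hello', 'hello'])))
# ['Straße', 'Hello']
```

With `lower()` as the key, 'Straße' and 'STRASSE' would be treated as different words, because 'straße' != 'strasse'.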

UPDATE

The previous version does not prefer uppercase over lowercase. The updated version uses min, as @TheSoundDefense did.

import collections

def uniq(words):
    seen = collections.OrderedDict()  # Use {} if the order is not important.
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible (3.3+)
        seen[l] = min(word, seen.get(l, word))
    return list(seen.values())  # list() so Python 3 returns a list, not a view
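A quick way to check the min-based approach, sketched under the assumption of Python 3.7+, where a plain dict preserves insertion order and `OrderedDict` is not strictly needed. The name `uniq_prefer_upper` is made up for this illustration.

```python
def uniq_prefer_upper(words):
    """Keep one entry per case-insensitive word, preferring the variant
    that sorts first (uppercase letters sort before lowercase ones)."""
    seen = {}  # assumes Python 3.7+: dicts keep insertion order
    for word in words:
        key = word.lower()
        # The first sighting stores the word itself; later sightings keep
        # whichever variant compares smaller.
        seen[key] = min(word, seen.get(key, word))
    return list(seen.values())


print(uniq_prefer_upper(['hello', 'Hello', 'world', 'world', 'poland', 'Poland']))
# ['Hello', 'world', 'Poland']
```

Each word keeps the position of its first occurrence, even when a later variant replaces it.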

Since an uppercase letter is "smaller" than a lowercase letter in a comparison, I think you can do this:

orig_list = ["Hello", "hello", "world", "world", "Poland", "poland"]
unique_list = []
for word in orig_list:
  for i in range(len(unique_list)):
    if unique_list[i].lower() == word.lower():
      unique_list[i] = min(word, unique_list[i])
      break
  else:
    unique_list.append(word)

Strings are compared character by character, so min prefers the word that has an uppercase letter at the first position where the two words differ.
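To see why min behaves this way, it helps to look at a few comparisons directly. This is just an illustration of standard string ordering: strings compare character by character by code point, and 'H' (72) comes before 'h' (104).

```python
# Uppercase letters have smaller code points than lowercase ones.
print(ord('H'), ord('h'))      # 72 104

# The first differing character decides the comparison.
print(min('hello', 'Hello'))   # Hello  ('H' < 'h' at position 0)
print(min('HELLO', 'Hello'))   # HELLO  ('E' < 'e' at position 1)
print(min('hELLO', 'Hello'))   # Hello  (position 0 decides; later capitals are ignored)
```

So the preference is not for the word with the most capitals, only for the one whose capital appears at the first point of difference.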

There are some better answers here, but hopefully this is something simple, different and useful. This code satisfies the conditions of your test (sequential pairs of matching words) but would fail on anything more complicated, such as non-sequential pairs, non-pairs or non-strings. For anything more complicated I'd take a different approach.

p1 = ['Hello', 'hello', 'world', 'world', 'Poland', 'poland']
p2 = ['hello', 'Hello', 'world', 'world', 'Poland', 'Poland']

def pref_upper(p):
    q = []
    a = 0
    b = 1

    # Walk the list two items at a time; // keeps the count an integer on Python 3.
    for x in range(len(p) // 2):
        if p[a][0].isupper() and p[b][0].isupper():
            q.append(p[a])
        if p[a][0].isupper() and p[b][0].islower():
            q.append(p[a])
        if p[a][0].islower() and p[b][0].isupper():
            q.append(p[b])
        if p[a][0].islower() and p[b][0].islower():
            q.append(p[b])
        a += 2
        b += 2
    return q

print(pref_upper(p1))
print(pref_upper(p2))