Question
I'm looking for a way to remove duplicate entries from a Python list, but with a twist: the final list has to be case sensitive, with a preference for uppercase words.
For example, between "cup" and "Cup" I only need to keep "Cup", not "cup". Unlike other common solutions, which suggest calling lower() first, I'd prefer to keep each string's original case; in particular, I'd prefer keeping the word with the uppercase letter over the one that is all lowercase.
Again, I am trying to turn this list:
['Hello', 'hello', 'world', 'world', 'poland', 'Poland']
into this:
['Hello', 'world', 'Poland']
How should I do that?
Thanks in advance.
Answer 1:
This does not preserve the order of words, but it does produce a list of "unique" words with a preference for capitalized ones.
In [34]: words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland', ]
In [35]: wordset = set(words)
In [36]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[36]: ['world', 'Poland', 'Hello']
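To make the filter easier to reuse and to show exactly what it checks, here is a minimal sketch that wraps the same comprehension in a function. The name prefer_title is hypothetical, and the output is sorted only because set iteration order is arbitrary. Note that the test is specifically about title case (first letter uppercase, the rest lowercase), so an all-caps word is not treated as the "uppercase" variant.

```python
def prefer_title(words):
    """Hypothetical wrapper around Answer 1's filter (order is not preserved)."""
    wordset = set(words)
    # Keep a word if it is already title-cased, or if its title-cased
    # form does not appear anywhere in the input.
    return [w for w in wordset if w.istitle() or w.title() not in wordset]


# Set order is arbitrary, so sort before printing.
print(sorted(prefer_title(['Hello', 'hello', 'world', 'world', 'poland', 'Poland'])))
# ['Hello', 'Poland', 'world']

# Neither word is removed: 'CUP' is not title-cased, and
# 'CUP'.title() == 'Cup' does not appear in the input.
print(sorted(prefer_title(['CUP', 'cup'])))
# ['CUP', 'cup']
```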
If you wish to preserve the order in which they appear in words, then you could use a collections.OrderedDict:
In [43]: import collections
In [44]: wordset = collections.OrderedDict.fromkeys(words)
In [46]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[46]: ['Hello', 'world', 'Poland']
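Since Python 3.7, a plain dict is guaranteed to keep insertion order, so dict.fromkeys can stand in for OrderedDict.fromkeys. This is a sketch assuming Python 3.7 or later; prefer_title_ordered is a hypothetical name and the filter is unchanged from the session above.

```python
def prefer_title_ordered(words):
    # dict.fromkeys drops exact duplicates and, on Python 3.7+,
    # keeps the keys in first-seen order.
    wordset = dict.fromkeys(words)
    # Same filter as above: keep title-cased words, or words whose
    # title-cased form never appears.
    return [w for w in wordset if w.istitle() or w.title() not in wordset]


print(prefer_title_ordered(['Hello', 'hello', 'world', 'world', 'poland', 'Poland']))
# ['Hello', 'world', 'Poland']
```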
Answer 2:
Using a set to track seen words:
def uniq(words):
    seen = set()
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible. (3.3+)
        if l in seen:
            continue
        seen.add(l)
        yield word
Usage:
>>> list(uniq(['Hello', 'hello', 'world', 'world', 'Poland', 'poland']))
['Hello', 'world', 'Poland']
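The example input happens to list 'Poland' before 'poland', which hides a limitation: this first version keeps whichever spelling it sees first. A quick check, with the generator copied so the snippet runs on its own:

```python
def uniq(words):
    seen = set()
    for word in words:
        l = word.lower()
        if l in seen:
            continue  # a spelling of this word was already emitted
        seen.add(l)
        yield word


# 'poland' comes before 'Poland' here, so the lowercase spelling wins.
print(list(uniq(['Hello', 'hello', 'world', 'world', 'poland', 'Poland'])))
# ['Hello', 'world', 'poland']
```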
UPDATE
The previous version does not account for the preference for uppercase over lowercase. In the updated version I used min(), as @TheSoundDefense did.
import collections

def uniq(words):
    seen = collections.OrderedDict()  # Use {} if the order is not important (on 3.7+, {} keeps insertion order too).
    for word in words:
        l = word.lower()  # Use `word.casefold()` if possible (3.3+)
        seen[l] = min(word, seen.get(l, word))
    return seen.values()
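A usage sketch of the updated version, on input where the lowercase spelling comes first. Because the function returns seen.values(), which is a view rather than a list in Python 3, the result is wrapped in list():

```python
import collections


def uniq(words):
    seen = collections.OrderedDict()
    for word in words:
        l = word.lower()
        # min() keeps the spelling that sorts first; uppercase letters
        # have lower code points, so 'Poland' < 'poland'.
        seen[l] = min(word, seen.get(l, word))
    return seen.values()


print(list(uniq(['Hello', 'hello', 'world', 'world', 'poland', 'Poland'])))
# ['Hello', 'world', 'Poland']
```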
Answer 3:
Since an uppercase letter compares as "smaller" than its lowercase counterpart (it has a lower code point), I think you can do this:
orig_list = ["Hello", "hello", "world", "world", "Poland", "poland"]
unique_list = []
for word in orig_list:
    for i in range(len(unique_list)):
        if unique_list[i].lower() == word.lower():
            unique_list[i] = min(word, unique_list[i])
            break
    else:
        unique_list.append(word)
The min() call prefers the spelling that has an uppercase letter at the earliest position where the two spellings differ.
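A few direct comparisons illustrate why min() works here and what "earlier on" means: strings are compared character by character, and the first differing character decides the result.

```python
# Uppercase ASCII letters have lower code points than lowercase ones.
print(ord('C'), ord('c'))  # 67 99
print(min('cup', 'Cup'))   # Cup

# The first position where the spellings differ decides the winner.
print(min('hELLO', 'Hello'))  # Hello  ('H' < 'h' at index 0)
print(min('heLLo', 'hellO'))  # heLLo  ('L' < 'l' at index 2)
```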
Answer 4:
There are some better answers here, but hopefully this is something simple, different, and useful. This code satisfies the conditions of your test (sequential pairs of matching words) but would fail on anything more complicated, such as non-sequential pairs, unpaired words, or non-strings. For anything more complicated I'd take a different approach.
p1 = ['Hello', 'hello', 'world', 'world', 'Poland', 'poland']
p2 = ['hello', 'Hello', 'world', 'world', 'Poland', 'Poland']

def pref_upper(p):
    q = []
    a = 0  # index of the first word in the current pair
    b = 1  # index of the second word in the current pair
    for x in range(len(p) // 2):  # integer division so range() gets an int
        if p[a][0].isupper() and p[b][0].isupper():
            q.append(p[a])
        if p[a][0].isupper() and p[b][0].islower():
            q.append(p[a])
        if p[a][0].islower() and p[b][0].isupper():
            q.append(p[b])
        if p[a][0].islower() and p[b][0].islower():
            q.append(p[b])
        a += 2
        b += 2
    return q

print(pref_upper(p1))  # ['Hello', 'world', 'Poland']
print(pref_upper(p2))  # ['Hello', 'world', 'Poland']
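The same pairwise rule can be written more compactly by zipping the even and odd positions. This is a sketch under the same assumption as the code above (the input is a sequence of adjacent pairs of words that start with a letter); pref_upper_pairs is a hypothetical name. The four if statements reduce to one condition, because the kept word depends only on whether the first word of the pair starts with an uppercase letter.

```python
def pref_upper_pairs(p):
    """Hypothetical compact variant of pref_upper for adjacent pairs."""
    q = []
    # zip(p[::2], p[1::2]) yields (p[0], p[1]), (p[2], p[3]), ...
    for a, b in zip(p[::2], p[1::2]):
        # Keep the first word if it starts uppercase, else the second.
        q.append(a if a[0].isupper() else b)
    return q


print(pref_upper_pairs(['Hello', 'hello', 'world', 'world', 'Poland', 'poland']))
# ['Hello', 'world', 'Poland']
print(pref_upper_pairs(['hello', 'Hello', 'world', 'world', 'Poland', 'Poland']))
# ['Hello', 'world', 'Poland']
```

One small difference: if the first word starts with a non-letter (such as a digit), pref_upper appends nothing for that pair, while this version keeps the second word.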
Source: https://stackoverflow.com/questions/24983172/how-to-eliminate-duplicate-list-entries-in-python-while-preserving-case-sensitiv