Consider the following example:

string1 = "calvin klein design dress calvin klein"

How can I remove the second occurrences of "calvin" and "klein"?
The result should look like

string2 = "calvin klein design dress"

Only the later duplicates should be removed, and the order of the words must not be changed!
def unique_list(l):
    # The list comprehension is used only for its append side effect.
    ulist = []
    [ulist.append(x) for x in l if x not in ulist]
    return ulist

a = "calvin klein design dress calvin klein"
a = ' '.join(unique_list(a.split()))
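A quick check of this approach on the question's input (a minimal, self-contained sketch repeating the function above):

```python
def unique_list(l):
    # Append each item only on its first appearance, preserving order.
    ulist = []
    [ulist.append(x) for x in l if x not in ulist]
    return ulist

a = "calvin klein design dress calvin klein"
result = ' '.join(unique_list(a.split()))
print(result)  # calvin klein design dress
```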
string1 = "calvin klein design dress calvin klein"
words = string1.split()
print(" ".join(sorted(set(words), key=words.index)))
This sorts the set of all the (unique) words in your string by each word's first index in the original list of words.
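To see what the key function does: `words.index(w)` returns the position of a word's first occurrence, so sorting the unique words by it restores the original order (a minimal illustration):

```python
words = "calvin klein design dress calvin klein".split()
# First-occurrence position of each unique word
positions = {w: words.index(w) for w in set(words)}
print(positions)
# Sorting the set by those positions recovers the original order
print(" ".join(sorted(set(words), key=words.index)))  # calvin klein design dress
```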
In Python 2.7+, you could use collections.OrderedDict for this:
from collections import OrderedDict
s = "calvin klein design dress calvin klein"
print(' '.join(OrderedDict((w, w) for w in s.split()).keys()))
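Since Python 3.7, plain dicts preserve insertion order, so the same idea works without the import; `dict.fromkeys` keeps the first occurrence of each word (a minimal sketch):

```python
s = "calvin klein design dress calvin klein"
# dict.fromkeys records each word once, in first-seen order
print(' '.join(dict.fromkeys(s.split())))  # calvin klein design dress
```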
Cut and paste from the itertools recipes:

from itertools import filterfalse  # named ifilterfalse in Python 2

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
I really wish they could go ahead and make a module out of those recipes soon. I'd very much like to be able to do from itertools_recipes import unique_everseen instead of using cut-and-paste every time I need something.
Use it like this:

def unique_words(string, ignore_case=False):
    key = None
    if ignore_case:
        key = str.lower
    return " ".join(unique_everseen(string.split(), key=key))

string2 = unique_words(string1)
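For example, with `ignore_case=True` the helper treats differently-cased duplicates as the same word (a self-contained sketch combining the recipe and helper above, using Python 3's `filterfalse`):

```python
from itertools import filterfalse  # named ifilterfalse in Python 2

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

def unique_words(string, ignore_case=False):
    key = str.lower if ignore_case else None
    return " ".join(unique_everseen(string.split(), key=key))

# Case-sensitive: "Calvin"/"calvin" and "klein"/"Klein" count as different words
print(unique_words("Calvin klein design dress calvin Klein"))
# Case-insensitive: later duplicates are dropped regardless of case
print(unique_words("Calvin klein design dress calvin Klein", ignore_case=True))
# Calvin klein design dress
```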
string = 'calvin klein design dress calvin klein'

def uniquify(string):
    output = []
    seen = set()
    for word in string.split():
        if word not in seen:
            output.append(word)
            seen.add(word)
    return ' '.join(output)

print(uniquify(string))
You can use a set to keep track of already processed words.
words = set()
result = ''
for word in string1.split():
    if word not in words:
        result = result + word + ' '
        words.add(word)
print(result)
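Note that `result` built this way ends with a trailing space; `result.rstrip()` removes it (a minimal check of this approach):

```python
string1 = "calvin klein design dress calvin klein"
words = set()
result = ''
for word in string1.split():
    if word not in words:
        result = result + word + ' '
        words.add(word)
print(repr(result))           # 'calvin klein design dress '
print(repr(result.rstrip()))  # 'calvin klein design dress'
```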
Several answers are pretty close to this but haven't quite ended up where I did:
def uniques(your_string):
    seen = set()
    return ' '.join(seen.add(i) or i for i in your_string.split() if i not in seen)
Of course, if you want it a tiny bit cleaner or faster, we can refactor a bit:
def uniques(your_string):
    words = your_string.split()
    seen = set()
    seen_add = seen.add
    def add(x):
        seen_add(x)
        return x
    return ' '.join(add(i) for i in words if i not in seen)
I think the second version is about as performant as you can get in a small amount of code. (More code could be used to do all the work in a single scan across the input string but for most workloads, this should be sufficient.)
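A quick sanity check that both versions agree on the question's input (a minimal sketch repeating the two functions above; the second is renamed here only so the two can be compared side by side):

```python
def uniques(your_string):
    seen = set()
    # seen.add(i) returns None, so `or i` evaluates to the word itself
    return ' '.join(seen.add(i) or i for i in your_string.split() if i not in seen)

def uniques_fast(your_string):  # the refactored version, renamed for comparison
    words = your_string.split()
    seen = set()
    seen_add = seen.add
    def add(x):
        seen_add(x)
        return x
    return ' '.join(add(i) for i in words if i not in seen)

s = "calvin klein design dress calvin klein"
print(uniques(s))       # calvin klein design dress
print(uniques_fast(s))  # calvin klein design dress
```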
Answers 1 and 2 work perfectly:

s = "the sky is blue very blue"
s = s.lower()
slist = s.split()
print(" ".join(sorted(set(slist), key=slist.index)))
Question: Remove the duplicates in a string
from collections import OrderedDict  # not the private _collections module
a = "Gina Gini Gini Protijayi"
aa = OrderedDict().fromkeys(a.split())
print(' '.join(aa))
# output => Gina Gini Protijayi
You can remove duplicate or repeated words from a text file or string using the following code -

from collections import Counter
import nltk
from nltk.tokenize import word_tokenize

# `all_words`, `lemmatize_sentence`, and `new_data` are assumed to be
# defined earlier in the script.
for lines in all_words:
    line = ''.join(lines.lower())
    new_data1 = ' '.join(lemmatize_sentence(line))
    new_data2 = word_tokenize(new_data1)
    new_data3 = nltk.pos_tag(new_data2)
    # below code is for removal of repeated words
    for i in range(0, len(new_data3)):
        new_data3[i] = "".join(new_data3[i])
    UniqW = Counter(new_data3)
    new_data5 = " ".join(UniqW.keys())
    print(new_data5)
    new_data.append(new_data5)
print(new_data)

P.S. - Adjust the indentation as required. Hope this helps!!!
You can do that simply by getting the set associated to the string, which is a mathematical object containing no repeated elements by definition. It suffices to join the words in the set back into a string. Note, however, that a set is unordered, so this does not preserve the original word order that the question asks for:

def remove_duplicate_words(string):
    return ' '.join(set(string.split()))
string2 = ' '.join(set(string1.split()))
Explanation:
.split() - splits a string into a list (called without parameters, it splits on whitespace)
set() - an unordered collection type that excludes duplicates
'separator'.join(list) - joins the elements of the list into a string, with 'separator' between elements
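A quick check that this keeps the right words even though the resulting order is not guaranteed (minimal sketch):

```python
string1 = "calvin klein design dress calvin klein"
string2 = ' '.join(set(string1.split()))
# Same unique words, but the order depends on set iteration
print(string2)
print(set(string2.split()) == {"calvin", "klein", "design", "dress"})  # True
```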
Source: https://stackoverflow.com/questions/7794208/how-can-i-remove-duplicate-words-in-a-string-with-python