How do I remove duplicate words from a list in python without using sets?

强颜欢笑 提交于 2019-12-04 19:30:45

You did have a couple logic error with your code. I fixed them, hope it helps.

fname = "stuff.txt"
fhand = open(fname)
AllWords = list()      #create new list
ResultList = list()    #create new results list I want to append words to

for line in fhand:
    line.rstrip()   #strip white space
    words = line.split()    #split lines of words and make list
    AllWords.extend(words)   #make the list from 4 lists to 1 list

AllWords.sort()  #sort list

for word in AllWords:   #for each word in line.split()
    if word not in ResultList:    #if a word isn't in line.split            
        ResultList.append(word)   #append it.


print(ResultList)

Tested on Python 3.4, no importing.

mylist = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']
newlist = sorted(set(mylist), key=lambda x:mylist.index(x))
print(newlist)
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']

newlist contains a list of the set of unique values from mylist, sorted by each item's index in mylist.

This should work, it walks the list and adds elements to a new list if they are not the same as the last element added to the new list.

def unique(lst):
    """ Assumes lst is already sorted """
    unique_list = []
    for el in lst:
        if el != unique_list[-1]:
            unique_list.append(el)
    return unique_list

You could also use collections.groupby which works similarly

from collections import groupby

# lst must already be sorted 
unique_list = [key for key, _ in groupby(lst)]

A good alternative to using a set would be to use a dictionary. The collections module contains a class called Counter which is specialized dictionary for counting the number of times each of its keys are seen. Using it you could do something like this:

from collections import Counter

wordlist = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and',
            'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is',
            'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun',
            'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']

newlist = sorted(Counter(wordlist), 
                 key=lambda w: w.lower())  # case insensitive sort
print(newlist)

Output:

['already', 'and', 'Arise', 'breaks', 'But', 'east', 'envious', 'fair',
 'grief', 'is', 'It', 'Juliet', 'kill', 'light', 'moon', 'pale', 'sick',
 'soft', 'sun', 'the', 'through', 'what', 'Who', 'window', 'with', 'yonder']

There is a problem with your code. I think you mean:

for word in line.split():   #for each word in line.split()
    if words not in ResultList:    #if a word isn't in ResultList

Use plain old lists. Almost certainly not as efficient as Counter.

fname = raw_input("Enter file name: ")  

Words = []
with open(fname) as fhand:
    for line in fhand:
        line = line.strip()
        # lines probably not needed
        #if line.startswith('"'):
        #    line = line[1:]
        #if line.endswith('"'):
        #    line = line[:-1]
        Words.extend(line.split())

UniqueWords = []
for word in Words:
    if word.lower() not in UniqueWords:
        UniqueWords.append(word.lower())

print Words
UniqueWords.sort()
print UniqueWords

This always checks against the lowercase version of the word, to ensure the same word but in a different case configuration is not counted as 2 different words.

I added checks to remove the double quotes at the start and end of the file, but if they are not present in the actual file. These lines could be disregarded.

Below function might help.

   def remove_duplicate_from_list(temp_list):
        if temp_list:
            my_list_temp = []
            for word in temp_list:
                if word not in my_list_temp:
                    my_list_temp.append(word)
            return my_list_temp
        else: return []
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!