How to return unique words from the text file using Python

Question

How do I return all the unique words from a text file using Python? For example:

I am not a robot

I am a human

Should return:

I

am

not

a

robot

human

Here is what I've done so far:

def unique_file(input_filename, output_filename):
    input_file = open(input_filename, 'r')
    file_contents = input_file.read()
    input_file.close()
    word_list = file_contents.split()

    file = open(output_filename, 'w')

    for word in word_list:
        if word not in word_list:
            file.write(str(word) + "\n")
    file.close()

The text file that Python creates has nothing in it. I'm not sure what I am doing wrong.

Answer 1:

for word in word_list:
    if word not in word_list:

Every word is in word_list by definition, since the loop iterates over word_list itself, so the condition is always false and nothing gets written.

Instead of that logic, use a set:

unique_words = set(word_list)
for word in unique_words:
    file.write(str(word) + "\n")

Sets only hold unique members, which is exactly what you're trying to achieve.

Note that order won't be preserved, but you didn't specify if that's a requirement.
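If the order of first appearance does matter, one option (not part of the original answer) is `dict.fromkeys`: dictionary keys are unique, and dicts keep insertion order in Python 3.7 and later. A minimal sketch; the function name `unique_words_in_order` is hypothetical:

```python
def unique_words_in_order(text):
    """Return the unique whitespace-separated words of text, in first-seen order."""
    # dict keys are unique and dicts preserve insertion order (Python 3.7+),
    # so the first occurrence of each word fixes its position.
    return list(dict.fromkeys(text.split()))


sample = "I am not a robot\n\nI am a human"
for word in unique_words_in_order(sample):
    print(word)
```

With the sample from the question this prints the words in the order the asker expected.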

Answer 2:

Simply iterate over the lines in the file and use a set to keep only the unique words.

from itertools import chain

def unique_words(lines):
    return set(chain(*(line.split() for line in lines if line)))

Then do the following to read all unique words from a file and print them:

with open(filename, 'r') as f:
    print(unique_words(f))
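Because `unique_words` accepts any iterable of lines, it can be tried without a file on disk by wrapping a string in `io.StringIO`. This sketch also swaps `chain(*...)` for `chain.from_iterable`, which consumes the generator lazily instead of unpacking it into function arguments:

```python
import io
from itertools import chain


def unique_words(lines):
    # Flatten the per-line word lists into one stream and keep each word once.
    return set(chain.from_iterable(line.split() for line in lines))


# io.StringIO behaves like an open text file, yielding one line per iteration.
fake_file = io.StringIO("I am not a robot\n\nI am a human\n")
print(sorted(unique_words(fake_file)))
```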

Answer 3:

def unique_file(input_filename, output_filename):
    input_file = open(input_filename, 'r')
    file_contents = input_file.read()
    input_file.close()
    duplicates = []
    word_list = file_contents.split()
    file = open(output_filename, 'w')
    for word in word_list:
        if word not in duplicates:
            duplicates.append(word)
            file.write(str(word) + "\n")
    file.close()

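This code loops over every word and, if the word is not yet in the list duplicates (which really holds the words already seen), appends it to that list and writes it to the output file. Words are written in the order they first appear.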
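Checking `word not in duplicates` scans the whole list on every iteration, so the loop slows down quadratically on large files. A sketch of the same order-preserving idea that tracks already-seen words in a set (average constant-time membership tests); the names `unique_words_ordered` and `seen` are illustrative, not from the answer:

```python
def unique_words_ordered(words):
    seen = set()   # words already emitted, for fast membership tests
    result = []    # unique words in first-seen order
    for word in words:
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


words = "I am not a robot I am a human".split()
print("\n".join(unique_words_ordered(words)))
```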

Answer 4:

This seems to be a typical application for the collections module (here collections.OrderedDict):

...
import collections
d = collections.OrderedDict()
for word in wordlist:
    d[word] = None
# use this if you also want to count the words:
# for word in wordlist: d[word] = d.get(word, 0) + 1
for k in d.keys():
    print(k)

You could also use a collections.Counter(), which would also count the elements you feed in, though on Python versions before 3.7 the order of the words would get lost. I added a commented-out line that counts the words while keeping the order.
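A minimal counting sketch with `collections.Counter`; the sample string stands in for the file contents. `Counter` is a dict subclass, so on Python 3.7+ iterating over it yields words in first-seen order:

```python
from collections import Counter

text = "I am not a robot\nI am a human"  # stands in for the file contents

counts = Counter(text.split())
# Keys come out in first-seen order on Python 3.7+.
for word, n in counts.items():
    print(word, n)

# most_common() orders entries by count, highest first.
print(counts.most_common(2))
```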

Answer 5:

Using a regex and a set:

import re
words = re.findall(r'\w+', text.lower())
uniq_words = set(words)

Another way is to create a dict and insert the words as keys:

dict_word = {}
for line in doc:
    frase = line.split()
    for palavra in frase:
        if palavra not in dict_word:
            dict_word[palavra] = 1
print(list(dict_word.keys()))
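The two approaches in this answer treat punctuation differently: `str.split()` only splits on whitespace, so a trailing period stays attached to the word, while `re.findall(r'\w+', ...)` keeps only runs of word characters (letters, digits and underscore). A small comparison sketch with a made-up sample sentence:

```python
import re

text = "I am not a robot. I am a human."

# Whitespace split: punctuation stays glued to the words.
split_words = set(text.split())
# Regex: only word-character runs, lowercased first.
regex_words = set(re.findall(r"\w+", text.lower()))

print(sorted(split_words))
print(sorted(regex_words))
```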

Answer 6:

string = "I am not a robot\n I am a human"
list_str = string.split()
print(list(set(list_str)))
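The order of `list(set(...))` is arbitrary and can even differ between runs, because string hashing is randomized per process by default. If reproducible output is wanted, one option is to sort the result; the case-insensitive key here is just one possible choice:

```python
string = "I am not a robot\n I am a human"
list_str = string.split()

# Sorting makes the output order deterministic; key=str.lower ignores case.
unique_sorted = sorted(set(list_str), key=str.lower)
print(unique_sorted)
```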

Answer 7:

The problem with your code is that word_list already contains every word of the input file. Inside the loop you are checking whether a word from word_list is absent from word_list itself, so the condition is always false and nothing gets written. This should work (note that it also preserves the order):

def unique_file(input_filename, output_filename):
    z = []
    with open(input_filename, 'r') as fileIn, open(output_filename, 'w') as fileOut:
        for line in fileIn:
            for word in line.split():
                if word not in z:
                    z.append(word)
                    fileOut.write(word + '\n')

Answer 8:

Use a set. You don't need to import anything to do this.

# Open the file and read its contents (the with block closes it afterwards)
with open(file_Name, 'r') as my_File:
    read_File = my_File.read()
# Split the text into words
words = read_File.split()
# Using a set will only save the unique words
unique_words = set(words)
# You can then print the set as a whole or loop through the set etc.
for word in unique_words:
    print(word)
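The open/read/split steps can also be collapsed with `pathlib.Path.read_text()`, which opens, reads and closes the file in one call. A sketch that writes a temporary sample file first so it runs on its own; the file name `sample.txt` is only for illustration:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "sample.txt"  # hypothetical input file
    file_path.write_text("I am not a robot\n\nI am a human\n")

    # read_text() opens, reads and closes the file in one call.
    unique_words = set(file_path.read_text().split())

for word in sorted(unique_words):
    print(word)
```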

Answer 9:

try:
    with open("gridlex.txt", mode="r", encoding="utf-8") as india:
        for data in india:
            # print the number of characters in each line (including the newline)
            print("no of chars", len(data))
except IOError:
    print("sorry")

Source: https://stackoverflow.com/questions/22978602/how-to-return-unique-words-from-the-text-file-using-python

Tags

python

text-files

unique