Find duplicate words in two files [closed]

Posted by 末鹿安然 on 2019-12-11 11:54:57

Question


I have two text files and need to find the words that appear in both of them. Is there a more concise way than this code?

file1 = set(line.strip() for line in open('/home/user1/file1.txt'))
file2 = set(line.strip() for line in open('/home/user1/file2.txt'))

for line in file1 & file2:
    if line:
        print(line)

Answer 1:


You can write more concise code, but more importantly you don't need to create two sets. set.intersection accepts any iterable, so only one set has to be built, which lets your code handle larger data sets and run faster:

with open('/home/user1/file1.txt') as f1, open('/home/user1/file2.txt') as f2:
    # Build one set from file1; intersection() then consumes file2's lines lazily.
    for line in set(map(str.rstrip, f1)).intersection(map(str.rstrip, f2)):
        print(line)

For Python 2, use itertools.imap, because map there builds a full list instead of a lazy iterator:

from itertools import imap
with open('/home/user1/file1.txt') as f1, open('/home/user1/file2.txt') as f2:
    for line in set(imap(str.rstrip, f1)).intersection(imap(str.rstrip, f2)):
        print(line)

Only a single set is built up front. intersection then iterates over the iterable passed in, i.e. the str.rstrip-ped lines of file2, and keeps the lines that are already in that set, as opposed to creating two full sets of lines first and then doing the intersection.
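The single-set idea can be tried without touching the file system. The following is a minimal sketch, assuming in-memory io.StringIO objects as stand-ins for the two files; the helper name common_lines and the sample data are hypothetical and not part of the original answer. It also drops blank lines, which the question's code did with its `if line:` check.

```python
from io import StringIO


def common_lines(file1, file2):
    """Return the non-empty stripped lines that occur in both file-like objects."""
    seen = set(map(str.strip, file1))                   # the only set built up front
    common = seen.intersection(map(str.strip, file2))   # file2 is streamed line by line
    common.discard("")                                  # ignore blank lines
    return common


# In-memory stand-ins for the two files (sample data, not from the question).
first = StringIO("apple\nbanana\n\ncherry\n")
second = StringIO("banana\ncherry\ndate\n\n")

for line in sorted(common_lines(first, second)):
    print(line)
```

With this sample data the loop prints banana and cherry. Real file objects opened with open() can be passed in the same way, since the helper only iterates over its arguments.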




Answer 2:


Even shorter:

with open('/home/user/file1.txt') as file1, open('/home/user/file2.txt') as file2:
    # split() breaks on any whitespace, so this compares words rather than whole lines
    print("\n".join(set(file1.read().split()) & set(file2.read().split())))
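Note that the snippet above splits on whitespace, so it compares individual words, whereas the other answers compare whole lines. The two give different results as soon as a line holds more than one word. A minimal sketch of the word-level variant, assuming io.StringIO stand-ins for the files; the helper name common_words and the sample data are hypothetical:

```python
from io import StringIO


def common_words(file1, file2):
    """Return the whitespace-separated words that occur in both file-like objects."""
    return set(file1.read().split()) & set(file2.read().split())


# Sample data (hypothetical): the files share the word "red" but no whole line.
first = StringIO("red apple\ngreen pear\n")
second = StringIO("red cherry\nblue plum\n")

print(sorted(common_words(first, second)))
```

Here the word "red" is reported as common even though no complete line matches. Because read() loads each file fully into memory, this variant suits small files.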



Answer 3:


This is one line shorter and closes both files after use:

with open('/home/user1/file1.txt') as file1, open('/home/user1/file2.txt') as file2:
    for line in set(line.strip() for line in file1) & set(line.strip() for line in file2):
        if line: 
            print(line)

Variation with only one set:

with open('/home/user1/file1.txt') as file1, open('/home/user1/file2.txt') as file2:
    for line in set(line.strip() for line in file1).intersection(
            line.strip() for line in file2):
        if line:
            print(line)


Source: https://stackoverflow.com/questions/34588974/find-duplicate-words-in-two-files
