Compare two text files to find differences and output them to a new text file

折月煮酒 submitted on 2019-12-06 09:22:28

I don't know what you're trying to do with difflib.ndiff(). That function takes two lists of strings, but you are passing it filenames.
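For reference, here is a minimal sketch of the kind of call difflib.ndiff() expects: it wants the lines themselves, not the names of the files. The two lists below are made-up sample data standing in for the contents of your files, and the variable names are just for illustration.

```python
import difflib

# Hypothetical sample data standing in for the lines of the two files
default_lines = ['Data1 = 1', 'Data2 = 2', 'Data3 = 3']
output_lines = ['Data1 = 1', 'Data2 = 2', 'Data3 = 8']

# ndiff() takes two sequences of strings and yields delta lines prefixed
# with '  ' (in both), '- ' (only in the first), '+ ' (only in the second)
# or '? ' (hints about changes within a line)
delta = list(difflib.ndiff(default_lines, output_lines))

# Keep only the lines that are unique to the second sequence
added = [line[2:] for line in delta if line.startswith('+ ')]
print(added)  # ['Data3 = 8']
```

That works, but ndiff() compares lines by position and content, not by key, which is why the demos in this answer use a dict instead.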

Anyway, here's a short demo that performs the comparison that you want. It uses a dict to speed up the comparison process. Obviously, I don't have your data files, so this program creates lists of strings using the string .splitlines() method.

It goes through the default data list line by line.
If the key on that line is not present in the output dict, then the default line is printed.
If the key is present in the output dict with the same value, then that line is skipped.
If the key is present but the value in the output dict is different to the default value, then a line with the key & output value is printed.
Note that keys which only occur in the output data are ignored by this version.

#Build default data list
defdata = '''
Data1 = 1
Data2 = 2
Data3 = 3
Data4 = 4
Data5 = 5
Data6 = 6
'''.splitlines()[1:]

#Build output data list
outdata = '''
Data1 = 1
Data2 = 2
Data3 = 8
Data4 = 7
'''.splitlines()[1:]

outdict = dict(line.split(' = ') for line in outdata)

for line in defdata:
    key, val = line.split(' = ')
    if key in outdict:
        outval = outdict[key]
        if outval != val:
            #Key is in both, but the output value differs
            print('%s = %s' % (key, outval))
    else:
        #Key is missing from the output data
        print(line)

output

Data3 = 8
Data4 = 7
Data5 = 5
Data6 = 6

Here's how to read a text file into a list of lines.

with open(filename) as f:
    data = f.read().splitlines()

There's also a .readlines() method, but it's not so useful here because it preserves the \n newline character at the end of each line, and we don't want that.
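To illustrate the difference, here is a small sketch written for Python 3. It uses io.StringIO as an in-memory stand-in for a real file, which is my assumption so that it runs without your data files.

```python
import io

# An in-memory stand-in for a text file (assumption: no real file needed)
f = io.StringIO('Data1 = 1\nData2 = 2\n')

# readlines() keeps the trailing newline on each line
with_newlines = f.readlines()

# Rewind, then read the whole text and split it into lines without newlines
f.seek(0)
without_newlines = f.read().splitlines()

print(with_newlines)     # ['Data1 = 1\n', 'Data2 = 2\n']
print(without_newlines)  # ['Data1 = 1', 'Data2 = 2']
```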

Note that if there are any blank lines in the text file then the resulting list will have an empty string '' in that position. Also, that code won't remove any leading or trailing blanks or other whitespace on each line. If you need to do that, the string .strip() method handles it, and there are plenty of examples here on Stack Overflow.
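If blank lines or stray whitespace do turn out to be a problem, something along these lines should take care of both. The helper name clean_lines is hypothetical, and the sample text is made up.

```python
def clean_lines(text):
    """Return the non-blank lines of text with surrounding whitespace removed."""
    # Strip leading/trailing whitespace from every line
    stripped = (line.strip() for line in text.splitlines())
    # Drop the lines that are empty after stripping
    return [line for line in stripped if line]

# Made-up sample with padding, a tab, and blank lines
sample = '  Data1 = 1  \n\nData2 = 2\t\n   \n'
print(clean_lines(sample))  # ['Data1 = 1', 'Data2 = 2']
```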


Version 2

This new version uses a slightly different approach. It loops over a sorted list of all the keys found in either the default list or the output list.
If a key is only found in one of the lists the corresponding line is added to the diff list.
If a key is found in both lists but the output line differs from the default line then the corresponding line from the output list is added to the diff list. If both lines are identical, nothing is added to the diff list.

#Build default data list
defdata = '''
Data1 = 1
Data2 = 2
Data3 = 3
Data4 = 4
Data5 = 5
Data6 = 6
'''.splitlines()[1:]

#Build output data list
outdata = '''
Data1 = 1
Data2 = 2
Data3 = 8
Data4 = 7
Data8 = 8
'''.splitlines()[1:]

def make_dict(data):
    return dict((line.split(None, 1)[0], line) for line in data)

defdict = make_dict(defdata)
outdict = make_dict(outdata)

#Create a sorted list containing all the keys
allkeys = sorted(set(defdict) | set(outdict))
#print(allkeys)

difflines = []
for key in allkeys:
    indef = key in defdict
    inout = key in outdict
    if indef and not inout:
        difflines.append(defdict[key])
    elif inout and not indef:
        difflines.append(outdict[key])
    else:
        #key must be in both dicts
        defval = defdict[key]
        outval = outdict[key]
        if outval != defval:
            difflines.append(outval)

for line in difflines:
    print(line)

output

Data3 = 8
Data4 = 7
Data5 = 5
Data6 = 6
Data8 = 8
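
To tie it together with real files, here is a rough sketch of the Version 2 logic wrapped in a function that reads both files and writes the differences to a new text file. The function names read_lines and diff_files and the filenames are my own inventions, and the demo writes its sample files into a temporary directory since I don't have your data. Unlike the demo above, make_dict here also skips blank lines.

```python
import os
import tempfile

def read_lines(filename):
    # Read a text file into a list of lines without trailing newlines
    with open(filename) as f:
        return f.read().splitlines()

def make_dict(data):
    # Map the first whitespace-delimited word of each line to the whole
    # line, skipping blank lines
    return dict((line.split(None, 1)[0], line) for line in data if line.strip())

def diff_files(defname, outname, diffname):
    defdict = make_dict(read_lines(defname))
    outdict = make_dict(read_lines(outname))
    difflines = []
    for key in sorted(set(defdict) | set(outdict)):
        if key not in outdict:
            # Key only in the default file
            difflines.append(defdict[key])
        elif outdict[key] != defdict.get(key):
            # Key only in the output file, or in both with different lines
            difflines.append(outdict[key])
    # Write the differences to the new file, one per line
    with open(diffname, 'w') as f:
        f.writelines(line + '\n' for line in difflines)
    return difflines

# Demo with made-up sample files in a temporary directory
tmpdir = tempfile.mkdtemp()
defname = os.path.join(tmpdir, 'default.txt')
outname = os.path.join(tmpdir, 'output.txt')
diffname = os.path.join(tmpdir, 'diff.txt')

with open(defname, 'w') as f:
    f.write('Data1 = 1\nData2 = 2\nData3 = 3\n')
with open(outname, 'w') as f:
    f.write('Data1 = 1\nData2 = 5\nData8 = 8\n')

result = diff_files(defname, outname, diffname)
print(result)  # ['Data2 = 5', 'Data3 = 3', 'Data8 = 8']
```

One thing to watch: the keys are sorted as strings, so a key like Data10 would sort before Data2.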