Compare two text files to find differences and output them to a new text file

Submitted by こ雲淡風輕ζ on 2019-12-07 23:00:11

Question


I am working on a simple tool that compares configuration data stored in text files. The user selects a file, the program searches it for certain parameters and writes them to a new text document. It then compares those parameters with a text document that holds the default parameters and writes the differences to another new text document.

I've created a simple flowchart to summarize this:

This is my current code. I am using difflib to compare the two files.

import difflib
import re  # needed for re.search() and re.escape() below
from Tkinter import *
import tkSimpleDialog
import tkMessageBox
from tkFileDialog import askopenfilename

root = Tk()
w = Label(root, text ="Configuration Inspector")
w.pack()
tkMessageBox.showinfo("Welcome", "This is version 1.00 of Configuration Inspector")
filename = askopenfilename() # Logs File
filename2 = askopenfilename() # Default Configuration
compareFile = askopenfilename() # Comparison File
outputfilename = askopenfilename() # Out Serial Number Configuration from Logs

with open(filename, "rb") as f_input:
    start_token = tkSimpleDialog.askstring("Serial Number", "What is the serial number?")
    end_token = tkSimpleDialog.askstring("End Keyword", "What is the end keyword")
    reText = re.search("%s(.*?)%s" % (re.escape(start_token + ",SHOWALL"), re.escape(end_token)), f_input.read(), re.S)
    if reText:
        output = reText.group(1)
        fo = open(outputfilename, "wb")
        fo.write(output)
        fo.close()

        diff = difflib.ndiff(outputfilename, compareFile)
        print '\n'.join(list(diff))

    else:
        tkMessageBox.showinfo("Output", "Sorry that input was not found in the file")
        print "not found"

So far the program correctly searches the selected file and writes the parameters it finds to a new output text file.

The issue arises when comparing the two files, the default data file and the output file.

The program does output differences. However, because the default data file has different lines than the output file, it only prints the lines that do not match rather than the parameters that do not match. In other words, let's say I have these two files:

Default Data Text File:

Data1 = 1
Data2 = 2
Data3 = 3
Data4 = 4
Data5 = 5
Data6 = 6

Output Data Text File:

Data1 = 1
Data2 = 2
Data3 = 8
Data4 = 7

Since Data3 and Data4 do not match, the difference.txt file (the comparison output) should show that. For example:

Data3 = 8
Data4 = 7
Data5 = 5
Data6 = 6

However, it does not match or compare the lines; it just checks whether there is a line in that position. So currently my comparison output looks like this:

Data5 = 5
Data6 = 6

Any ideas on how I can make the comparison show everything that differs between the files' parameters?

If you need any more details, please let me know in the comments and I will edit the original post to add them.


Answer 1:


I don't know what you're trying to do with difflib.ndiff(). That function takes two lists of strings, but you are passing it filenames.
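
To illustrate the point about argument types, here is a minimal sketch (written in Python 3 syntax, with made-up sample data and hypothetical file names) of what difflib.ndiff() does with two lists of lines compared with two filename strings. A string is itself a sequence of characters, so ndiff() ends up comparing the names character by character rather than the file contents.

```python
import difflib

# Sample data standing in for the contents of the two files (assumed values).
default_lines = ["Data1 = 1", "Data2 = 2", "Data3 = 3"]
output_lines = ["Data1 = 1", "Data2 = 2", "Data3 = 8"]

# Intended usage: each argument is a list of lines.
line_diff = list(difflib.ndiff(default_lines, output_lines))
print("\n".join(line_diff))

# Passing filenames: each string is treated as a sequence of one-character
# "lines", so the result describes the names, not the files they refer to.
char_diff = list(difflib.ndiff("default.txt", "output.txt"))
print(char_diff)
```

In the first diff, unchanged lines start with two spaces, lines only in the first list start with "- ", and lines only in the second list start with "+ ".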

Anyway, here's a short demo that performs the comparison that you want. It uses a dict to speed up the comparison process. Obviously, I don't have your data files, so this program creates lists of strings using the string .splitlines() method.

It goes through the default data list line by line.
If that data is not present in the output dict, then the default line is printed.
If a data key with that value is present in the output dict, then that line is skipped.
If the key is found but the value in the output dict is different to the default value, then a line with the key & output value is printed.

#Build default data list
defdata = '''
Data1 = 1
Data2 = 2
Data3 = 3
Data4 = 4
Data5 = 5
Data6 = 6
'''.splitlines()[1:]

#Build output data list
outdata = '''
Data1 = 1
Data2 = 2
Data3 = 8
Data4 = 7
'''.splitlines()[1:]

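# Map each key in the output data to its value
outdict = dict(line.split(' = ') for line in outdata)

for line in defdata:
    key, val = line.split(' = ')
    if key in outdict:
        outval = outdict[key]
        # Key is in both lists: print it only if the values differ
        if outval != val:
            print('%s = %s' % (key, outval))
    else:
        # Key is missing from the output data: print the default line
        print(line)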

output

Data3 = 8
Data4 = 7
Data5 = 5
Data6 = 6
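
One caveat: line.split(' = ') assumes exactly one space on each side of the equals sign. If the real files are less regular, a more tolerant parser may help. The following is only a sketch in Python 3 syntax; parse_parameter is a hypothetical helper name, not something from the code above, and it uses str.partition() so that only the first '=' is treated as the separator.

```python
def parse_parameter(line):
    """Split 'Name = value' into (name, value), tolerating uneven spacing."""
    key, sep, val = line.partition('=')
    if not sep:
        # No '=' at all: report the offending line instead of guessing.
        raise ValueError("no '=' in line: %r" % line)
    return key.strip(), val.strip()

# Sample lines with inconsistent spacing (assumed data).
sample = ["Data1 = 1", "Data2=2", "Data3 =  8 "]
pairs = dict(parse_parameter(line) for line in sample)
print(pairs)  # {'Data1': '1', 'Data2': '2', 'Data3': '8'}
```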

Here's how to read a text file into a list of lines.

with open(filename) as f:
    data = f.read().splitlines()

There's also a .readlines() method, but it's not so useful here because it preserves the \n newline character at the end of each line, and we don't want that.

Note that if there are any blank lines in the text file then the resulting list will have an empty string '' in that position. Also, that code won't remove any leading or trailing blanks or other whitespace on each line. But if you need to do that there are thousands of examples that can show you how here on Stack Overflow.
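
As one possible way to handle both of those points, here is a small sketch in Python 3 syntax that strips surrounding whitespace from each line and drops blank lines while reading. The helper name read_clean_lines and the temporary demo file are assumptions made for illustration only.

```python
import os
import tempfile

def read_clean_lines(path):
    """Return the non-blank lines of a text file, stripped of surrounding whitespace."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

# Write a throwaway file containing a blank line and some stray spaces.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Data1 = 1\n\n  Data2 = 2  \n")
    tmp_path = tmp.name

cleaned = read_clean_lines(tmp_path)
os.remove(tmp_path)
print(cleaned)  # ['Data1 = 1', 'Data2 = 2']
```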


Version 2

This new version uses a slightly different approach. It loops over a sorted list of all the keys found in either the default list or the output list.
If a key is only found in one of the lists the corresponding line is added to the diff list.
If a key is found in both lists but the output line differs from the default line then the corresponding line from the output list is added to the diff list. If both lines are identical, nothing is added to the diff list.

#Build default data list
defdata = '''
Data1 = 1
Data2 = 2
Data3 = 3
Data4 = 4
Data5 = 5
Data6 = 6
'''.splitlines()[1:]

#Build output data list
outdata = '''
Data1 = 1
Data2 = 2
Data3 = 8
Data4 = 7
Data8 = 8
'''.splitlines()[1:]

def make_dict(data):
    return dict((line.split(None, 1)[0], line) for line in data)

defdict = make_dict(defdata)
outdict = make_dict(outdata)

#Create a sorted list containing all the keys
allkeys = sorted(set(defdict) | set(outdict))
#print allkeys

difflines = []
for key in allkeys:
    indef = key in defdict
    inout = key in outdict
    if indef and not inout:
        difflines.append(defdict[key])
    elif inout and not indef:
        difflines.append(outdict[key])
    else:
        #key must be in both dicts
        defval = defdict[key]
        outval = outdict[key]
        if outval != defval:
            difflines.append(outval)

for line in difflines:
    print(line)

output

Data3 = 8
Data4 = 7
Data5 = 5
Data6 = 6
Data8 = 8
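
To connect this back to the original goal of writing the differences to a new text file, here is a rough sketch in Python 3 syntax that reads two files, applies the same key-based comparison as Version 2, and writes the result out. The function name parameter_diff and the file names default.txt, output.txt and difference.txt are assumptions for the demo; in the real program the paths would come from askopenfilename().

```python
import os
import tempfile

def parameter_diff(default_lines, output_lines):
    """Return lines whose key is missing from one side or whose line differs."""
    def by_key(lines):
        # Key each line on its first whitespace-separated word; skip blank lines.
        return {line.split(None, 1)[0]: line for line in lines if line.strip()}

    defdict, outdict = by_key(default_lines), by_key(output_lines)
    difflines = []
    for key in sorted(set(defdict) | set(outdict)):
        if key not in outdict:
            difflines.append(defdict[key])   # only in the default data
        elif outdict[key] != defdict.get(key):
            difflines.append(outdict[key])   # only in the output, or changed
    return difflines

# Create the two input files in a temporary directory for the demo.
workdir = tempfile.mkdtemp()
default_path = os.path.join(workdir, "default.txt")
output_path = os.path.join(workdir, "output.txt")
difference_path = os.path.join(workdir, "difference.txt")
with open(default_path, "w") as f:
    f.write("Data1 = 1\nData2 = 2\nData3 = 3\nData4 = 4\nData5 = 5\nData6 = 6\n")
with open(output_path, "w") as f:
    f.write("Data1 = 1\nData2 = 2\nData3 = 8\nData4 = 7\n")

# Read both files into lists of lines, compare them, and save the result.
with open(default_path) as f:
    default_lines = f.read().splitlines()
with open(output_path) as f:
    output_lines = f.read().splitlines()

difflines = parameter_diff(default_lines, output_lines)
with open(difference_path, "w") as f:
    f.write("\n".join(difflines) + "\n")
```

With the sample data from the question, difference.txt ends up containing the four lines Data3 = 8, Data4 = 7, Data5 = 5 and Data6 = 6. Because the keys are sorted as strings, a key such as Data10 would sort before Data2.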


Source: https://stackoverflow.com/questions/32074743/compare-two-text-files-to-find-differences-and-output-them-to-a-new-text-file
