Logic for a file compare

会有一股神秘感。 Submitted on 2019-12-12 11:35:03

Question


I am trying to write a program to compare files. For example:

file1

1
2
3
4
5

file2

1
2
@
3
4
5

If I do it line by line, I get:

1 == 1; 
2 == 2;
3 != @;
4 != 3;
5 != 4;
  != 5;

But the only real difference between the files is the inserted @ line. I want to get something like this:

1 == 1;
2 == 2;
  != @;
3 == 3;
4 == 4;
5 == 5;

What is the best way to do this without using an external application such as diff, fc, etc.?
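For reference, here is a minimal Python sketch of the position-by-position comparison described above. The function name naive_compare is hypothetical, and both files are assumed to be already read into lists of lines (for example with f.read().splitlines()). It pairs line i of one file with line i of the other, so a single inserted line shifts every later pair out of alignment.

```python
from itertools import zip_longest

def naive_compare(lines1, lines2):
    """Compare two lists of lines strictly by position (hypothetical helper)."""
    result = []
    # zip_longest pads the shorter list with "" so extra trailing lines are still reported
    for a, b in zip_longest(lines1, lines2, fillvalue=""):
        op = "==" if a == b else "!="
        result.append(f"{a} {op} {b};")
    return result

file1 = ["1", "2", "3", "4", "5"]
file2 = ["1", "2", "@", "3", "4", "5"]
for line in naive_compare(file1, file2):
    print(line)
```

Every pair after the @ is reported as different, which is the problem the question describes: matching lines have to be aligned before they are compared.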


Answer 1:


Python has a very handy standard-library module for comparing sequences called difflib. Its SequenceMatcher class takes two Python sequences and gives you (among other things) a sequence of opcodes telling you how to get from the first sequence to the second (i.e. the differences). These are of the form:

  • Replace this block with that one
  • Insert a block
  • Delete a block
  • Copy a block (called 'equal')

These reference blocks by giving indices into the original sequences. This can be applied to lines in a file, characters in a string, or anything else you can turn into a sequence in Python.
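A minimal sketch of this approach, assuming both files have already been read into lists of lines (e.g. with f.read().splitlines()). The function name compare_lines, the MISSING placeholder, and the output format (chosen to mirror the question) are illustrative choices, not part of difflib; only SequenceMatcher and get_opcodes() come from the library.

```python
import difflib
from itertools import zip_longest

MISSING = " "  # placeholder printed where one file has no corresponding line

def compare_lines(lines1, lines2):
    """Align two lists of lines with difflib.SequenceMatcher and report each pair."""
    # autojunk=False disables the "popular element" heuristic, which can otherwise
    # ignore frequently repeated lines in sequences of 200+ items
    matcher = difflib.SequenceMatcher(None, lines1, lines2, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for a, b in zip(lines1[i1:i2], lines2[j1:j2]):
                out.append(f"{a} == {b};")
        else:
            # "replace", "delete" and "insert": pad the shorter side with MISSING
            for a, b in zip_longest(lines1[i1:i2], lines2[j1:j2], fillvalue=MISSING):
                out.append(f"{a} != {b};")
    return out

file1 = ["1", "2", "3", "4", "5"]
file2 = ["1", "2", "@", "3", "4", "5"]
for line in compare_lines(file1, file2):
    print(line)
```

For the example files the opcodes are one "equal" block (1, 2), one "insert" block (@) and another "equal" block (3, 4, 5), so the output matches what the question asks for. For ready-made human-readable output, difflib also offers unified_diff and ndiff, which accept the same lists of lines.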




Answer 2:


I wonder if Levenshtein distance would help you in this situation. It would tell you how similar the two files are, but I don't know whether you could zero in on the @. Something to look at nonetheless.
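To see what this gives you, here is a minimal sketch of the classic dynamic-programming Levenshtein distance in Python (the function name levenshtein is illustrative). Because it only compares elements with ==, it works on lists of lines as well as on plain strings. As the answer suspects, the number alone does not say where the @ is; locating it would require keeping the whole table and backtracking through it, which this two-row sketch does not do.

```python
def levenshtein(seq1, seq2):
    """Minimum number of insertions, deletions and substitutions turning seq1 into seq2."""
    # prev[j] holds the distance between the current prefix of seq1 and seq2[:j]
    prev = list(range(len(seq2) + 1))
    for i, a in enumerate(seq1, start=1):
        curr = [i]  # distance from seq1[:i] to an empty prefix of seq2
        for j, b in enumerate(seq2, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # delete a
                            curr[j - 1] + 1,      # insert b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

file1 = ["1", "2", "3", "4", "5"]
file2 = ["1", "2", "@", "3", "4", "5"]
print(levenshtein(file1, file2))         # 1 (one inserted line)
print(levenshtein("kitten", "sitting"))  # 3
```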




Answer 3:


I believe what you're looking for is the distance between two strings; maybe this can help you.




Answer 4:


If you are not writing the program to learn something about diff algorithms but are simply looking for a solution, you should try diff-match-patch. It contains implementations of diff and patch algorithms in several programming languages (C++, C#, Java, JavaScript, Python).

I tried its Java version and it worked like a charm.




Answer 5:


A bit out of date, I suppose :) but I came across this post because I was looking for help on the same problem: I have two files which I display side by side, and I have to mark the lines that don't match in red.

Mine is a little bit of a special case, though, because 1) order is not important, and 2) each line is guaranteed to occur only once (the text is a license file with definitions, line by line).

It turned out that the easiest way of doing it was just to make lists of the two files, ls1 and ls2, and do the following (in pseudocode):

i = 0;
while (i < ls1.count) {
    n = ls2.find(ls1[i]);
    if (n >= 0) {
        // found match in ls2
        ls1.Delete(i);
        ls2.Delete(n);
    } else
        i++;
}

Explained: for each line in ls1, see whether there is a matching line in ls2. If so, delete both. What you're left with is exactly the differences, and you can easily mark up those lines in the original text.
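A runnable Python version of the pseudocode above might look like this. It is a sketch under the answer's own assumptions (order does not matter, each line occurs at most once in each file); the function name unmatched_lines is hypothetical. It works on copies so the caller's lists are left untouched.

```python
def unmatched_lines(ls1, ls2):
    """Remove every line that occurs in both lists; return the leftovers of each."""
    ls1, ls2 = list(ls1), list(ls2)  # work on copies
    i = 0
    while i < len(ls1):
        try:
            n = ls2.index(ls1[i])  # raises ValueError if there is no match
        except ValueError:
            i += 1  # no match: keep this line and move on
            continue
        # match found: drop it from both lists; i stays put because ls1 shifted left
        del ls1[i]
        del ls2[n]
    return ls1, ls2

only_in_1, only_in_2 = unmatched_lines(["1", "2", "3", "4", "5"],
                                       ["1", "2", "@", "3", "4", "5"])
print(only_in_1, only_in_2)
```

Each index() call scans ls2, so this is O(n·m) for files of n and m lines. Because the answer guarantees each line occurs only once, set(ls1) - set(ls2) and set(ls2) - set(ls1) would give the same leftovers in roughly linear time, though without preserving line order.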

Extremely easy, and no libraries needed. Just my two cents...



Source: https://stackoverflow.com/questions/1721938/logic-for-a-file-compare
