How to count differences between two files on Linux?

Submitted by 房东的猫 on 2019-12-18 10:26:23

Question


I need to work with large files and must find the differences between two of them. I don't need the differing content itself, only the number of differences.

To find the number of differing rows, I came up with:

diff --suppress-common-lines --speed-large-files -y File1 File2 | wc -l

And it works, but is there a better way to do it?

And how can I count the exact number of differences (with standard tools like bash, diff, awk, sed, or some old version of perl)?


Answer 1:


diff -U 0 file1 file2 | grep -v ^@ | wc -l

Subtract 2 from that count for the two file-name lines at the top of the diff listing. Unified format is probably a bit faster than side-by-side format.
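
The subtraction can be folded into the pipeline. The following is a minimal sketch, assuming a diff that supports -U 0 (GNU diff does); count_changed_lines is a hypothetical helper name. It skips the two header lines with tail instead of subtracting 2 afterwards, and counts only lines that start with - or +. A modified line shows up once as a - line and once as a + line, so it contributes 2 to the total.

```shell
# count_changed_lines is a hypothetical helper name, not part of diff.
# Usage: count_changed_lines file1 file2
count_changed_lines() {
    # -U 0      : unified format without context lines
    # tail -n +3: skip the "--- file1" and "+++ file2" header lines
    # grep -c   : count the remaining removed (-) and added (+) lines;
    #             hunk headers start with @ and are not counted
    diff -U 0 "$1" "$2" | tail -n +3 | grep -c '^[-+]'
}
```

For identical files diff prints nothing, so the function prints 0.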




Answer 2:


If you want to count the number of places where the files differ (each contiguous run of changed lines forms one hunk, introduced by a line starting with @), use this:

diff -U 0 file1 file2 | grep ^@ | wc -l

Doesn't John's answer double count the different lines?
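
To make concrete what the @ lines measure, here is a small sketch, again assuming a diff that supports -U 0; count_hunks is a hypothetical helper name. Each hunk header in unified format starts with @@, and with zero context lines two adjacent changed lines fall into the same hunk, so the result is a count of changed regions rather than of individual lines.

```shell
# count_hunks is a hypothetical helper name, not part of diff.
# Usage: count_hunks file1 file2
count_hunks() {
    # Every hunk header in unified format starts with "@@".
    diff -U 0 "$1" "$2" | grep -c '^@@'
}
```

When every changed line is isolated, the hunk count equals the number of differing lines; when changes are adjacent, it is smaller.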




Answer 3:


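If using Linux/Unix, what about comm -23 file1 file2 to print lines in file1 that aren't in file2, comm -23 file1 file2 | wc -l to count them, and similarly comm -13 ... for lines that are only in file2? Note that comm expects both files to be sorted, and that each numbered option suppresses the corresponding output column (1: lines only in file1, 2: lines only in file2, 3: lines in both).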
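
A sketch of the comm approach for unsorted input, assuming mktemp is available; lines_only_in_first and the variable sorted_second are hypothetical names. comm requires sorted files, so both inputs are sorted first. The option -23 suppresses column 2 (lines only in the second file) and column 3 (lines common to both), which leaves the lines unique to the first file.

```shell
# lines_only_in_first is a hypothetical helper name, not part of comm.
# Usage: lines_only_in_first file1 file2
lines_only_in_first() {
    # comm needs sorted input; the second file is sorted into a temp file,
    # the first one is sorted on the fly and read from stdin ("-").
    sorted_second=$(mktemp) || return 1
    sort "$2" > "$sorted_second"
    sort "$1" | comm -23 - "$sorted_second" | wc -l
    rm -f "$sorted_second"
}
```

Swapping the two arguments gives the number of lines that appear only in the second file. Unlike diff, this compares the files as sets of lines and ignores their order.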




Answer 4:


Since every output line that differs starts with a < or > character, I would suggest this:

diff file1 file2 | grep '^[<>]' | wc -l

By keeping only < or only > in the bracket expression, you can count the differing lines from just one of the files.
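
The per-file variant could look like the sketch below; count_side is a hypothetical helper name. In diff's default output format, lines taken from the first file are prefixed with < and lines taken from the second file with >, so anchoring the pattern on one of the two characters counts one side only.

```shell
# count_side is a hypothetical helper name, not part of diff.
# Usage: count_side '<' file1 file2   (differing lines from file1)
#        count_side '>' file1 file2   (differing lines from file2)
count_side() {
    # "$1" is the marker character; "^$1" anchors it at the start of the line.
    diff "$2" "$3" | grep -c "^$1"
}
```

Quoting the marker on the command line matters, because an unquoted < or > would be interpreted by the shell as a redirection.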




Answer 5:


I believe the correct solution is in this answer, that is:

$ diff -y --suppress-common-lines a b | grep '^' | wc -l
1



Answer 6:


If you're dealing with files with analogous content that should match line for line (like CSV files describing similar things), and you would, for example, want to find 2 differences in the following files:

File a:    File b:
min,max    min,max
1,5        2,5
3,4        3,4
-2,10      -1,1

you could implement it in Python like this:

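from itertools import zip_longest

file1, file2 = "a", "b"  # paths of the two files to compare

different_lines = 0
with open(file1) as a, open(file2) as b:
    # zip_longest pads the shorter file with None, so surplus lines
    # in either file are counted as differences too
    for line, other_line in zip_longest(a, b):
        if line != other_line:
            different_lines += 1
print(different_lines)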


Source: https://stackoverflow.com/questions/1566461/how-to-count-differences-between-two-files-on-linux
