From the unix terminal, we can use diff file1 file2 to find the difference between two files. Is there a similar command to show the similarity across two files? (Chaining several pipes is fine if necessary.)
Each file contains one sentence per line; both files have been sorted and deduplicated with sort file1 | uniq.
file1: http://pastebin.com/taRcegVn
file2: http://pastebin.com/2fXeMrHQ
The output should contain the lines that appear in both files.
output: http://pastebin.com/FnjXFshs
I can do it in Python as follows, but I think it's a bit too much to type into the terminal:
x = set(i.strip() for i in open('wn-rb.dic'))  # unique lines of file1
y = set(i.strip() for i in open('wn-s.dic'))   # unique lines of file2
z = x.intersection(y)                          # lines common to both files
outfile = open('reverse-diff.out', 'w')        # 'w' mode is needed to write
for i in z:
    print>>outfile, i
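For what it's worth, the same set intersection does fit on one terminal line with python -c; this is just a sketch assuming the same two input file names, with sorted() only there to make the output order deterministic:
python -c "print('\n'.join(sorted(set(i.strip() for i in open('wn-rb.dic')) & set(i.strip() for i in open('wn-s.dic')))))" > reverse-diff.out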
As @tjameson mentioned, this may be solved in another thread.
I'd just like to post another solution:
sort file1 file2 | awk 'dup[$0]++ == 1'
Refer to an awk guide for the basics: when the pattern for a line evaluates to true, that line is printed.
dup[$0] is a hash table in which each key is a line of the input; the value starts at 0 and is incremented each time that line occurs. Because the post-increment returns the old value, the second time a line occurs dup[$0]++ yields 1, so dup[$0]++ == 1 is true and the line is printed.
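A minimal demonstration with two hypothetical sample files (the names f1 and f2 and their contents are made up for illustration):
printf 'apple\nbanana\n' > f1
printf 'banana\ncherry\n' > f2
sort f1 f2 | awk 'dup[$0]++ == 1'
# banana is the only line seen twice in the merged stream, so only it is printed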
Note that this only works when there are no duplicates within either file, as was specified in the question.
If you want to get a list of repeated lines without resorting to AWK, you can use the -d flag of uniq:
sort file1 file2 | uniq -d
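With the same hypothetical sample files as above, this produces identical output, since uniq -d prints one copy of each line that appears more than once in the sorted stream:
sort f1 f2 | uniq -d
# banana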
Source: https://stackoverflow.com/questions/15470260/how-to-find-duplicate-lines-across-2-different-files-unix