Comparing two files in the Linux terminal


一生所求 2020-12-22 17:31

There are two files called "a.txt" and "b.txt"; both contain a list of words. Now I want to check which words are extra in "a.txt"…

10 answers

我在风中等你 (original poster)

2020-12-22 17:43

You can use awk for this. Test files:

    $ cat a.txt
    one
    two
    three
    four
    four
    $ cat b.txt
    three
    two
    one

The awk:

    $ awk '
    NR==FNR {                   # process b.txt  or the first file
        seen[$0]                # hash words to hash seen
        next                    # next word in b.txt
    }                           # process a.txt  or all files after the first
    !($0 in seen)' b.txt a.txt  # if word is not hashed to seen, output it

Duplicates are output:

    four
    four
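For experimenting with the filter above, here is a minimal sketch that wraps it in a shell function and runs it on throwaway copies of the test files. The function name `extra_words` and the temporary directory are assumptions made for illustration; they are not part of the original answer.

```shell
# Hypothetical helper: print the lines of the second file
# that do not occur in the first file (duplicates are kept).
extra_words() {
    # $1 = reference list (b.txt), $2 = list to check (a.txt)
    awk 'NR==FNR { seen[$0]; next } !($0 in seen)' "$1" "$2"
}

# Recreate the test files in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' one two three four four > "$tmpdir/a.txt"
printf '%s\n' three two one > "$tmpdir/b.txt"

extra_words "$tmpdir/b.txt" "$tmpdir/a.txt"
```

One caveat of the `NR==FNR` idiom: if the first file is empty, the condition is also true for every line of the second file, so nothing is printed.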

To avoid duplicates, add each newly met word in a.txt to the seen hash:

    $ awk '
    NR==FNR {
        seen[$0]
        next
    }
    !($0 in seen) {             # if word is not hashed to seen
        seen[$0]                # hash unseen a.txt words to seen to avoid duplicates
        print                   # and output it
    }' b.txt a.txt

Output:

four
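The same de-duplicating behaviour can probably be written more compactly by counting occurrences instead of only creating keys. This is a sketch of an alternative idiom, not the answer's own code; the function name `extra_words_unique` is hypothetical.

```shell
# Hypothetical helper: print each line of the second file that is
# absent from the first file, but only the first time it appears.
extra_words_unique() {
    # First file: count every word so its counter is non-zero.
    # Second file: print a word only while its counter is still zero,
    # then increment it so repeats are suppressed.
    awk 'NR==FNR { seen[$0]++; next } !seen[$0]++' "$1" "$2"
}

tmpdir=$(mktemp -d)
printf '%s\n' one two three four four > "$tmpdir/a.txt"
printf '%s\n' three two one > "$tmpdir/b.txt"

extra_words_unique "$tmpdir/b.txt" "$tmpdir/a.txt"
```

Note that the first pass must increment the counter here: a bare `seen[$0]` would leave the value empty, and `!seen[$0]++` would then wrongly print the words from b.txt as well.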

If the word lists are comma-separated, like:

    $ cat a.txt
    four,four,three,three,two,one
    five,six
    $ cat b.txt
    one,two,three

you have to do a couple of extra laps (for loops):

    awk -F, '                         # comma-separated input
    NR==FNR {
        for(i=1;i<=NF;i++)            # loop all comma-separated fields
            seen[$i]
        next
    }
    {
        for(i=1;i<=NF;i++)
            if(!($i in seen)) {
                seen[$i]              # this time we buffer output (below):
                buffer=buffer (buffer==""?"":",") $i
            }
        if(buffer!="") {              # output unempty buffers after each record in a.txt
            print buffer
            buffer=""
        }
    }' b.txt a.txt

Output this time:

    four
    five,six
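If keeping the extra words grouped by input line is not required, one possible simplification is to split the commas into newlines with `tr` first and then reuse the one-word-per-line filter. The sketch below assumes that trade-off is acceptable; the function name `extra_fields` and the temporary file are hypothetical.

```shell
# Hypothetical helper: treat both files as comma-separated word lists
# and print each word of the second file that is missing from the first,
# one word per line and without duplicates.
extra_fields() {
    # $1 = reference list (b.txt), $2 = list to check (a.txt)
    ref=$(mktemp)
    tr ',' '\n' < "$1" > "$ref"          # one reference word per line
    # "-" makes awk read the split words of the second file from stdin.
    tr ',' '\n' < "$2" |
        awk 'NR==FNR { seen[$0]; next }
             !($0 in seen) { seen[$0]; print }' "$ref" -
    rm -f "$ref"
}

tmpdir=$(mktemp -d)
printf '%s\n' 'four,four,three,three,two,one' 'five,six' > "$tmpdir/a.txt"
printf '%s\n' 'one,two,three' > "$tmpdir/b.txt"

extra_fields "$tmpdir/b.txt" "$tmpdir/a.txt"
```

Unlike the buffered version above, this prints `four`, `five` and `six` on separate lines, so use the for-loop variant when the per-record grouping matters.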
