get common lines from two text files [closed]

Submitted by 给你一囗甜甜゛ on 2019-12-11 01:34:38

Question


I've got two files.

The first (file1) looks like this (a header line always precedes each text line):

>random header name1
wonderfulstringwhatsoevergoeson
>random header 2
someotherline
...

The other file (file2) is a modified version of file1 (the headers have been removed, the lines shuffled, and a new header added):

>name
someotherline
wonderfulstringwhatsoevergoeson

Every line of file1 (apart from the headers) occurs in file2, but in a different order. Both files should keep their current order.

For each line in file2, I want the header that precedes that line in file1.

The output should look something like this (the header of file2 can be ignored):

>random header 2
>random header name1

Does anybody have an idea how to do this?

Best regards


Answer 1:


Code for GNU sed:

$ sed '/^[>]/N;s#\(.*\)\n\(.*\)#/\2/s/.*/\1/p#' file1 | sed -nf - file2
>random header 2
>random header name1
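To see how this pipeline works, it can help to run the first sed on its own: it turns each header/text pair of file1 into one line of a sed script, and the second sed (with -n) then applies that script to file2, printing a header only when its text line matches. The sketch below rebuilds the question's sample files in a temporary directory; the helper name make_script is hypothetical, and GNU sed is assumed, as in the answer.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the question's sample files and show the intermediate
# sed script generated by the first (GNU) sed command of the answer.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' '>random header name1' 'wonderfulstringwhatsoevergoeson' \
              '>random header 2' 'someotherline' > file1
printf '%s\n' '>name' 'someotherline' 'wonderfulstringwhatsoevergoeson' > file2

# make_script is a hypothetical helper name.
# /^[>]/N appends the text line to its header ("header\ntext"); the s###
# command rewrites that pair into the sed command "/text/s/.*/header/p".
make_script() {
  sed '/^[>]/N;s#\(.*\)\n\(.*\)#/\2/s/.*/\1/p#' "$1"
}

echo "generated script:"
make_script file1
echo "result on file2:"
make_script file1 | sed -nf - file2
```

Because each text line is used as a regular expression that may match anywhere in a line, text containing characters such as /, . or * would need escaping before this approach is safe.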



Answer 2:


Given the clarification that the files should stay the same, just use:

sort file1 file2 file2 | uniq -u

and you're done.
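The trick is that file2 is passed to sort twice, so every line of file2 occurs at least twice in the combined sorted stream, and uniq -u (which prints only lines that are not repeated) keeps just the lines of file1 that never appear in file2. A minimal sketch with the question's sample data (assumed file contents; LC_ALL=C is set only to make the sort order reproducible):

```shell
#!/usr/bin/env bash
# Sketch using the question's sample data as assumed file contents.
set -eu
export LC_ALL=C   # byte-wise sort order, so the output is reproducible
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' '>random header name1' 'wonderfulstringwhatsoevergoeson' \
              '>random header 2' 'someotherline' > file1
printf '%s\n' '>name' 'someotherline' 'wonderfulstringwhatsoevergoeson' > file2

# file2 is listed twice, so each of its lines occurs at least twice;
# uniq -u drops every repeated line, leaving file1's headers.
sort file1 file2 file2 | uniq -u
```

With these files it prints the two headers of file1. Note that the result comes out in sorted order rather than in file2's order, and a header that occurs twice in file1 is dropped too, because it is no longer unique.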

Alternatively, if the files are big, so that sorting file1, file2 and file2 together is too expensive, you can use this:

comm -23 <( sort file1 ) <( sort file2 )

This sorts the contents of each file (the files on disk are left unmodified, since comm needs sorted input) and then prints the lines that exist in file1 but not in file2: -2 suppresses the lines unique to file2, and -3 suppresses the lines common to both files.

Example:

=$ cat file1 
some header
abc
cdf
efg
other header

=$ cat file2 
file2 header
cdf
file2 header part2
efg
abc

=$ comm -23 <( sort file1 ) <( sort file2 )
other header
some header



Answer 3:


If I understand you correctly, you want to print the respective header from file1 corresponding to each element of file2.

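#!/bin/bash

# For each line of file2, print the line preceding its match in file1.
# -F: fixed string, not a regex; -x: whole-line match; --: allows lines starting with "-".
while IFS= read -r line; do
    grep -B 1 -F -x -- "$line" file1 | head -n 1
done < file2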

grep -B 1 prints the line before each match together with the match itself; head -n1 keeps only the first of these lines, i.e. the header.
This might be considered a hack (I'm still a beginner).

file1:

>random header name1
wonderfulstringwhatsoevergoeson
>random header 2
someotherline

file2:

someotherline
wonderfulstringwhatsoevergoeson

Output:

>random header 2
>random header name1

Also note that, as depesz pointed out, this solution is slow: it runs grep over all of file1 once for every line of file2.
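One way to avoid the per-line grep (an awk alternative, not one of the original answers) is to read each file only once: awk first reads file1 and builds a table from each text line to the header above it, then reads file2 and prints the stored header for every line it finds, in file2's order. This is a sketch that assumes the question's layout (one header line starting with ">" before each text line); headers_for is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print, in file2's order, the file1 header preceding each line.
set -eu

# headers_for is a hypothetical helper name; usage: headers_for FILE1 FILE2
headers_for() {
  awk '
    NR == FNR {                      # true only while reading file1
      if (/^>/) {
        hdr = $0                     # remember the most recent header
      } else {
        header_of[$0] = hdr          # map this text line to its header
      }
      next
    }
    ($0 in header_of) { print header_of[$0] }   # file2: look up each line
  ' "$1" "$2"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
printf '%s\n' '>random header name1' 'wonderfulstringwhatsoevergoeson' \
              '>random header 2' 'someotherline' > file1
printf '%s\n' '>name' 'someotherline' 'wonderfulstringwhatsoevergoeson' > file2

headers_for file1 file2
```

Unlike the sort-based answers, this keeps file2's order. If the same text line appears under several headers in file1, only the last header is kept, and the NR == FNR test assumes file1 is not empty.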



Source: https://stackoverflow.com/questions/17361450/get-common-lines-from-two-text-files
