Question
I've got two files.
The first (file1) looks like this:
(there is always a header line before a text line)
>random header name1
wonderfulstringwhatsoevergoeson
>random header 2
someotherline
...
The other file (file2) is a modified version of file1, like this:
(the headers have been removed, the lines are shuffled, and a new header has been added)
>name
someotherline
wonderfulstringwhatsoevergoeson
Each line (without the header) of file1 occurs in file2.
The order of lines in file2 differs from file1.
Both files should keep the order they are in.
Each line in file2
The output should be something like:
(the header of file2 can be ignored)
>random header 2
>random header name1
Does anybody have a clue how to do this?
Best regards
Answer 1:
Code for GNU sed:
$ sed '/^[>]/N;s#\(.*\)\n\(.*\)#/\2/s/.*/\1/p#' file1 | sed -nf - file2
>random header 2
>random header name1
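To see why this works, it can help to look at what the first sed emits before it is piped into the second one. The sketch below is only an illustration: the function name generate_lookup_script and the temporary directory are made up for the example, and the data is the sample file1 from the question.

```shell
# Hypothetical helper: run only the first half of the pipeline.
generate_lookup_script() {
    sed '/^[>]/N;s#\(.*\)\n\(.*\)#/\2/s/.*/\1/p#' "$1"
}

# Sample file1 from the question, written to a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' '>random header name1' 'wonderfulstringwhatsoevergoeson' \
              '>random header 2' 'someotherline' > "$workdir/file1"

# Emits one "/text/s/.*/header/p" command per header/text pair:
#   /wonderfulstringwhatsoevergoeson/s/.*/>random header name1/p
#   /someotherline/s/.*/>random header 2/p
generate_lookup_script "$workdir/file1"
```

The second sed (-n -f -) then runs those generated commands against every line of file2, which is why the headers come out in file2's order. Since each text line is pasted into a regular expression, lines containing / or regex metacharacters would presumably need escaping first.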
Answer 2:
Given the clarification that the files should stay the same, just use:
sort file1 file2 file2 | uniq -u
and you're done.
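The reason file2 is listed twice: every line of file2 then occurs at least twice in the sorted stream, so uniq -u (which keeps only lines that occur exactly once) can only print lines found solely in file1, i.e. the headers. A small sketch on the question's sample data follows; the function name only_in_first, the temporary directory and the LC_ALL=C setting (to pin the sort order) are assumptions added for the illustration.

```shell
# Hypothetical helper: print lines that occur in the first file only.
only_in_first() {
    # Listing "$2" twice guarantees that none of its lines is unique.
    LC_ALL=C sort "$1" "$2" "$2" | LC_ALL=C uniq -u
}

# Sample files from the question, written to a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' '>random header name1' 'wonderfulstringwhatsoevergoeson' \
              '>random header 2' 'someotherline' > "$workdir/file1"
printf '%s\n' '>name' 'someotherline' 'wonderfulstringwhatsoevergoeson' \
              > "$workdir/file2"

only_in_first "$workdir/file1" "$workdir/file2"
```

Note that the result comes out in sorted order rather than in file2's order, and a header that happened to appear twice in file1 would be dropped as well.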
Alternatively, if the files are so big that sorting file1+file2+file2 in one go is too much of a burden, you can use this:
comm -23 <( sort file1 ) <( sort file2 )
This just sorts each file (the files on disk are kept the way they are, they are not modified) and then prints the lines that exist in file1 but not in file2.
Example:
=$ cat file1
some header
abc
cdf
efg
other header
=$ cat file2
file2 header
cdf
file2 header part2
efg
abc
=$ comm -23 <( sort file1 ) <( sort file2 )
other header
some header
Answer 3:
If I understand you correctly, you want to print the respective header from file1 corresponding to each element of file2.
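#!/bin/bash
# For each line of file2, print the line that precedes its match in file1.
while IFS= read -r line; do
    # -F: fixed string, -x: whole-line match, -B 1: include the preceding line
    grep -F -x -B 1 -- "$line" file1 | head -n1
done < file2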
grep -B 1 prints one line of context before each match. head then keeps only the first line of that output, which is the header.
This might be called a hack. (But I'm still a beginner).
file1:
>random header name1
wonderfulstringwhatsoevergoeson
>random header 2
someotherline
file2:
someotherline
wonderfulstringwhatsoevergoeson
Output:
>random header 2
>random header name1
Also note that, as depesz pointed out, this solution is slow.
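One way to avoid starting a grep for every line is to read file1 once into a lookup table and then stream file2 against it. The following is a rough sketch rather than a drop-in replacement: it assumes headers start with > as in the question, and the function name headers_in_file2_order, the array name header_of and the temporary files are invented for the example.

```shell
# Hypothetical helper: for every line of the second file, print the header
# that precedes the identical line in the first file.
headers_in_file2_order() {
    awk '
        NR == FNR {                      # still reading the first file
            if (/^>/) header = $0        # remember the most recent header
            else      header_of[$0] = header
            next
        }
        ($0 in header_of) { print header_of[$0] }   # second file: look up
    ' "$1" "$2"
}

# Sample files from the question, written to a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' '>random header name1' 'wonderfulstringwhatsoevergoeson' \
              '>random header 2' 'someotherline' > "$workdir/file1"
printf '%s\n' '>name' 'someotherline' 'wonderfulstringwhatsoevergoeson' \
              > "$workdir/file2"

headers_in_file2_order "$workdir/file1" "$workdir/file2"
```

On the sample data this prints the two headers in file2's order, and the extra >name header in file2 is skipped because it is not a text line of file1. Each file is read only once, at the cost of keeping the text lines of file1 in memory.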
Source: https://stackoverflow.com/questions/17361450/get-common-lines-from-two-text-files