Question
I want to split a text file like the one pasted below (sorry for the length) on every nth occurrence of ">". For example, on every 2nd occurrence of ">", but I need to be able to change that number.
test_split.txt:
>eeefkdfn
a
a
a
>c 4ufjdhf
b
b
b
b
>
c
c
> c
d
d
d
d
d
>3
>cr
>c3
e
e
e
e
e
> 5
f
f
f
f
>cr
g
g
g
g
> cr dkjfddf
h
h
h
h
So I want output files like these (only the first two are shown):
file_1.txt:
>eeefkdfn
a
a
a
>c 4ufjdhf
b
b
b
b
file_2.txt:
>
c
c
> c
d
d
d
d
d
etc.
Question:
I have been trying to achieve that result using this awk command:
awk '/^>/ {n++} { file = sprintf("file_%s.txt", int(n/2)); print >> file; }' < test_split.txt
The output files are split correctly except for the first one, which contains only one occurrence of ">" instead of two:
cat file_0.txt
>eeefkdfn
a
a
a
cat file_1.txt
>c 4ufjdhf
b
b
b
b
>
c
c
Any idea why that is? Thank you!
Answer 1:
This seems simpler:
awk 'BEGIN{i=1}/^>/{cont++}cont==3{i++;cont=1}{print > "file_"i".txt"}' file
This gives you the expected result:
$ cat file_1.txt
>eeefkdfn
a
a
a
>c 4ufjdhf
b
b
b
b
$ cat file_2.txt
>
c
c
> c
d
d
d
d
d
Explanation
BEGIN{i=1}: File counter initialization.
/^>/{cont++}: Counts every line that starts with >.
cont==3{i++;cont=1}: On every third > line, increments the file counter and resets cont to 1, because that line becomes the first > of the next file.
{print > "file_"i".txt"}: Direct the output to the expected file.
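As for why the original command put only one record in its first file: n is already 1 at the first > line, so int(n/2) is 0 for record 1, 1 for records 2 and 3, 2 for records 4 and 5, and so on. Shifting the counter by one before dividing lines the groups up. A minimal sketch, assuming the group size is passed in as an awk variable named size (a name chosen here for illustration) and using a small made-up sample rather than the original file:

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Small sample with five ">" records (illustrative data only).
printf '>r1\na\n>r2\nb\n>r3\nc\n>r4\nd\n>r5\ne\n' > sample.txt

# n counts ">" lines starting at 1, so int((n - 1) / size) + 1 maps
# records 1..size to file 1, size+1..2*size to file 2, and so on.
awk -v size=2 '
/^>/ { n++ }
{
    file = sprintf("file_%d.txt", int((n - 1) / size) + 1)
    print >> file
}' sample.txt

# Count the ">" lines in each output file.
grep -c '^>' file_1.txt file_2.txt file_3.txt
```

With five records and size=2, the counts should come out as 2, 2 and 1. Because this keeps the original ">>" append, rerunning it in the same directory would add to the existing files, so they should be removed first.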
Answer 2:
You can use this awk script to control n dynamically; the input is split into a new file after every n occurrences of >:
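awk -v n=2 '
# Close the current output file (if any) and switch to the next one.
function ofile() {
    if (op)
        close(op)
    op = sprintf("file_%d.txt", ++p)
}
BEGIN {
    ofile()          # open file_1.txt before reading any input
}
/^>/ {
    ++i              # count lines that start with ">"
}
i > n {
    i = 1            # this ">" line becomes the first of the next file
    ofile()
}
{
    print $0 > op
}
END {
    close(op)
}' file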
Here is a one-liner in case you want to copy/paste:
awk -v n=2 'function ofile() {if (op) close(op); op = sprintf("file_%d.txt", ++p)} BEGIN{ofile()} /^>/{++i} i>n{i=1; ofile()} { print $0 > op }' file
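To try a different group size, only the -v n=... value needs to change. A minimal sketch, assuming a small made-up sample file (sample.txt, not the original data) in a scratch directory; it runs the same logic with n=3, with the pattern written as /^>/ so that only lines starting with > are counted, and then counts the > lines in each output file:

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Small sample with five ">" records (illustrative data only).
printf '>r1\na\n>r2\nb\n>r3\nc\n>r4\nd\n>r5\ne\n' > sample.txt

# Start a new output file after every 3 ">" lines.
awk -v n=3 'function ofile() {if (op) close(op); op = sprintf("file_%d.txt", ++p)} BEGIN{ofile()} /^>/{++i} i>n{i=1; ofile()} { print $0 > op }' sample.txt

# Count the ">" lines in each output file.
grep -c '^>' file_*.txt
```

With five records and n=3, file_1.txt should hold three records and file_2.txt the remaining two. Calling close(op) before switching files keeps only one output file open at a time, which matters when the input produces many output files.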
Source: https://stackoverflow.com/questions/42301745/awk-splitting-file-on-nth-occurence-of-delimiter-wrong-first-split-file