Question
Assume a text file (file1) that contains multiple lines of alphabetic strings, each preceded by a short alphanumeric string that acts as a barcode. The alphabetic strings are all identical in length; the preceding alphanumeric ones are not. In each line, the barcode and the alphabetic string are separated by a single space.
$ cat file1
a1 abcdefghijklmnopqrstuvwxyz
b27 abcdefghijklmnopqrstuvwxyz
c4 abcdefghijklmnopqrstuvwxyz
Assume a second file (file2) that contains a column range. This range is always shorter than the alphabetic strings.
$ cat file2
2-13
I am trying to develop bash code that extracts the column range specified in file2 from the alphabetic strings in file1, while keeping the barcodes.
$ sought_command file1 file2
a1 bcdefghijklm
b27 bcdefghijklm
c4 bcdefghijklm
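To make the range semantics concrete: the expected output above implies that 2-13 is a 1-based, inclusive range, so it selects 13 - 2 + 1 = 12 characters. The snippet below is only a sketch of that arithmetic on a single string, using bash's own substring expansion (which is 0-based); the variable names are illustrative assumptions. Running this per line in a bash `while read` loop would likely be much slower than awk on large files, so it is not meant as the solution.

```shell
#!/usr/bin/env bash
# Sample string and range taken from the question; the range is assumed
# to be 1-based and inclusive, as the expected output implies.
seq='abcdefghijklmnopqrstuvwxyz'
start=2
end=13

# bash ${var:offset:length} is 0-based, so the offset is start-1;
# an inclusive range covers end-start+1 characters.
len=$(( end - start + 1 ))            # 12 characters
printf '%s\n' "${seq:start-1:len}"    # prints bcdefghijklm
```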
I am uncertain which bash power tool would help here, but I presume awk is the tool that could do this.
Note: I am aware that Python may be the easiest language to write this in, and I did write a Python version. However, it was unreasonably slow, because the alphabetic strings to be processed are tens of thousands of characters long. I am therefore deliberately trying to solve this with a bash tool.
Answer 1:
$ awk 'NR==FNR{start=$1;lgth=$2;next} {print $1, substr($2,start,lgth)}' FS='-' file2 FS=' ' file1
a1 bcdefghijklmn
b27 bcdefghijklmn
c4 bcdefghijklmn
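For readers unfamiliar with the two-file awk idiom, here is the first one-liner spread out with comments, as a self-contained sketch that recreates the sample files in a scratch directory (the directory and the out_len.txt filename are assumptions for illustration). `NR == FNR` is only true while the first file (file2) is being read, and the `FS='-'` / `FS=' '` operands change the field separator between the two files. In this version the number after the dash is treated as a length, which is why 13 characters are printed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Recreate the question's sample input in a scratch directory.
cd "$(mktemp -d)"
printf '%s\n' 'a1 abcdefghijklmnopqrstuvwxyz' \
              'b27 abcdefghijklmnopqrstuvwxyz' \
              'c4 abcdefghijklmnopqrstuvwxyz' > file1
printf '2-13\n' > file2

# FS=- is in effect while file2 is read; FS=" " (default whitespace
# splitting) is in effect while file1 is read.
awk '
  NR == FNR {          # true only for lines of the first file (file2)
    start = $1         # field before the dash: 1-based start column
    lgth  = $2         # field after the dash, treated as a length here
    next               # skip the second block for file2 lines
  }
  {                    # every line of file1
    print $1, substr($2, start, lgth)   # barcode, then the slice
  }
' FS='-' file2 FS=' ' file1 > out_len.txt

cat out_len.txt
```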
Or, if the second field is the end position rather than the length (which is what the expected output in the question implies):
$ awk 'NR==FNR{start=$1;lgth=$2-$1+1;next} {print $1, substr($2,start,lgth)}' FS='-' file2 FS=' ' file1
a1 bcdefghijklm
b27 bcdefghijklm
c4 bcdefghijklm
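The question asked for something callable as `sought_command file1 file2`. One way to get that is to wrap the start-end version in a bash function; `extract_cols` below is a hypothetical name, and the sketch assumes only the first line of the range file matters. As with the one-liners, the range is read once and each data line costs a single `substr()` call.

```shell
#!/usr/bin/env bash
# extract_cols DATAFILE RANGEFILE  (hypothetical stand-in for "sought_command")
# Reads "start-end" (1-based, inclusive) from the first line of RANGEFILE,
# then prints each DATAFILE line as: barcode, space, sliced string.
extract_cols() {
  awk '
    NR == FNR {                  # lines of RANGEFILE
      if (FNR == 1) {            # only the first range line is used
        start = $1
        lgth  = $2 - $1 + 1      # inclusive range -> substring length
      }
      next
    }
    { print $1, substr($2, start, lgth) }
  ' FS='-' "$2" FS=' ' "$1"
}

# Usage with the sample files from the question, in a scratch directory.
cd "$(mktemp -d)"
printf '%s\n' 'a1 abcdefghijklmnopqrstuvwxyz' \
              'b27 abcdefghijklmnopqrstuvwxyz' \
              'c4 abcdefghijklmnopqrstuvwxyz' > file1
printf '2-13\n' > file2
extract_cols file1 file2
```

Note that the function takes the data file first (matching the question's argument order) but hands the range file to awk first, because the `NR == FNR` test relies on the range file being read before the data.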
Source: https://stackoverflow.com/questions/43944342/extracting-column-range-from-text-file-via-bash-tool