Extracting column range from text file via bash tool

Question

Assume a text file (file1) that contains multiple lines of alphabetic strings, each preceded by a short alphanumeric string that acts as a barcode. The alphabetic strings are all identical in length; the preceding alphanumeric ones are not. The alphanumeric and alphabetic strings are separated by a single space on each line.

$ cat file1
a1 abcdefghijklmnopqrstuvwxyz
b27 abcdefghijklmnopqrstuvwxyz
c4 abcdefghijklmnopqrstuvwxyz

Assume a second file (file2) that contains a column range. This range is always shorter than the alphabetic string.

$ cat file2
2-13

I am trying to develop bash code that extracts the column range specified in file2 from the alphabetic strings in file1, while maintaining the barcodes.

$ sought_command file1 file2
a1 bcdefghijklm
b27 bcdefghijklm
c4 bcdefghijklm

I am not sure which command-line tool is best suited to this, but I presume that awk could do it.

Note: I am aware that this task may be easiest to write in Python, and I have done so. However, I found my Python implementation to be unreasonably slow, as the alphabetic strings to be processed are tens of thousands of characters long. Thus, I am deliberately trying to solve this with a command-line tool.

Answer 1:

$ awk 'NR==FNR{start=$1;lgth=$2;next} {print $1, substr($2,start,lgth)}' FS='-' file2 FS=' ' file1
a1 bcdefghijklmn
b27 bcdefghijklmn
c4 bcdefghijklmn

or, if the 2nd field of file2 is the end position rather than the length (which matches the expected output in the question):

$ awk 'NR==FNR{start=$1;lgth=$2-$1+1;next} {print $1, substr($2,start,lgth)}' FS='-' file2 FS=' ' file1
a1 bcdefghijklm
b27 bcdefghijklm
c4 bcdefghijklm
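For readers who want to see how the one-liner works, here is a commented sketch of the end-position variant wrapped in a shell function. The function name `extract_range` and the temporary directory are illustrative assumptions, not part of the original answer. It also splits the range with `split()` inside awk instead of switching `FS` between the two files, which is only a stylistic choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is an assumption, not from the original answer).
# Usage: extract_range DATA_FILE RANGE_FILE
# Prints each barcode followed by the inclusive START-END column range.
extract_range() {
    local data_file=$1 range_file=$2
    awk '
        # First file on the command line (the range file):
        # NR == FNR is only true while awk is still reading that first file.
        NR == FNR {
            split($0, r, "-")        # "2-13" -> r[1] = 2, r[2] = 13
            start = r[1]
            len = r[2] - r[1] + 1    # inclusive range, hence the + 1
            next
        }
        # Second file (the data): keep the barcode, cut the requested columns.
        { print $1, substr($2, start, len) }
    ' "$range_file" "$data_file"
}

# Recreate the sample inputs from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
    'a1 abcdefghijklmnopqrstuvwxyz' \
    'b27 abcdefghijklmnopqrstuvwxyz' \
    'c4 abcdefghijklmnopqrstuvwxyz' > "$tmpdir/file1"
printf '2-13\n' > "$tmpdir/file2"

extract_range "$tmpdir/file1" "$tmpdir/file2"
```

Note that `NR == FNR` assumes the range file is not empty; with an empty range file the condition would also be true for the data file, and no output would be produced.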

Source: https://stackoverflow.com/questions/43944342/extracting-column-range-from-text-file-via-bash-tool

Tags

string, bash, awk, split, multiple-columns