Extracting column range from text file via bash tool

Submitted by 本秂侑毒 on 2019-12-11 05:25:55

Question


Assume a text file (file1) that contains multiple lines of alphabetic strings, each preceded by a short alphanumeric string that acts as a barcode. The alphabetic strings are all identical in length; the preceding alphanumeric ones are not. The alphanumeric and alphabetic strings are separated by a single whitespace character on each line.

$ cat file1
a1 abcdefghijklmnopqrstuvwxyz
b27 abcdefghijklmnopqrstuvwxyz
c4 abcdefghijklmnopqrstuvwxyz

Assume a second file (file2) that contains information on a column range. This range is always smaller than the alphabetic string.

$ cat file2
2-13

I am trying to develop bash code that extracts the column range specified in file2 from the alphabetic strings in file1, while maintaining the barcodes.

$ sought_command file1 file2
a1 bcdefghijklm
b27 bcdefghijklm
c4 bcdefghijklm

I am unsure which command-line tool is best suited to this, but I presume that awk could do it.

Note: I am aware that this task is probably easiest to write in Python, and I have done so. However, my Python implementation was unreasonably slow, as the alphabetic strings to be processed are tens of thousands of characters long. Thus, I am deliberately trying to solve this issue with a bash tool.


Answer 1:


$ awk 'NR==FNR{start=$1;lgth=$2;next} {print $1, substr($2,start,lgth)}' FS='-' file2 FS=' ' file1
a1 bcdefghijklmn
b27 bcdefghijklmn
c4 bcdefghijklmn

Or, if the second field is the end position rather than the length (which is what the expected output in the question implies):

$ awk 'NR==FNR{start=$1;lgth=$2-$1+1;next} {print $1, substr($2,start,lgth)}' FS='-' file2 FS=' ' file1
a1 bcdefghijklm
b27 bcdefghijklm
c4 bcdefghijklm
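
For readers who would rather not rely on the NR==FNR idiom and the per-file FS assignments, the same end-position variant can be sketched as a small shell function. This is only an illustration under stated assumptions, not part of the original answer: the function name extract_range and the variable names are hypothetical, file2 is assumed to hold a single START-END line, and each line of file1 is assumed to have exactly two whitespace-separated fields.

```shell
# extract_range: hypothetical helper name, not from the original answer.
# Usage: extract_range DATA_FILE RANGE_FILE
# Assumes RANGE_FILE holds one line of the form START-END (e.g. 2-13).
extract_range() {
    # Read the first line of the range file; also accept a final line
    # that lacks a trailing newline.
    IFS= read -r er_range < "$2" || [ -n "$er_range" ] || return 1

    er_start=${er_range%%-*}   # text before the first "-"
    er_end=${er_range##*-}     # text after the last "-"

    # substr() takes a 1-based start and a length, so the length of an
    # inclusive START-END range is end - start + 1.
    awk -v s="$er_start" -v e="$er_end" \
        '{ print $1, substr($2, s, e - s + 1) }' "$1"
}
```

After the function is defined in the current shell, extract_range file1 file2 should print the barcode followed by columns 2 to 13 of each string, as in the second awk command above. Splitting the range in the shell keeps the awk program down to a single rule, at the cost of one extra read of file2.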


Source: https://stackoverflow.com/questions/43944342/extracting-column-range-from-text-file-via-bash-tool
