Find, replace, and increment at each occurrence of string

Posted by 北城余情 on 2019-12-06 11:59:04

Question


I'm relatively new to scripting and apologize in advance for this painfully simple problem. I believe I've searched pretty thoroughly, but apparently no other answers or cookbooks have been explicit enough for me to understand (like here - still couldn't get it).

I have a file that is made up of strings of letters (DNA, if you care), one string per line. Above each string I've inserted another line to identify the string below it. For those of you who are bioinformaticians: I'm trying to make up a test data set in FASTA format, so maybe you have tools? Anyway, I've put a distinct word, "num", after each ">" with the intention of using a bash incrementer and sed to put a unique number at the head of each string. For example, in data.txt, I have...

>num, blah, blah, blah

ATCGACTGAATCGA

>num, blah, blah, blah

ATCGATCGATCGATCG

>num, blah, blah, blah

ATCGATCGATCGATCG

I would like it to be...

>0, blah, blah, blah

ATCGACTGAATCGA

>1, blah, blah, blah

ATCGATCGATCGATCG

>2, blah, blah, blah

ATCGATCGATCGATCG

The solution can be in any language as long as it's complete && gets the job done. I have a little experience with sed, awk, bash, and c++ (little == slightly more than no experience). I know, I know, I need to learn perl, but I've only just started. The question is this: How to replace "num" with a number that increments on each replacement? It doesn't matter if the underlying string is identical to another somewhere else. Thanks for your help in advance!


Answer 1:


perl -ple 's/num/$n++/e' filename

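Do a dry run first to check that it does what you want. As written, the command only prints the result to standard output; the file itself is not modified.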
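Since the question also mentions awk, here is a minimal sketch of the same counter idea in awk. The function name `number_headers` is hypothetical, and the pattern is anchored to `^>num` on the assumption that the placeholder only ever appears at the start of a header line, which is stricter than the Perl one-liner above.

```shell
# Hypothetical helper: replace a leading ">num" with ">0", ">1", ... in order.
# Reads the files given as arguments, or standard input if there are none.
number_headers() {
  awk 'BEGIN { n = 0 }                        # counter starts at 0
       /^>num/ { sub(/^>num/, ">" n); n++ }   # only header lines are touched
       { print }' "$@"
}

# Demo on throwaway sample data
printf '>num, blah\nATCG\n>num, blah\nGGCC\n' | number_headers
# prints:
# >0, blah
# ATCG
# >1, blah
# GGCC
```

Like the Perl command, this writes to standard output, so redirect it to a new file (for example `number_headers data.txt > numbered.txt`) rather than back onto the input.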




Answer 2:


This uses process substitution, which may or may not be available on your system.

jcomeau@intrepid:/tmp$ exec 3< <(cat test.txt)
jcomeau@intrepid:/tmp$ i=0
jcomeau@intrepid:/tmp$ while read -r -u 3 first_word the_rest; do
 if [ "$first_word" = ">num," ]; then
 echo ">$i, $the_rest"; i=$((i + 1)); else
 echo "$first_word${the_rest:+ $the_rest}"; fi; done
>0, blah, blah, blah

ATCGACTGAATCGA

>1, blah, blah, blah

ATCGATCGATCGATCG

>2, blah, blah, blah

ATCGATCGATCGATCG
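If process substitution is not available, the same loop can probably be written with ordinary input redirection. The following is only a sketch under that assumption: `number_headers_loop` is a hypothetical name, and it matches any line that starts with `>num` rather than comparing the first word to `>num,` as the session above does.

```shell
# Hypothetical helper: number ">num" headers read from standard input.
# Uses only POSIX shell features (no process substitution, no read -u).
number_headers_loop() {
  i=0
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      '>num'*)
        # Drop the leading ">num" and put the counter in its place.
        printf '>%s%s\n' "$i" "${line#>num}"
        i=$((i + 1))
        ;;
      *)
        # Every other line is passed through unchanged.
        printf '%s\n' "$line"
        ;;
    esac
  done
}

# Demo on throwaway sample data
printf '>num, blah\nATCG\n>num, blah\nGGCC\n' | number_headers_loop
```

With a real file this would be called as `number_headers_loop < data.txt > numbered.txt`, where `numbered.txt` is just an illustrative output name.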


Source: https://stackoverflow.com/questions/6313150/find-replace-and-increment-at-each-occurence-of-string
