Question
I want to be able to extract two different sequences from one line.
For example:
atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag
I want to create a loop where the program reads from the first atg to tag and writes that sequence to a file, then reads from the second atg to tag and writes that sequence to the same file.
Output I want:
atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag
atg ttg tca aat tca tgg atc tag
How can I go about this?
Thank you for the help.
Answer 1:
Would you please try the following:
str="atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag"
start="atg" # start marker of the sequence
end="tag" # end marker of the sequence
read -r -a ary <<< "$str"
for (( i=0; i<${#ary[@]}; i++ )); do
    if [[ ${ary[$i]} = $start ]]; then
        index_s+=("$i")
    elif [[ ${ary[$i]} = $end ]]; then
        index_e+=("$i")
    fi
done
s=${index_s[0]}; n=$(( ${index_e[0]} - ${index_s[0]} + 1 ))
echo "${ary[@]:$s:$n}" > "result.txt"
s=${index_s[1]}; n=$(( ${index_e[0]} - ${index_s[1]} + 1 ))
echo "${ary[@]:$s:$n}" >> "result.txt"
Result:
atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag
atg ttg tca aat tca tgg atc tag
[How it works]
- read -r -a ary <<< "$str" splits $str on whitespace (IFS) and stores the elements in an array ary.
- Then the for loop scans the array elements for the start/end markers.
- If the start marker atg is found, its position is stored in an array index_s. In the end, ${index_s[0]} holds the first position of the start marker and ${index_s[1]} holds the second one (and so on). The same operation is performed with the end marker tag.
- Finally, the script outputs two array slices. One starts with the first atg and ends with the first tag. The other starts with the second atg and ends with the first tag.
Hope this helps.
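The two hard-coded slices above could be generalized to any number of start markers. The following is only a sketch and not part of the original answer: the function name extract_sequences is hypothetical, and it assumes each sequence should end at the first end marker that follows its start marker (which matches the example, where there is only one tag).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one slice per start marker, each ending at the
# first end marker found after that start (an assumption, see above).
extract_sequences() {
    local str=$1 start=$2 end=$3
    local -a ary
    local i j
    read -r -a ary <<< "$str"              # split the line on whitespace
    for (( i = 0; i < ${#ary[@]}; i++ )); do
        [[ ${ary[i]} == "$start" ]] || continue
        # look ahead for the first end marker after position i
        for (( j = i + 1; j < ${#ary[@]}; j++ )); do
            if [[ ${ary[j]} == "$end" ]]; then
                echo "${ary[@]:$i:$(( j - i + 1 ))}"
                break
            fi
        done
    done
}

extract_sequences "atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag" atg tag
```

Redirecting the call, for example with > result.txt, would collect every sequence in one file.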
Answer 2:
If you want at most two sequences, you can run grep over both the original string and a copy with the first atg removed:
s='atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag'
printf "%s\n" "$s" "${s#*atg}" | grep -Eo "atg.*tag"
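The second printf argument relies on shell prefix removal. As a small illustration (the sample string and the variable names rest and after_last are made up for this sketch), # strips the shortest prefix matching the pattern, so only the first atg is dropped, while ## would strip everything up to the last atg:

```bash
s='atg ttg tca atg aat tag'
rest=${s#*atg}          # shortest match: removes only the first "atg"
after_last=${s##*atg}   # longest match: removes everything through the last "atg"
printf '[%s]\n' "$rest"         # prints [ ttg tca atg aat tag]
printf '[%s]\n' "$after_last"   # prints [ aat tag]
```

Because the first atg is gone from the modified string, grep can only start its match at the second atg there.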
If the line may contain more than two sequences, you need a loop:
s='atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag'
while [ "$s" ]; do
    s=$(grep -Eo "atg.*tag" <<< "$s")
    if [ "$s" ]; then
        echo "$s"
        s="${s#atg}"
    fi
done
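The same loop could be wrapped in a function so that its output is easy to redirect to a file, as the question asks. This is a sketch under assumptions, not the answerer's code: the name list_sequences is hypothetical, and the grep call is moved into the while condition so the loop stops as soon as nothing matches.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the loop above.
# Note: ".*" is greedy, so every match runs to the LAST "tag" on the line.
list_sequences() {
    local s=$1
    # grep exits non-zero when nothing matches, which ends the loop
    while s=$(grep -Eo "atg.*tag" <<< "$s"); do
        echo "$s"
        s=${s#atg}      # drop the leading "atg" so the next match starts later
    done
}

list_sequences 'atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag'
```

Appending > result.txt to the call would write both sequences to the same file.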
Source: https://stackoverflow.com/questions/58295188/i-want-to-be-able-to-extract-two-different-sequences-from-one-line