Question
I want to be able to extract two different sequences from one line.
For example:
atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag
I want to create a loop where the program reads from the first atg to tag and writes that sequence to a file, then reads from the second atg to tag and writes that sequence to the same file.
Output I want:
atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag
atg ttg tca aat tca tgg atc tag
How can I go about this?
Thank you for the help.
Answer 1:
Would you please try the following:
str="atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag"
start="atg" # start marker of the sequence
end="tag" # end marker of the sequence
read -r -a ary <<< "$str"
# record the array index of every start and end marker
for (( i=0; i<${#ary[@]}; i++ )); do
    if [[ ${ary[$i]} = "$start" ]]; then
        index_s+=("$i")
    elif [[ ${ary[$i]} = "$end" ]]; then
        index_e+=("$i")
    fi
done
# slice from the first start marker to the first end marker
s=${index_s[0]}; n=$(( ${index_e[0]} - ${index_s[0]} + 1 ))
echo "${ary[@]:$s:$n}" > "result.txt"
# slice from the second start marker to the first end marker
s=${index_s[1]}; n=$(( ${index_e[0]} - ${index_s[1]} + 1 ))
echo "${ary[@]:$s:$n}" >> "result.txt"
Result:
atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag
atg ttg tca aat tca tgg atc tag
[How it works]
- read -r -a ary <<< "$str" splits $str on whitespace (IFS) and stores the elements in the array ary.
- Then the for loop scans the array elements for the start and end markers.
- If the start marker atg is found, its position is stored in the array index_s. In the end, ${index_s[0]} holds the first position of the start marker and ${index_s[1]} holds the second one (and so on). The same is done for the end marker tag, using index_e.
- Finally, the script outputs two array slices. One starts with the first atg and ends with the first tag. The other starts with the second atg and ends with the first tag.
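The script above hardcodes two slices. If the number of start markers is not known in advance, the same index arrays could drive a loop instead. The following is only a sketch under that assumption: the function name extract_slices is hypothetical (not part of the original answer), and it pairs each start marker with the first end marker that follows it, which is one possible reading of the requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): print one slice per
# start marker, each running to the first end marker found after that start.
extract_slices() {
    local start=$1 end=$2 str=$3
    local -a ary=() index_s=() index_e=()
    local i s e n

    # split the input on whitespace and record the marker positions
    read -r -a ary <<< "$str"
    for (( i = 0; i < ${#ary[@]}; i++ )); do
        if [[ ${ary[$i]} = "$start" ]]; then
            index_s+=("$i")
        elif [[ ${ary[$i]} = "$end" ]]; then
            index_e+=("$i")
        fi
    done

    # for every start position, find the first end position after it
    for s in "${index_s[@]}"; do
        for e in "${index_e[@]}"; do
            if (( e > s )); then
                n=$(( e - s + 1 ))
                echo "${ary[@]:$s:$n}"
                break
            fi
        done
    done
}

extract_slices atg tag "atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag"
```

Appending > result.txt to the last line would write the slices to a file as in the script above. Start markers with no end marker after them are silently skipped.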
Hope this helps.
Answer 2:
If you want at most two sequences, you can grep inside the original and a modified string:
s='atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag'
printf "%s\n" "$s" "${s#*atg}" | grep -Eo "atg.*tag"
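To see why the one-liner yields two sequences, it may help to look at the modified string on its own. The sketch below only reuses the expansion and the grep call from the command above; the variable name rest is introduced here for illustration.

```shell
s='atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag'

# ${s#*atg} removes the shortest prefix matching "*atg",
# i.e. everything up to and including the first "atg"
rest="${s#*atg}"
printf '[%s]\n' "$rest"

# grep -o prints only the matched part; ".*" is greedy,
# so the match runs from the next "atg" to the last "tag" on the line
grep -Eo "atg.*tag" <<< "$rest"
```

Because the first atg has been stripped, grep can only start matching at the second one, which gives the shorter sequence.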
To extract more than two substrings when they are available, you need a loop:
s='atg ttg tca aat tca tgg atc atg ttg tca aat tca tgg atc tag'
while [ "$s" ]; do
    s=$(grep -Eo "atg.*tag" <<< "$s")
    if [ "$s" ]; then
        echo "$s"
        s="${s#atg}"
    fi
done
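The loop above can be wrapped in a function so that it is easy to reuse and to redirect into a file. This is a minimal sketch: the name extract_all and the three-marker sample line are assumptions made for illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the loop above: prints one line per "atg",
# each running to the last "tag" on the line (".*" is greedy).
extract_all() {
    local s=$1
    while [ "$s" ]; do
        # keep only the part from the next "atg" to the last "tag"
        s=$(grep -Eo "atg.*tag" <<< "$s")
        if [ "$s" ]; then
            echo "$s"
            # drop the leading "atg" so the next pass finds the following one
            s="${s#atg}"
        fi
    done
}

extract_all 'atg aaa atg ccc atg ggg tag'
```

Calling extract_all "$s" > result.txt would collect all sequences in one file, as asked in the question. Note that if a line contains several tag markers, every sequence extends to the last one.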
Source: https://stackoverflow.com/questions/58295188/i-want-to-be-able-to-extract-two-different-sequences-from-one-line