问题
I am given a string and have to return False if there is one or more invalid characters, otherwise True. The caveat is that I can only built-in functions and str operations (for example: in, +, indexing, len) and recursion. What I have so far is not working:
def is_valid_sequence(dna):
""" (str) -> bool
Return True if and only if the DNA sequence is valid
(A, T, C, and G)
:param dna: string sequence
:return: true or false
>>> is_valid_sequence('ATTAC')
True
>>> is_valid_sequence('FBSSS')
False
"""
valid_seq = 0
if len(dna) == 1 and dna in ['A', 'T', 'C', 'G']:
valid_seq += 1
else:
return is_valid_sequence(dna[1:])
Obviously, this code doesn't work because of recursion and adding 1 to the valid_seq
variable just gets wiped after the next recursive iteration.
回答1:
For DNA sequences of a small size (around a thousand characters), this is a practical implementation
def is_valid_sequence (dna):
# base case
if len (dna) == 0:
return True
# check if first character is valid
elif dna [0] not in ['A', 'T', 'C', 'G']:
return False
# otherwise, recurse on the rest of the characters
else:
return is_valid_sequence (dna [1:])
print (is_valid_sequence ('AATGCGA')) # True
print (is_valid_sequence ('AATXGCGA')) # False
A word of caution
Be careful with recursion in python tho – a long dna
string will cause a stack overflow. Attempting to validate even a sequence this "large" would fail
GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACA
You can easily avoid this by implementing is_valid_sequence
using a Clojure-style loop
/recur
mechanism for constant-space recursion
def recur (*values):
return (recur, values)
def loop (f):
acc = f ()
while type (acc) is tuple and acc [0] is recur:
acc = f (*acc [1])
return acc
def is_valid_sequence (dna):
# stack-safe loop/recur implementation
# initialize loop state variable `s` to value of `dna`
return loop (lambda s = dna:
# base case
True if len (s) == 0 else
# check if first character valid
False if s [0] not in ['A', 'T', 'C', 'G'] else
# else recurse on the rest of the characters
recur (s [1:]))
# does not overflow stack
print (is_valid_sequence ('GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACA'))
# True
Continued improvements
The loop
/recur
implementation exposes an additional opportunity to tune the performance of our function. Slicing the string as we have done with dna[0]
and dna[1:]
creates new strings in memory; this was only necessary because of the recursion API used in the first function we wrote
The loop
/recur
interface allows us to use whatever data type is suitable for computing our output – in this case, a simple integer index will do. Lexical scoping takes care of the rest – dna
is accessible within our lambda and does not need to do dna[1:]
slicing which will save a lot of time/space for large inputs
def is_valid_sequence (dna):
# only create this array once
valid_chars = ['A', 'T', 'C', 'G']
# initialize loop state variable `i` with 0
return loop (lambda i = 0:
True if len (dna) == i else
False if dna [i] not in valid_chars else
recur (i + 1))
Python and the Unruly Lambda
Notice how we were forced to use a pure expression inside the lambda instead of the traditional if/elif/else
statement syntax – this is not a problem for simple programs, but more complex programs might be difficult to express this way in python.
This works identically to the program above but uses a plain old function instead of a lambda
def is_valid_sequence (dna):
valid_chars = ['A', 'T', 'C', 'G']
# plain old function; replaces lambda
def myloop (i = 0):
if len (dna) == 0:
return True
elif dna [i] not in valid_chars:
return False
else:
return recur (i + 1)
# use plain old function instead of lambda
return loop (myloop)
回答2:
As others have said, your recursive function does not return when it hits the end of the recursion. You can say something like:
if len(dna) == 1 and dna in ['A','T','G','C']:
return True
That is, if you know your dna
string is always greater than or equal to one in length. All together you might end up with something like:
def is_vaild_sequence(dna):
if dna[0] not in ['A','T','G','C']:
return False
if len(dna) == 1:
return True
return is_vaild_sequence(dna[1:])
Here, the first check determines if the dna
's first character is valid, followed by the recursive structure to check the whole sequence.
Ideally, if this can be solved without the constraint of recursion, take it. An iterative approach is far better i.e.
for i in range(0,len(dna)):
if dna[i] not in ['A','T','G','C']:
return False
return True
来源:https://stackoverflow.com/questions/46267711/valid-characters-in-a-string