Valid characters in a String

前端 未结 2 528
刺人心
刺人心 2020-12-07 06:07

I am given a string and have to return False if there is one or more invalid characters, otherwise True. The caveat is that I can only built-in functions and str operations

相关标签:
2条回答
  • 2020-12-07 06:31

    As others have said, your recursive function does not return when it hits the end of the recursion. You can say something like:

    if len(dna) == 1 and dna in ['A','T','G','C']:
        return True
    

    That is, if you know your dna string is always greater than or equal to one in length. All together you might end up with something like:

    def is_vaild_sequence(dna):
        if dna[0] not in ['A','T','G','C']:
            return False
        if len(dna) == 1:
            return True
        return is_vaild_sequence(dna[1:])
    

    Here, the first check determines if the dna's first character is valid, followed by the recursive structure to check the whole sequence.

    Ideally, if this can be solved without the constraint of recursion, take it. An iterative approach is far better i.e.

    for i in range(0,len(dna)):
        if dna[i] not in ['A','T','G','C']:
            return False
    return True
    
    0 讨论(0)
  • 2020-12-07 06:46

    For DNA sequences of a small size (around a thousand characters), this is a practical implementation

    def is_valid_sequence (dna):
      # base case
      if len (dna) == 0:
        return True
      # check if first character is valid
      elif dna [0] not in ['A', 'T', 'C', 'G']:
        return False
      # otherwise, recurse on the rest of the characters
      else:
        return is_valid_sequence (dna [1:])
    
    print (is_valid_sequence ('AATGCGA'))  # True
    print (is_valid_sequence ('AATXGCGA')) # False
    

    A word of caution

    Be careful with recursion in python tho – a long dna string will cause a stack overflow. Attempting to validate even a sequence this "large" would fail

    GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACA

    You can easily avoid this by implementing is_valid_sequence using a Clojure-style loop/recur mechanism for constant-space recursion

    def recur (*values):
      return (recur, values)
    
    def loop (f):
      acc = f ()
      while type (acc) is tuple and acc [0] is recur:
        acc = f (*acc [1])
      return acc
    
    def is_valid_sequence (dna):
      # stack-safe loop/recur implementation
      # initialize loop state variable `s` to value of `dna`
      return loop (lambda s = dna:
        # base case
        True if len (s) == 0 else
        # check if first character valid
        False if s [0] not in ['A', 'T', 'C', 'G'] else
        # else recurse on the rest of the characters
        recur (s [1:]))
    
    # does not overflow stack
    print (is_valid_sequence ('GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACA'))
    # True
    

    Continued improvements

    The loop/recur implementation exposes an additional opportunity to tune the performance of our function. Slicing the string as we have done with dna[0] and dna[1:] creates new strings in memory; this was only necessary because of the recursion API used in the first function we wrote

    The loop/recur interface allows us to use whatever data type is suitable for computing our output – in this case, a simple integer index will do. Lexical scoping takes care of the rest – dna is accessible within our lambda and does not need to do dna[1:] slicing which will save a lot of time/space for large inputs

    def is_valid_sequence (dna):
      # only create this array once
      valid_chars = ['A', 'T', 'C', 'G']
      # initialize loop state variable `i` with 0
      return loop (lambda i = 0:
        True if len (dna) == i else
        False if dna [i] not in valid_chars else
        recur (i + 1))
    

    Python and the Unruly Lambda

    Notice how we were forced to use a pure expression inside the lambda instead of the traditional if/elif/else statement syntax – this is not a problem for simple programs, but more complex programs might be difficult to express this way in python.

    This works identically to the program above but uses a plain old function instead of a lambda

    def is_valid_sequence (dna):
      valid_chars = ['A', 'T', 'C', 'G']
      # plain old function; replaces lambda
      def myloop (i = 0):
        if len (dna) == 0:
          return True
        elif dna [i] not in valid_chars:
          return False
        else:
          return recur (i + 1)
      # use plain old function instead of lambda
      return loop (myloop)
    
    0 讨论(0)
提交回复
热议问题