Efficient way to search for invalid characters in Python

萌比男神i 2021-01-07 03:10

I am building a forum application in Django and I want to make sure that users don't enter certain characters in their forum posts. I need an efficient way to scan their whol

9 Answers
  • 2021-01-07 03:43

    If efficiency is a major concern I would re.compile() the re string, since you're going to use the same regex many times.
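
A minimal sketch of that idea, assuming the forbidden characters are the ones discussed elsewhere in this thread. The names INVALID_CHARS_RE and has_invalid_chars are hypothetical, not from the original post:

```python
import re

# Compile once at module import; every call reuses the compiled pattern.
# Assumption: the forbidden set is < > / \ { } [ ] ~ `
INVALID_CHARS_RE = re.compile(r'[<>/\\{}\[\]~`]')


def has_invalid_chars(text):
    """Return True if text contains at least one forbidden character."""
    return INVALID_CHARS_RE.search(text) is not None
```

search() stops at the first hit, so a message is only scanned in full when it is clean.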

  • 2021-01-07 03:44

    In any case you need to scan the entire message. So wouldn't something simple like this work?

    def checkMessage(topic_message):
        # Reject the message as soon as one forbidden character is seen.
        for char in topic_message:
            if char in "<>/\\{}[]~`":
                return False
        return True
    
  • 2021-01-07 03:57

    I agree with gnibbler, regex is overkill for this situation. After removing the unwanted characters you will probably want to remove unwanted words too. Here is a basic way to do it:

    def remove_bad_words(title):
        '''Helper to remove bad words from a sentence, based on a collection of words.
        '''
        # BAD_WORDS is a collection of unwanted words, defined elsewhere.
        # Build a new list instead of calling remove() while iterating,
        # which would skip the word that follows each removed one.
        word_list = [word for word in title.split(' ') if word not in BAD_WORDS]
        # Build the string again.
        return u' '.join(word_list)
    
  • 2021-01-07 04:00

    Example: just tailor to your needs.

    # Valid chars: 0-9, a-z, A-Z only
    import re
    REGEX_FOR_INVALID_CHARS = re.compile(r'[^0-9a-zA-Z]+')
    list_of_invalid_chars_found = REGEX_FOR_INVALID_CHARS.findall(topic_message)
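
As a usage sketch of the pattern above, with a made-up sample message: because of the trailing +, findall() returns each run of consecutive invalid characters as one string. The sample text and the is_valid name are assumptions for illustration.

```python
import re

# Same pattern as above: anything other than 0-9, a-z, A-Z is invalid.
REGEX_FOR_INVALID_CHARS = re.compile(r'[^0-9a-zA-Z]+')

# Hypothetical sample message, for illustration only.
topic_message = 'Hello, <b>world</b>!'

list_of_invalid_chars_found = REGEX_FOR_INVALID_CHARS.findall(topic_message)
print(list_of_invalid_chars_found)  # [', <', '>', '</', '>!']

# An empty list means the message contains only allowed characters.
is_valid = not list_of_invalid_chars_found
```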
    
  • 2021-01-07 04:04

    You have to be much more careful when using regular expressions - they are full of traps.

    In the case of [^<>/\{}[]~], the first ] closes the character class, which is probably not what you intended. If you want to use ] in a class, either escape it as \] or make it the first character after the [ (or after the [^ in a negated class), e.g. []^<>/\{}[~]

    A simple test confirms this:

    >>> import re
    >>> re.search("[[]]","]")
    >>> re.search("[][]","]")
    <_sre.SRE_Match object at 0xb7883db0>
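
The same trap can be checked against the full character class quoted above. This is a sketch assuming Python 3; the names broken and fixed are hypothetical:

```python
import re

# Broken: the first ']' closes the class early, so the pattern means
# "one character that is not < > / \ { } [" followed by a literal "~]".
broken = re.compile(r'[^<>/\\{}[]~]')

# Fixed: escape both brackets inside the class.
fixed = re.compile(r'[^<>/\\{}\[\]~]')

print(broken.search('abc'))          # None: there is no literal "~]" to match
print(fixed.search('abc').group())   # a
print(fixed.search('[]~'))           # None: every character is in the excluded set
```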
    

    Regex is overkill for this problem anyway:

    def clean_topic_message(self):
        topic_message = self.cleaned_data['topic_message']
        invalid_chars = '^<>/\\{}[]~`$'
        if topic_message == "":
            raise forms.ValidationError(_(u'Please provide a message for your topic'))
        # The intersection is non-empty if any invalid character occurs in the message.
        if set(invalid_chars).intersection(topic_message):
            # Interpolate after translating so the translatable string stays constant.
            raise forms.ValidationError(_(u'Topic message cannot contain the following: %s') % invalid_chars)
        return topic_message
    
  • 2021-01-07 04:06

    is_valid = not any(k in text for k in '<>/{}[]~`')
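
A variant sketch of the same check: the one-liner above searches text once per forbidden character, whereas a set lookup walks text a single time. Whether that is faster in practice depends on message length and is worth measuring. INVALID_CHARS and is_valid_message are hypothetical names; the character list matches the one-liner, which omits the backslash.

```python
# Assumption: same forbidden characters as the one-liner above.
INVALID_CHARS = frozenset('<>/{}[]~`')


def is_valid_message(text):
    """Return True if text shares no character with INVALID_CHARS."""
    # isdisjoint() accepts any iterable, so the string is passed directly.
    return INVALID_CHARS.isdisjoint(text)


print(is_valid_message('plain text'))    # True
print(is_valid_message('a <tag> here'))  # False
```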
