Efficient way to search for invalid characters in Python

I am building a forum application in Django and I want to make sure that users don't enter certain characters in their forum posts. I need an efficient way to scan their whol

9 answers

灰色年华

2021-01-07 03:43

If efficiency is a major concern, I would precompile the pattern with re.compile(), since you're going to use the same regex many times.
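
A minimal sketch of that idea, assuming the forbidden characters are the ones used in the other answers; the names `INVALID_CHARS_RE` and `has_invalid_chars` are made up for illustration:

```python
import re

# Compiled once at import time and reused for every post.
# Inside a character class, ']' and '\' are escaped to be matched literally.
INVALID_CHARS_RE = re.compile(r'[<>/\\{}\[\]~`]')


def has_invalid_chars(message):
    """Return True if the message contains any forbidden character."""
    return INVALID_CHARS_RE.search(message) is not None
```

The module-level `re` functions also cache recently compiled patterns, so the gain from compiling explicitly is usually small; it mostly keeps the pattern in one named place.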

无人共我

2021-01-07 03:44
In any case you need to scan the entire message, so wouldn't something simple like this work?
```
def checkMessage(topic_message):
    # Return False as soon as a forbidden character is found.
    for char in topic_message:
        if char in "<>/\\{}[]~`":
            return False
    return True
```

你的背包

2021-01-07 03:57

I agree with gnibbler: regex is overkill for this situation. After removing these unwanted characters you'll probably want to remove unwanted words as well; here's a basic way to do it:

```
def remove_bad_words(title):
    """Helper to remove bad words from a sentence based on a collection of words."""
    # BAD_WORDS is a list (or, for faster lookups, a set) of unwanted words.
    # Build a new list rather than calling remove() while iterating,
    # which would skip the word that follows each removed one.
    word_list = [word for word in title.split(' ') if word not in BAD_WORDS]
    # Rebuild the string from the remaining words.
    return ' '.join(word_list)
```
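
A note on removing words from a list inside a `for` loop over that same list: the loop skips the element right after each removal, so adjacent bad words can slip through. The sketch below shows the effect on made-up data (`BAD_WORDS` here is a hypothetical set, and both function names are invented) next to a filtering version that does not have the problem:

```python
BAD_WORDS = {"spam", "junk"}  # hypothetical word list for illustration


def remove_in_loop(title):
    """Remove bad words by mutating the list while iterating over it."""
    word_list = title.split(' ')
    for word in word_list:
        if word in BAD_WORDS:
            word_list.remove(word)  # shifts later items left under the iterator
    return ' '.join(word_list)


def remove_by_filtering(title):
    """Remove bad words by building a new sequence instead."""
    return ' '.join(word for word in title.split(' ') if word not in BAD_WORDS)


print(remove_in_loop("buy spam junk now"))       # the second bad word survives
print(remove_by_filtering("buy spam junk now"))  # both bad words are dropped
```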

萌比男神i

2021-01-07 04:00

Example: just tailor to your needs.

```
# Valid chars: 0-9, a-z, A-Z only
import re

REGEX_FOR_INVALID_CHARS = re.compile(r'[^0-9a-zA-Z]+')
list_of_invalid_chars_found = REGEX_FOR_INVALID_CHARS.findall(topic_message)
```
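
To see what `findall` returns here, a quick sketch with a made-up `topic_message`. Note that with this whitelist, spaces and punctuation count as invalid too, and the `+` groups adjacent invalid characters into a single match:

```python
import re

REGEX_FOR_INVALID_CHARS = re.compile(r'[^0-9a-zA-Z]+')

topic_message = "Hello, world <b>42</b>"  # sample input, not from the question
found = REGEX_FOR_INVALID_CHARS.findall(topic_message)
print(found)

# An empty list means every character is in the allowed set.
is_valid = not found
```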

一个人的身影

2021-01-07 04:04

You have to be much more careful when using regular expressions; they are full of traps.

In the case of [^<>/\{}[]~], the first ] closes the group, which is probably not what you intended. If you want to use ] in a group, it has to be escaped (\]) or be the first character after the [, e.g. []^<>/\{}[~]

A simple test confirms this:

```
>>> import re
>>> re.search("[[]]", "]")
>>> re.search("[][]", "]")
<_sre.SRE_Match object at 0xb7883db0>
```
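
One way to sidestep the bracket-ordering trap is to let `re.escape` escape each character before building the class. A small sketch, assuming the set of forbidden characters from this thread (the helper name `build_char_class` is made up):

```python
import re


def build_char_class(chars):
    """Build a compiled character class matching any single char in `chars`."""
    # re.escape backslash-escapes ']', '[', '^', '\\' and other specials,
    # so their position inside the brackets no longer matters.
    return re.compile('[' + ''.join(re.escape(c) for c in chars) + ']')


INVALID = build_char_class('^<>/\\{}[]~`$')

print(bool(INVALID.search("a ] b")))   # ']' is matched as a literal
print(bool(INVALID.search("a ^ b")))   # '^' is literal too, not a negation
print(bool(INVALID.search("plain")))   # nothing forbidden here
```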

Regex is overkill for this problem anyway:

```
# A method on a Django forms.Form subclass; assumes `from django import forms`
# and that `_` is bound to a gettext function.
def clean_topic_message(self):
    topic_message = self.cleaned_data['topic_message']
    invalid_chars = '^<>/\\{}[]~`$'
    if topic_message == "":
        raise forms.ValidationError(_('Please provide a message for your topic'))
    if set(invalid_chars).intersection(topic_message):
        # Interpolate after translating, so the lookup uses the unformatted string.
        raise forms.ValidationError(
            _('Topic message cannot contain the following: %s') % invalid_chars)
    return topic_message
```
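
Outside of a Django form, the same set-intersection check can be tried on its own. A minimal sketch (the function name `find_invalid_chars` is hypothetical) that also reports which characters were found:

```python
INVALID_CHARS = '^<>/\\{}[]~`$'


def find_invalid_chars(message, invalid_chars=INVALID_CHARS):
    """Return the forbidden characters present in `message`, sorted."""
    # set.intersection accepts any iterable, so the message string
    # can be passed in directly.
    return sorted(set(invalid_chars).intersection(message))


print(find_invalid_chars("x = a[0] ^ b"))
print(find_invalid_chars("nothing wrong here"))
```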

孤独总比滥情好

2021-01-07 04:06

```
is_valid = not any(k in text for k in '<>/{}[]~`')
```
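
Since the question is about efficiency, the candidates are easy to time against each other with `timeit`. A rough sketch follows; the sample text and function names are made up, `frozenset.isdisjoint` stands in for the set-based answer, and results will vary with input and Python version, so no numbers are claimed here:

```python
import re
import timeit

INVALID = '<>/{}[]~`'
INVALID_SET = frozenset(INVALID)
INVALID_RE = re.compile('[' + re.escape(INVALID) + ']')


def valid_any(text):
    """One substring scan per forbidden character."""
    return not any(k in text for k in INVALID)


def valid_set(text):
    """One set lookup per character of the text."""
    return INVALID_SET.isdisjoint(text)


def valid_regex(text):
    """A single precompiled character-class search."""
    return INVALID_RE.search(text) is None


if __name__ == "__main__":
    text = "a perfectly ordinary forum post " * 200  # no forbidden chars, so a full scan
    for fn in (valid_any, valid_set, valid_regex):
        seconds = timeit.timeit(lambda: fn(text), number=200)
        print(f"{fn.__name__}: {seconds:.4f}s")
```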
