Fastest way to check whether a string contains any word from a list

Submitted by 亡梦爱人 on 2019-12-10 14:23:01

Question


I have a Python application.

There is a list of 450 prohibited phrases, and a message received from a user. I want to check whether the message contains any of these prohibited phrases. What is the fastest way to do that?

Currently I have this code:

message = "sometext"
lista = ["a", "b", "c"]

isContaining = False

# Stop at the first prohibited phrase found in the message.
for member in lista:
    if member in message:
        isContaining = True
        break

Is there a faster way to do that? I need to handle each message (max 500 chars) in less than 1 second.


Answer 1:


The built-in any function is made for exactly this:

>>> message = "sometext"
>>> lista = ["a","b","c"]
>>> any(a in message for a in lista)
False
>>> lista = ["a","b","e"]
>>> any(a in message for a in lista)
True

Alternatively you could check the intersection of the sets:

>>> lista = ["a","b","c"]
>>> set(message) & set(lista)
set([])
>>> lista = ["a","b","e"]
>>> set(message) & set(lista)
set(['e'])
>>> set(['test','sentence'])&set(['this','is','my','sentence'])
set(['sentence'])

But you won't be able to check for subwords:

>>> set(['test','sentence'])&set(['this is my sentence'])
set([])
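If whole-word matching is enough, one way around this is to split the message into words before intersecting. The sketch below assumes words are separated by whitespace and that every prohibited entry is a single word; the helper name `contains_any_word` is made up for illustration. Multi-word phrases would still need a substring or regex check.

```python
def contains_any_word(message, words):
    """Return True if message shares at least one whole word with words."""
    # str.split() with no arguments splits on runs of whitespace.
    message_words = set(message.lower().split())
    # isdisjoint() is True when the two collections share no element.
    return not message_words.isdisjoint(w.lower() for w in words)


print(contains_any_word("this is my sentence", ["test", "sentence"]))   # True
print(contains_any_word("this is my sentences", ["test", "sentence"]))  # False
```

The second call returns False because "sentences" is a different token from "sentence", which is exactly the subword limitation described above.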



Answer 2:


Use a regex compiled from the list

To avoid paying the memory and build-time cost of the expression on every message, compile it in advance.

import re

lista = [...]
lista_escaped = [re.escape(item) for item in lista]
# Pass flags to re.compile(); search()'s second argument is a start position.
bad_match = re.compile('|'.join(lista_escaped), re.IGNORECASE)
is_bad = bad_match.search(message)
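A self-contained sketch of this approach follows. The phrase list, the constant names and the `message_is_bad` helper are placeholders invented for the example; the flag is passed to `re.compile`, since a compiled pattern's `search` takes a start position rather than flags as its second argument.

```python
import re

# Placeholder list; the real one would hold the 450 prohibited phrases.
PROHIBITED = ["bad phrase", "spam", "a+b"]

# Escape each phrase so characters like "+" are matched literally,
# then build the alternation once, when the module is loaded.
BAD_MATCH = re.compile(
    "|".join(re.escape(phrase) for phrase in PROHIBITED),
    re.IGNORECASE,
)


def message_is_bad(message):
    """Return True if any prohibited phrase occurs in message."""
    return BAD_MATCH.search(message) is not None


print(message_is_bad("This is SPAM"))  # True
print(message_is_bad("compute a+b"))   # True
print(message_is_bad("hello world"))   # False
```

Compiling once and reusing the pattern means each incoming message costs only a single `search` call.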



Answer 3:


I would combine the any builtin with the in operator:

isContaining = any(a in message for a in lista)

I don't know if this is the fastest way, but it seems the simplest to me.
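One way to find out whether it is fast enough is to time it with `timeit` on data shaped like the question's. The sketch below uses generated placeholder phrases and a 500-character message that matches nothing, which is the worst case for this approach; actual timings depend on the machine and on the real phrase list.

```python
import timeit

# Placeholder data sized like the question: 450 phrases, a 500-character message.
lista = ["phrase%d" % i for i in range(450)]
message = "x" * 500  # worst case: no phrase matches, so every phrase is tried


def check():
    return any(a in message for a in lista)


# Average over many runs to smooth out noise.
runs = 1000
seconds_per_check = timeit.timeit(check, number=runs) / runs
print("%.6f seconds per check" % seconds_per_check)
```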




Answer 4:


We can also use the set intersection method:

>>> message = "sometext"
>>> lista = ["a","b","c"]
>>> isContaining = False
>>> if set(list(message)).intersection(set(lista)):
...    isContaining = True
... 
>>> isContaining
False
>>> message = "sometext a"
>>> list(message)
['s', 'o', 'm', 'e', 't', 'e', 'x', 't', ' ', 'a']
>>> if set(list(message)).intersection(set(lista)):
...    isContaining = True
... 
>>> isContaining
True
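Note that `set(message)` is a set of single characters, so this check only works when every entry in `lista` is one character long. A short sketch of the limitation, using a made-up multi-character phrase:

```python
message = "this is a bad phrase"
lista = ["bad phrase"]  # hypothetical multi-character entry

# set(message) holds individual characters such as 't', 'h', 'i', ...
# so a whole phrase can never be a member of it.
by_sets = bool(set(message).intersection(lista))

# A substring test finds the phrase as expected.
by_substring = any(a in message for a in lista)

print(by_sets)       # False
print(by_substring)  # True
```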


Source: https://stackoverflow.com/questions/27781506/fastest-way-to-check-does-string-contain-any-word-from-list
