Let's say I have a list of strings:
string_lst = ['fun', 'dum', 'sun', 'gum']
I want to make a regular expression, where at a point
The regex module has named lists (sets, actually):
#!/usr/bin/env python
import regex as re # $ pip install regex
p = re.compile(r"\L<words>", words=['fun', 'dum', 'sun', 'gum'])
if p.search("I love to have fun."):
    print('matched')
Here words is just a name; you can use any name you like instead. The .search() method is used instead of putting .* before and after the named list.
To emulate named lists using the stdlib's re module:
#!/usr/bin/env python
import re
words = ['fun', 'dum', 'sun', 'gum']
longest_first = sorted(words, key=len, reverse=True)
p = re.compile(r'(?:{})'.format('|'.join(map(re.escape, longest_first))))
if p.search("I love to have fun."):
    print('matched')
re.escape() is used to escape regex metacharacters such as .*? inside individual words (to match the words literally).
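To see why the escaping matters, here is a small sketch using only the stdlib's re module. The word list ('a.b', 'c+') is made up for illustration and is not part of the question.

```python
import re

# Hypothetical word list containing regex metacharacters (illustration only).
words = ['a.b', 'c+']

# Without escaping, '.' matches any character and '+' acts as a quantifier.
unescaped = re.compile('|'.join(words))

# With escaping, every word is matched literally.
escaped = re.compile('|'.join(map(re.escape, words)))

print(bool(unescaped.search('axb')))  # True: '.' matched the 'x'
print(bool(escaped.search('axb')))    # False: only a literal 'a.b' matches
print(bool(escaped.search('a.b')))    # True
```

Without re.escape(), a word such as 'a.b' would silently match more text than intended.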
sorted() emulates the regex module's behavior by putting the longest words first among the alternatives. Compare:
>>> import re
>>> re.findall("(funny|fun)", "it is funny")
['funny']
>>> re.findall("(fun|funny)", "it is funny")
['fun']
>>> import regex
>>> regex.findall(r"\L<words>", "it is funny", words=['fun', 'funny'])
['funny']
>>> regex.findall(r"\L<words>", "it is funny", words=['funny', 'fun'])
['funny']
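If you need the stdlib emulation in more than one place, the two steps (escaping and longest-first ordering) can be wrapped in a small function. This is only a sketch: the name compile_word_list is hypothetical and not part of re or regex.

```python
import re

def compile_word_list(words):
    """Hypothetical helper: compile a pattern matching any of the given words.

    Words are escaped so they match literally, and sorted longest first so
    that a longer word wins over a shorter word that is its prefix.
    """
    longest_first = sorted(words, key=len, reverse=True)
    alternatives = '|'.join(map(re.escape, longest_first))
    return re.compile('(?:{})'.format(alternatives))

p = compile_word_list(['fun', 'funny'])
print(p.findall("it is funny"))  # ['funny']

q = compile_word_list(['fun', 'dum', 'sun', 'gum'])
if q.search("I love to have fun."):
    print('matched')
```

Note that with an empty list this sketch compiles an empty alternative, which matches the empty string at every position, so you may want to handle that case separately.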