I need to find out whether a name starts with any of a list\'s prefixes and then remove it, like:
if name[:2] in [\"i_\", \"c_\", \"m_\", \"l_\", \"d_\", \"t
A bit hard to read, but this works:
name=name[len(filter(name.startswith,prefixes+[''])[0]):]
Regexes will likely give you the best speed:
prefixes = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_", "also_longer_"]
re_prefixes = "|".join(re.escape(p) for p in prefixes)
m = re.match(re_prefixes, my_string)
if m:
my_string = my_string[m.end()-m.start():]
str.startswith(prefix[, start[, end]])¶
Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.
$ ipython
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: prefixes = ("i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_")
In [2]: 'test'.startswith(prefixes)
Out[2]: False
In [3]: 'i_'.startswith(prefixes)
Out[3]: True
In [4]: 'd_a'.startswith(prefixes)
Out[4]: True
When it comes to search and efficiency always thinks of indexing techniques to improve your algorithms. If you have a long list of prefixes you can use an in-memory index by simple indexing the prefixes by the first character into a dict
.
This solution is only worth if you had a long list of prefixes and performance becomes an issue.
pref = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]
#indexing prefixes in a dict. Do this only once.
d = dict()
for x in pref:
if not x[0] in d:
d[x[0]] = list()
d[x[0]].append(x)
name = "c_abcdf"
#lookup in d to only check elements with the same first character.
result = filter(lambda x: name.startswith(x),\
[] if name[0] not in d else d[name[0]])
print result
import re
def make_multi_prefix_replacer(prefixes):
if isinstance(prefixes,str):
prefixes = prefixes.split()
prefixes.sort(key = len, reverse=True)
pat = r'\b(%s)' % "|".join(map(re.escape, prefixes))
print 'regex patern :',repr(pat),'\n'
def suber(x, reg = re.compile(pat)):
return reg.sub('',x)
return suber
pfxs = "x ya foobar yaku foo a|b z."
replacer = make_multi_prefix_replacer(pfxs)
names = "xenon yadda yeti yakute food foob foobarre foo a|b a b z.yx zebra".split()
for name in names:
print repr(name),'\n',repr(replacer(name)),'\n'
ss = 'the yakute xenon is a|bcdf in the barfoobaratu foobarii'
print '\n',repr(ss),'\n',repr(replacer(ss)),'\n'