Matching against a large number of strings containing spaces in pyparsing

Posted by 两盒软妹~` on 2019-12-24 06:44:11

Question


With pyparsing I need to write a matcher for expressions like

a + names + c 

with

a = pp.OneOrMore(pp.Word(pp.alphas))
c = pp.OneOrMore(pp.Word(pp.nums))

and names matching one of many entries in the string list names_list.

The three complications are:

  1. The entries in names_list can contain spaces.
  2. The matching needs to be case-insensitive.
  3. names_list is rather large (~20000 entries)

I tried

names_kw_list = [pp.Keyword(name, caseless=True) for name in names_list ]
names = pp.Or(names_kw_list)

This does not work for entries that contain spaces, and I am also worried that building one Keyword per entry will perform poorly.

Any ideas on how to get this working for entries with spaces, and ideally make it faster?


Answer 1:


A partial answer:

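The spaces problem can be solved by passing a suitable stopOn expression to OneOrMore:

import pyparsing as pp

def last_occurrence_of(expr):
    # Match expr only if no further occurrence of expr follows later in the input.
    return expr + ~pp.FollowedBy(pp.SkipTo(expr))

# names_list is the list of name strings from the question.
names_kw_list = [pp.Keyword(word, caseless=True) for word in names_list]
names = pp.Or(names_kw_list)("names")
# stopOn keeps a from consuming the words that belong to the final name.
a = pp.OneOrMore(pp.Word(pp.alphas), stopOn=last_occurrence_of(names))("A")
c = pp.OneOrMore(pp.Word(pp.nums))("C")

expr = a + names + c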

This tells a to stop before the last occurrence of a name, so a no longer consumes the words that belong to names.
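
To see the effect on a small input, here is a minimal runnable sketch. The two-entry names_list and the sample sentence are made up for illustration; they are not from the question. The intent is that the earlier "york" should end up in a, while the final "new york" is left for names.

```python
import pyparsing as pp

# Hypothetical sample data; the real list has about 20000 entries.
names_list = ["new york", "york"]

def last_occurrence_of(expr):
    # Match expr only if expr cannot be found again further along the input.
    return expr + ~pp.FollowedBy(pp.SkipTo(expr))

names = pp.Or([pp.Keyword(n, caseless=True) for n in names_list])("names")
a = pp.OneOrMore(pp.Word(pp.alphas), stopOn=last_occurrence_of(names))("A")
c = pp.OneOrMore(pp.Word(pp.nums))("C")
expr = a + names + c

# "york" appears twice; only the last name occurrence should stop a.
result = expr.parseString("old york road new york 123")
print(list(result))
print(result["names"])
```

Note that Keyword with caseless=True returns the string as it was defined in names_list, not as it was written in the input.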

However, performance deteriorates, because the long list of names is now also evaluated inside the stopOn expression, which is checked before every word that a tries to match.
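
One option that might reduce this cost, and which is not part of the answer above, is to collapse the whole list into a single case-insensitive regular expression instead of ~20000 Keyword objects. The sketch below uses only the standard re module. The helper name build_names_pattern and the sample entries are hypothetical, and it assumes every entry begins and ends with a word character so that \b behaves as a keyword boundary. Whether it is actually faster for a given list would need to be measured.

```python
import re

def build_names_pattern(names_list):
    """Return one regex source string that matches any entry of names_list."""
    # Regex alternation takes the first alternative that matches, so longer
    # entries go first: "New York" must be tried before "York".
    ordered = sorted(set(names_list), key=len, reverse=True)
    # re.escape keeps spaces and punctuation in the entries literal.
    alternation = "|".join(re.escape(name) for name in ordered)
    # \b on both sides mimics Keyword: no match inside a longer word.
    return r"\b(?:" + alternation + r")\b"

# Hypothetical sample data standing in for the real names_list.
names_list = ["New York", "York", "San Francisco"]
pattern = build_names_pattern(names_list)
names_re = re.compile(pattern, re.IGNORECASE)

match = names_re.search("flight to new york 123")
print(match.group(0))
```

The resulting pattern string could then be wrapped as pp.Regex(pattern, flags=re.IGNORECASE) and used in place of pp.Or(names_kw_list), both as names and inside the stopOn expression. Unlike the caseless Keyword, a Regex returns the text as it appears in the input.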



Source: https://stackoverflow.com/questions/41736402/matching-against-a-large-number-of-strings-containing-spaces-in-pyparsing
