Merge several regexes to a single one

≯℡__Kan透↙ submitted on 2019-12-30 08:32:09

Question


I have several regexes (actually several thousand), and I must check whether a string matches any of them. Testing them one by one is not very efficient, so I would like to merge all of these regexes into a single regex.

For example, if I have these regexes:

  • 'foo *bar'
  • 'foo *zip'
  • 'zap *bar'

I would like to obtain something like 'foo *(bar|zip)|zap *bar'.

Is there some algorithm, library or tool to do this?


Answer 1:


You can simply join the regexes with the or operator (|), adding anchors for the beginning and end of the string.

Most good regex libraries optimize their finite-state automaton after building it from your regex. PCRE does that, for instance.

This step usually takes care of your optimization problem, i.e., it applies most of the transformations you would otherwise have to do "by hand".
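A minimal Python sketch of the concatenation approach from this answer, assuming the goal is a whole-string match (hence the \A ... \Z anchors; drop them if you want search semantics). The names combine and matches_any_naive are hypothetical. Each pattern is wrapped in a non-capturing group so that a top-level | inside one pattern cannot bleed into its neighbours. Patterns that use numbered backreferences such as \1, or that reuse the same named group, will break when combined, because group numbers shift and group names must be unique. Python's re is a backtracking engine and may not perform the automaton-level optimization described above, so any speedup should be measured rather than assumed.

```python
import re
from typing import Iterable

def combine(patterns: Iterable[str]) -> re.Pattern:
    """Join many regexes into one alternation, anchored at both ends."""
    # Non-capturing group per pattern keeps each pattern's own "|" contained.
    body = "|".join(f"(?:{p})" for p in patterns)
    # \A and \Z pin the whole alternation to the start and end of the string.
    return re.compile(rf"\A(?:{body})\Z")

def matches_any_naive(patterns: Iterable[str], s: str) -> bool:
    """The one-regex-at-a-time loop the question wants to avoid."""
    return any(re.fullmatch(p, s) for p in patterns)

patterns = ["foo *bar", "foo *zip", "zap *bar"]
combined = combine(patterns)
for s in ["foobar", "foo   zip", "zap bar", "zapzip"]:
    # The combined regex should agree with the naive loop on every input.
    assert bool(combined.match(s)) == matches_any_naive(patterns, s)
```

With thousands of patterns the same call works unchanged; only the compiled pattern gets larger.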




Answer 2:


In theory, a regex (one without extensions such as backreferences) corresponds to a (nondeterministic) finite-state automaton, so several of them can be merged into one automaton and then minimized. You can take a look at this as a starting point.
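The merging step this answer mentions can be illustrated with the textbook product construction, sketched here for two deterministic automata (DFAs) rather than for regexes directly; converting a regex to a DFA (for example Thompson's construction followed by subset construction) is not shown. The two DFAs below are hand-written for the literal patterns "ab" and "ac" purely for illustration, and union_dfa and accepts are hypothetical names. The merged automaton runs both inputs in lockstep and accepts when either component accepts.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

# A DFA as (start state, accepting states, transition table). A missing
# (state, symbol) entry means the automaton is stuck, i.e. it rejects.
DFA = Tuple[Hashable, FrozenSet[Hashable], Dict[Tuple[Hashable, str], Hashable]]

def union_dfa(d1: DFA, d2: DFA, alphabet: Iterable[str]) -> DFA:
    """Product construction: run both DFAs together, accept if either accepts."""
    (s1, f1, t1), (s2, f2, t2) = d1, d2
    symbols = list(alphabet)
    start = (s1, s2)
    delta: Dict[Tuple[Hashable, str], Hashable] = {}
    accepting = set()
    seen = {start}
    stack = [start]
    while stack:
        p, q = stack.pop()
        if p in f1 or q in f2:
            accepting.add((p, q))
        for a in symbols:
            # None stands for "this component DFA is stuck".
            nxt = (t1.get((p, a)), t2.get((q, a)))
            if nxt == (None, None):
                continue  # both stuck: leave the transition out entirely
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return start, frozenset(accepting), delta

def accepts(dfa: DFA, word: str) -> bool:
    state, accepting, delta = dfa
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accepting

# Hand-written DFAs for the literal patterns "ab" and "ac" (illustrative only).
ab = (0, frozenset({2}), {(0, "a"): 1, (1, "b"): 2})
ac = (0, frozenset({2}), {(0, "a"): 1, (1, "c"): 2})
both = union_dfa(ab, ac, "abc")
print([w for w in ["ab", "ac", "a", "abc", "bc"] if accepts(both, w)])  # ['ab', 'ac']
```

The product automaton has four reachable states: (0, 0), (1, 1), (2, None) and (None, 2). The last two are both accepting with no outgoing transitions, so a minimization pass would merge them, leaving a three-state automaton equivalent to a(b|c): the shared "a" prefix is matched only once.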

Beware, though, that this might not be the best answer. Why do you have to deal with several thousand regular expressions? I can only imagine the maintenance hell of such a thing. Perhaps you should consider writing a parser and a grammar instead; that is much more easily done (and grammars are more powerful than regexes anyway).




Answer 3:


Even if it were possible, I can't imagine the resulting regex being any more efficient.




Answer 4:


I very much doubt it, on the grounds that any such tool would have to be very complex to deal with all the different ways in which a regex could be combined.

However, if your regexes are relatively simple, like the ones in your example, you may have some luck writing your own.
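For simple patterns like the ones in the question, a do-it-yourself merger can factor out shared prefixes with a trie. This is a minimal sketch, assuming each pattern has already been split by hand into a list of self-contained regex atoms (for example "foo", " *", "bar"), so that a quantifier is never separated from the atom it applies to; splitting raw regex source character by character would not be safe. The helpers build_trie and trie_to_regex are hypothetical names. The output uses non-capturing groups (?:...) instead of the plain parentheses in the question's 'foo *(bar|zip)|zap *bar', and only common prefixes are shared, not common suffixes.

```python
import re
from typing import Dict, Optional, Sequence

# Sentinel key meaning "a pattern may end here"; real tokens are always
# strings, so None cannot collide with one.
END = None

Trie = Dict[Optional[str], "Trie"]

def build_trie(token_lists: Sequence[Sequence[str]]) -> Trie:
    """Insert each tokenized pattern into a prefix tree of regex atoms."""
    root: Trie = {}
    for tokens in token_lists:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[END] = {}
    return root

def trie_to_regex(node: Trie) -> str:
    """Emit a regex in which each shared prefix appears only once."""
    optional = END in node  # some pattern ends at this node
    branches = [tok + trie_to_regex(child)
                for tok, child in node.items() if tok is not END]
    if not branches:
        return ""  # leaf: nothing more to match
    if len(branches) == 1 and not optional:
        return branches[0]  # single continuation: no group needed
    group = "(?:" + "|".join(branches) + ")"
    return group + "?" if optional else group

pieces = [["foo", " *", "bar"], ["foo", " *", "zip"], ["zap", " *", "bar"]]
merged = trie_to_regex(build_trie(pieces))
print(merged)  # (?:foo *(?:bar|zip)|zap *bar)

# Sanity check against the original, unmerged patterns.
originals = ["".join(p) for p in pieces]
for s in ["foobar", "foo  zip", "zap bar", "zapzip"]:
    assert bool(re.fullmatch(merged, s)) == any(re.fullmatch(p, s) for p in originals)
```

Because Python dicts keep insertion order, the alternatives come out in the order the patterns were given, which keeps the result deterministic.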



Source: https://stackoverflow.com/questions/1888765/merge-several-regexes-to-a-single-one
