Minimization of the regex

爱⌒轻易说出口 submitted on 2019-12-04 05:35:59

Question


I am fairly new to programming. I am trying to generate a single regex that matches exactly the strings in a given list and nothing else.

For example, given the list below:

List = ['starguide', 'snoreguide', 'snoraguide', 'smarguides']

it should create a regex like this: s(((tar|nor(e|a))(guide))|marguides)

I implemented a trie, but could only manage to get s(marguides|nor(aguide|eguide)|targuide)
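For reference, a trie-based generator of the kind described could look like the minimal sketch below. The asker's actual code is not shown, so the function names build_trie and trie_to_regex and the nested-dict layout are assumptions made for illustration. It shows why a trie alone only factors out common prefixes: every branch is rendered independently, so a shared suffix such as guide is repeated in each branch.

```python
import re

def build_trie(words):
    """Build a trie as nested dicts; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root

def trie_to_regex(node):
    """Render a trie node as a regex; only common prefixes are shared."""
    if not node:
        return ""
    # Keys within one node are unique, so sorting never compares the child dicts.
    alts = [re.escape(ch) + trie_to_regex(child)
            for ch, child in sorted(node.items())]
    if len(alts) == 1:
        return alts[0]
    return "(" + "|".join(alts) + ")"

words = ["starguide", "snoreguide", "snoraguide", "smarguides"]
print(trie_to_regex(build_trie(words)))
```

With the list from the question, each of the three branches under s carries its own copy of the suffix, which is the shape of the regex quoted above.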

I want the regex to be shorter, with common suffixes factored out. Is there a better way to shorten the regex I am getting from the trie?


Answer 1:


To get the desired result, try automaton minimization.

For your simple example, a deterministic automaton suffices.

Use github.com/siddharthasahu/automata-from-regex to build the minimal deterministic state machine (automaton) from the trivial regex, i.e. the plain enumeration of the words, then transform that automaton back into a regex. The conversion is easy for acyclic automata; see http://www-igm.univ-mlv.fr/~dr/thdr/ and www.dcc.fc.up.pt/~nam/publica/extAbsCIAA05.pdf, and also https://cs.stackexchange.com/questions/2016/how-to-convert-finite-automata-to-regular-expressions
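The two steps can be sketched in plain Python, under some assumptions. This sketch does not use the automata-from-regex project linked above. Instead it builds the minimal acyclic automaton directly by merging identical trie subtrees, then converts it to a regex by state elimination, removing low-degree states first. The function names (build_trie, minimize, group, to_regex) and the elimination-order heuristic are illustrative choices, not part of any library, and a non-empty word list is assumed.

```python
import re

def build_trie(words):
    """Build a trie as nested dicts; "" is an epsilon edge marking end of word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root

def minimize(trie):
    """Merge identical subtrees; return (start, final, {state: ((char, target), ...)})."""
    registry, edges = {}, {}
    def visit(node):
        # Two nodes are equivalent when they have the same labelled edges
        # to the same (already merged) target states.
        sig = tuple(sorted((ch, visit(child)) for ch, child in node.items()))
        if sig not in registry:
            registry[sig] = len(registry)
            edges[registry[sig]] = sig
        return registry[sig]
    start = visit(trie)
    return start, registry[()], edges   # the empty signature is the single final state

def group(alts):
    """Render a set of alternatives as one string that is safe to concatenate."""
    body = sorted(a for a in alts if a)
    if not body:
        return ""
    if len(body) == 1 and "" not in alts:
        text = body[0]
    else:
        text = "(?:" + "|".join(body) + ")"
    return text + "?" if "" in alts else text

def to_regex(start, final, edges):
    """State elimination on an acyclic automaton (no self-loops to handle)."""
    # R[p][q] is the set of alternative regexes labelling the edge p -> q.
    R = {p: {} for p in edges}
    for p, out in edges.items():
        for ch, q in out:
            R[p].setdefault(q, set()).add(re.escape(ch))
    def cost(q):
        # Heuristic (an assumption): eliminate states with few in/out edges first.
        return sum(q in R[p] for p in R) * len(R[q])
    while len(R) > 2:
        q = min((s for s in R if s not in (start, final)), key=cost)
        out = R.pop(q)
        for p in R:
            if q in R[p]:
                prefix = group(R[p].pop(q))
                for target, alts in out.items():
                    R[p].setdefault(target, set()).add(prefix + group(alts))
    return group(R[start].get(final, set()))

words = ["starguide", "snoreguide", "snoraguide", "smarguides"]
pattern = to_regex(*minimize(build_trie(words)))
print(pattern)
assert all(re.fullmatch(pattern, w) for w in words)
```

Subtree merging is what makes the suffix guide a single shared path in the automaton, and eliminating chain states first lets that path survive as one factor in the output. The regex length still depends on the elimination order, so this heuristic is not guaranteed to give the shortest result on other inputs.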

In the general case, nondeterministic automata can yield a shorter regex, but finding the shortest one is a hard problem: https://cstheory.stackexchange.com/questions/31630/how-can-one-actually-minimize-a-regular-expression



Source: https://stackoverflow.com/questions/52930675/minimization-of-the-regex
