Convert a regular expression A that accepts a set of strings to one that accepts all prefixes of the strings that A matches

Submitted by 落爺英雄遲暮 on 2021-02-08 05:19:11

Question


Given any regular expression A, is there a way to transform it into another regular expression B that accepts every string A accepts, as well as every prefix of those strings?

For example, if /apple/ is the given regular expression, is there a generalized way to convert it to /a|ap|app|appl|apple/?


Answer 1:


If you're talking about formal regular expressions (i.e. regular expressions that describe regular languages), then here's a procedure to convert a regular expression into one that accepts prefixes.

Every regular expression has an equivalent DFA; here's the DFA for /apple/ (with transitions to failure states left out):

[Figure: DFA for /apple/]

To produce a DFA that matches prefixes of strings accepted by this DFA, convert states to accepting states if they lie along paths that lead to accepting states in the original DFA:

[Figure: DFA for prefixes of /apple/]
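The marking step can be sketched as a backward reachability search from the accepting states. This is a minimal sketch, assuming the DFA is stored as a dictionary that maps (state, symbol) pairs to states, with failure transitions simply left out; the names `prefix_closure`, `accepts`, and `apple_delta` are illustrative and do not come from the original answer.

```python
from collections import deque


def prefix_closure(accepting, delta):
    """Return every state from which some accepting state is reachable."""
    # Index the transitions backwards: target state -> set of source states.
    preds = {}
    for (src, _symbol), dst in delta.items():
        preds.setdefault(dst, set()).add(src)

    live = set(accepting)
    queue = deque(accepting)
    while queue:
        state = queue.popleft()
        for pred in preds.get(state, ()):
            if pred not in live:
                live.add(pred)
                queue.append(pred)
    return live


def accepts(start, accepting, delta, text):
    """Run the DFA; a missing transition is treated as the failure state."""
    state = start
    for ch in text:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accepting


# DFA for /apple/: states 0..5, state 5 is the only accepting state.
apple_delta = {(i, ch): i + 1 for i, ch in enumerate("apple")}
apple_prefix_accepting = prefix_closure({5}, apple_delta)
```

Calling `accepts(0, apple_prefix_accepting, apple_delta, "app")` then runs the prefix DFA on `app`. Explicit failure states, if present in `delta`, are never marked, because no accepting state is reachable from them.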

There are several methods for reading a regular expression from a DFA. If we use the state removal technique, we arrive at the following automaton, whose edges are labeled with regular expressions:

[Figure: DFA for prefixes of /apple/, after state elimination]

This corresponds to the regular expression /a|ap|app|appl|apple|/, where the trailing empty alternative matches the empty string (the empty string is a prefix of every string, so it is always included when A matches anything at all).
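State removal itself can be sketched in a few lines. The following is an illustrative version, not the answer's own code: it assumes the same dictionary representation of the DFA, uses Python `re` syntax for the output, and the names `dfa_to_regex`, `_union`, and `_star` are hypothetical. It makes no attempt to simplify the result, so the expression it produces is equivalent to, but more heavily parenthesized than, the one read off by hand.

```python
import re


def _union(a, b):
    # None stands for "no edge" (the empty language).
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"


def _star(a):
    # A missing or empty self-loop contributes nothing.
    if not a:
        return ""
    return f"(?:{a})*"


def dfa_to_regex(states, start, accepting, delta):
    """State elimination: returns a regex string, or None for the empty language."""
    new_start, new_final = object(), object()  # fresh states, cannot clash
    edges = {}

    def add(src, dst, label):
        edges[(src, dst)] = _union(edges.get((src, dst)), label)

    add(new_start, start, "")
    for state in accepting:
        add(state, new_final, "")
    for (src, symbol), dst in delta.items():
        add(src, dst, re.escape(symbol))

    for k in states:
        loop = _star(edges.pop((k, k), None))
        incoming = [(p, r) for (p, q), r in edges.items() if q == k]
        outgoing = [(q, r) for (p, q), r in edges.items() if p == k]
        for key in [key for key in edges if k in key]:
            del edges[key]
        # Reroute every path p -> k -> q as a direct edge p -> q.
        for p, r_in in incoming:
            for q, r_out in outgoing:
                add(p, q, r_in + loop + r_out)

    return edges.get((new_start, new_final))


# Prefix DFA for /apple/: every state 0..5 is accepting.
apple_delta = {(i, ch): i + 1 for i, ch in enumerate("apple")}
apple_prefix_regex = dfa_to_regex(range(6), 0, set(range(6)), apple_delta)
```

The resulting string can be checked with `re.fullmatch(apple_prefix_regex, "app")`. The order in which states are eliminated changes the shape of the expression but not the language it describes.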

The apple example is trivial, but this same technique can be used for more complicated regular expressions. For example, consider /(00)*1(00|1)*/:

[Figure: DFA for /(00)*1(00|1)*/]

This DFA accepts the string 00100 but doesn't accept 0010101. After converting the appropriate states to final states and combining two identical states, we have

[Figure: Slightly minimized DFA for prefixes of /(00)*1(00|1)*/]

This is equivalent to

[Figure: equivalent automaton]

from which we can read the regular expression /(00)*(0?|1(1|00)*0?)/, which includes the empty string.

This regular expression rejects 00101 because that string drives the original DFA into a failure state, but it accepts '0' and '00', because those strings do not cause the original DFA to enter a failure state.
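The claim can be checked by brute force for short strings. This is a rough sketch using Python's `re` module; the helper names and the `slack` parameter are assumptions of this example. The value `slack=2` relies on the observation that any viable prefix of /(00)*1(00|1)*/ can be completed to a full match with at most two more symbols.

```python
import re
from itertools import product

ORIGINAL = re.compile(r"(00)*1(00|1)*")
PREFIXES = re.compile(r"(00)*(0?|1(1|00)*0?)")


def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            yield "".join(chars)


def brute_force_prefixes(max_len, slack=2):
    """Prefixes (length <= max_len) of strings matched by ORIGINAL."""
    found = set()
    for s in binary_strings(max_len + slack):
        if ORIGINAL.fullmatch(s):
            found.update(s[:i] for i in range(min(len(s), max_len) + 1))
    return found


def regex_prefixes(max_len):
    """Strings (length <= max_len) matched by the derived prefix regex."""
    return {s for s in binary_strings(max_len) if PREFIXES.fullmatch(s)}
```

Comparing `brute_force_prefixes(8)` with `regex_prefixes(8)` tests the two descriptions against each other on every binary string of up to eight symbols. It is a sanity check, not a proof of equivalence.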




Answer 2:


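It depends on what you mean by a generalized way. For a literal such as apple, you can nest optional groups so that each character is allowed only if the previous one matched:

\b(a(p(p(l(e)?)?)?)?)\b

Edit: adding a positive lookbehind would be a better solution, but whether it is available depends entirely on the regex engine.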
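For a literal word, a pattern that matches exactly its non-empty prefixes can be generated by nesting the optional groups, so that each character is allowed only once the previous one has matched. This is a minimal sketch for literals only, not a general regex transformation; the helper name `literal_prefix_pattern` is hypothetical.

```python
import re


def literal_prefix_pattern(word):
    """Build a regex matching every non-empty prefix of a literal word."""
    if not word:
        raise ValueError("word must not be empty")
    tail = ""
    # Wrap from the last character inwards: e -> l(e)? -> p(l(e)?)? ...
    for ch in reversed(word[1:]):
        tail = f"(?:{re.escape(ch)}{tail})?"
    return re.escape(word[0]) + tail


apple_pattern = literal_prefix_pattern("apple")
```

With `re.fullmatch(apple_pattern, candidate)`, strings such as `ap` or `appl` match, while `ale` does not, because the `l` sits inside the group that requires both `p` characters first.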



Source: https://stackoverflow.com/questions/12015612/convert-a-regular-expression-a-that-accepts-a-set-of-strings-to-one-that-accepts
