Calculate if two infinite regex solution sets don't intersect

落爺英雄遲暮 提交于 2019-12-18 15:35:24

问题


In calculate if two arbitrary regular expressions have any overlapping solutions (assuming it's possible).

For example these two regular expressions can be shown to have no intersections by brute force because the two solution sets are calculable because it's finite.

^1(11){0,1000}$ ∩     ^(11){0,1000}$        = {}
{1,111, ..., ..111} ∩ {11,1111, ..., ...11} = {}
{}                                          = {}

But replacing the {0,1000} by * remove the possibility for a brute force solution, so a smarter algorithm must be created.

^1(11)*$ ∩ ^(11)*$ = {}
{1,^1(11)*$} ∩ {^(11)*$} = {}
{1,^1(11)*$} ∩ {11,^11(11)*$} = {}
{1,111,^111(11)*$} ∩ {11,^(11)*$} = {}
.....

In another similar question one answer was to calculate the intersection regex. Is that possible to do? If so how would one write an algorithm to do such a thing?

I think this problem might be domain of the halting problem.

EDIT:

I've used the accepted solution to create the DFAs for the example problem. It's fairly easy to see how you can use a BFS or DFS on the graph of states for M_3 to determine if a final state from M_3 is reachable.


回答1:


It is not in the domain of the halting problem; deciding whether the intersection of regular languages is empty or not can be solved as follows:

  1. Construct a DFA M1 for the first language.
  2. Construct a DFA M2 for the second language. Hint: Kleene's Theorem and Power Set machine construction
  3. Construct a DFA M3 for M1 intersect M2. Hint: Cartesian Product Machine construction
  4. Determine whether L(M3) is empty. Hint: If M3 has n states, and M3 doesn't accept any strings of length no greater than n, then L(M3) is empty... why?

Each of those things can be algorithmically done and/or checked. Also, naturally, once you have a DFA recognizing the intersection of your languages, you can construct a regex to match the language. And if you start out with a regex, you can make a DFA. This is definitely computable.

EDIT:

So to build a Cartesian Product Machine, you need two DFAs. Let M1 = (E, q0, Q1, A1, f1) and M2 = (E, q0', Q2, A2, f2). In both cases, E is the input alphabet, q0 is the start state, Q is the set of all states, A is the set of accepting states, and f is the transition function. Construct M3 where...

  1. E3 = E
  2. Q3 = Q1 x Q2 (ordered pairs)
  3. q0'' = (q0, q0')
  4. A3 = {(x, y) | x in A1 and y in A2}
  5. f3(s, (x, y)) = (f1(s, x), f2(s, y))

Provided I didn't make any mistakes, L(M3) = L(M1) intersect L(M2). Neat, huh?




回答2:


I've created a PHP implementation of Patrick87 answer. In addition to implementing the Intersection via Cartesian Product Machine, I've also implemented an alterative algorithm for finding Intersections of DFAs using De Morgan.

Intersection( DFA_1, DFA_2 ) === ! UNION( ! DFA_1, ! DFA_2 )

* ! is defined as negation

This works very well for DFAs as the negation of a fully defined DFA (those with every possible transition state defined) is just to add all non-final states to the final state set and remove all current final states from the final state set (non-final -> final, final -> non->final). Union of DFA can be done easily by turning them into a NFA and then creating a new starting node that connects the unioned DFA's old start nodes by lambda transforms.

In addition to solving the intersection problem, the library I created is also able to determinize a NFA to a DFA and convert Regex to NFA.

EDIT:

I have created a webapp that allows this sort of transformations on regex languagues using what I learned form this question (and others).



来源:https://stackoverflow.com/questions/7732815/calculate-if-two-infinite-regex-solution-sets-dont-intersect

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!