I guess my question is best explained with an (simplified) example.
Regex 1:
^\\d+_[a-z]+$
Regex 2:
^\\d*$
<
I would construct the regular expression for the intersection, then convert to a regular grammar in normal form, and see if it's the empty language...
That seems like shooting sparrows with a cannon. Why not just construct the product automaton and check if an accept state is reachable from the initial state? That'll also give you a string in the intersection straight away without having to construct a regular expression first.
I would be a bit surprised to learn that there is a polynomial-time solution, and I would not be at all surprised to learn that it is equivalent to the halting problem.
I only know of a way to do it which involves creating a DFA from a regexp, which is exponential time (in the degenerate case). It's reducible to the halting problem, because everything is, but the halting problem is not reducible to it.
If the last, then you can use the fact that any RE can be translated into a finite state machine. Two finite state machines are equal if they have the same set of nodes, with the same arcs connecting those nodes.
So, given what I think you're using as a definition for orthogonal, if you translate your REs into FSMs and those FSMs are not equal, the REs are orthogonal.
That's not correct. You can have two DFAs (FSMs) that are non-isomorphic in the edge-labeled multigraph sense, but accept the same languages. Also, were that not the case, your test would check whether two regexps accepted non-identical, whereas OP wants non-overlapping languages (empty intersection).
Also, be aware that the \1, \2, ..., \9 construction is not regular: it can't be expressed in terms of concatenation, union and * (Kleene star). If you want to include back substitution, I don't know what the answer is. Also of interest is the fact that the corresponding problem for context-free languages is undecidable: there is no algorithm which takes two context-free grammars G1 and G2 and returns true iff L(G1) ∩ L(g2) ≠ Ø.
You can maybe use something like Regexp::Genex to generate test strings to match a specified regex and then use the test string on the 2nd regex to determine whether the 2 regexes are orthogonal.
I believe kdgregory is correct you're using Orthogonal to mean Complement.
Is this correct?
By "Orthogonal" you mean "the intersection is the empty set" I take it?
I would construct the regular expression for the intersection, then convert to a regular grammar in normal form, and see if it's the empty language...
Then again, I'm a theorist...
The fsmtools can do all kinds of operations on finite state machines, your only problem would be to convert the string representation of the regular expression into the format the fsmtools can work with. This is definitely possible for simple cases, but will be tricky in the presence of advanced features like look{ahead,behind}.
You might also have a look at OpenFst, although I've never used it. It supports intersection, though.
Excellent point on the \1, \2 bit... that's context free, and so not solvable. Minor point: Not EVERYTHING is reducible to Halt... Program Equivalence for example.. – Brian Postow
[I'm replying to a comment]
IIRC, a^n b^m a^n b^m
is not context free, and so (a\*)(b\*)\1\2
isn't either since it's the same. ISTR { ww | w ∈ L }
not being "nice" even if L is "nice", for nice being one of regular
, context-free
.
I modify my statement: everything in RE is reducible to the halting problem ;-)