问题
I am going through exercises regarding regular expressions, and I am really unsure on how to do this.
The regular expression is:
((a*)(b*))* ∪ (a*)
I am really bad at this, but I think that ((a*)(b*))* can be simplified to (a ∪ b)* But if this is right, than the last ∪ (a*) is really just a repetition, so I figure the whole expression can be simplified to (a ∪ b)*. Does this seem correct?
Edit: ∪ stands for union
回答1:
You are right. (a*b*)* can match any string of a's and b's, so can (a U b)*, therefore they are equivalent. (a U b)* intersect a* is a* so a* is a subset of (a U b)*. Consequently, the whole expression can be simplified to (a U b)*.
回答2:
What ((a*)(b*))*U(a*) really means is (copied from here)
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1 (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
a* 'a' (0 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
b* 'b' (0 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
)* end of \1 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
--------------------------------------------------------------------------------
U 'U'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
a* 'a' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
) end of \4
This expression currently matches all of these sequences: abUa bU U aabbUaa aaUaa aaU Uaa bbU ababUaa aabbaabbUaa (look at here)
There is no way to simplify this, without removing capturing groups and remaining order of letters.
EDIT: If U in your regex statement stands for "union", then this expression is invalid. There is no way to union anything in regex. There is only OR and you need to use | (pipe) for that. If you want to union ((a*)(b*))* and (a*) then probably it will be ((a*)(b*))*, but it will still match anything like abaab.
Still, capturing groups in your regex statement are useless, so something like [ab]* is enough to match any number of a's and b's.
来源:https://stackoverflow.com/questions/23566410/simplifying-regular-expression