问题
I'd like to generate a (series of) regexp(s) from a numeric range.
Example:
1013 - 4044 =>
regexp matches
---------------------------------------
101[3-9] 1013 - 1019
10[2-9][0-9] 1020 - 1099
11[0-9][0-9] 1100 - 1199
[23][0-9][0-9][0-9] 2000 - 3999
40[0-3][0-9] 4000 - 4039
404[0-4] 4040 - 4044
what is the simplest algorithm?
What is the easiest way to reverse it (i.e. given the regexps, looking for the ranges)?
Would be nice to see solutions in java, clojure, perl...
Thanks!
回答1:
There is an online tool for generating regex given a range, and provides the explanation. You can find the source code there also. For example:
^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$
First, break into equal length ranges: 1013 - 4044 Second, break into ranges that yield simple regexes: 1013 - 1019 1020 - 1099 1100 - 1999 2000 - 3999 4000 - 4039 4040 - 4044 Turn each range into a regex: 101[3-9] 10[2-9][0-9] 1[1-9][0-9]{2} [23][0-9]{3} 40[0-3][0-9] 404[0-4] Collapse adjacent powers of 10: 101[3-9] 10[2-9][0-9] 1[1-9][0-9]{2} [23][0-9]{3} 40[0-3][0-9] 404[0-4] Combining the regexes above yields: (101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4]) Next we'll try factoring out common prefixes using a tree: Parse into tree based on regex prefixes: . 1 0 1 [3-9] + [2-9] [0-9] + [1-9] [0-9]{2} + [23] [0-9]{3} + 4 0 [0-3] [0-9] + 4 [0-4] Turning the parse tree into a regex yields: (1(0(1[3-9]|[2-9][0-9])|[1-9][0-9]{2})|[23][0-9]{3}|40([0-3][0-9]|4[0-4])) We choose the shorter one as our result. ^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$
To reverse it, you can look at the character classes, and get the minimum and maximum for each alternative.
^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$
=> 1013 1020 1100 2000 4000 4040 lowers
1019 1999 1199 3999 4039 4044 uppers
=> 1013 - 4044
来源:https://stackoverflow.com/questions/4101053/generate-a-regexp-from-a-numeric-range