generate a regexp from a numeric range

守給你的承諾、 提交于 2019-12-18 05:14:34

问题


I'd like to generate a (series of) regexp(s) from a numeric range.

Example:

1013 - 4044 => 

regexp                      matches
---------------------------------------
101[3-9]                    1013 - 1019
10[2-9][0-9]                1020 - 1099
11[0-9][0-9]                1100 - 1199
[23][0-9][0-9][0-9]         2000 - 3999
40[0-3][0-9]                4000 - 4039
404[0-4]                    4040 - 4044

what is the simplest algorithm?

What is the easiest way to reverse it (i.e. given the regexps, looking for the ranges)?

Would be nice to see solutions in java, clojure, perl...

Thanks!


回答1:


There is an online tool for generating regex given a range, and provides the explanation. You can find the source code there also. For example:

^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$
First, break into equal length ranges:
  1013 - 4044

Second, break into ranges that yield simple regexes:
  1013 - 1019
  1020 - 1099
  1100 - 1999
  2000 - 3999
  4000 - 4039
  4040 - 4044

Turn each range into a regex:
  101[3-9]
  10[2-9][0-9]
  1[1-9][0-9]{2}
  [23][0-9]{3}
  40[0-3][0-9]
  404[0-4]

Collapse adjacent powers of 10:
  101[3-9]
  10[2-9][0-9]
  1[1-9][0-9]{2}
  [23][0-9]{3}
  40[0-3][0-9]
  404[0-4]

Combining the regexes above yields:
  (101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])

Next we'll try factoring out common prefixes using a tree:
Parse into tree based on regex prefixes:
  . 1 0 1 [3-9]
      + [2-9] [0-9]
    + [1-9] [0-9]{2}
  + [23] [0-9]{3}
  + 4 0 [0-3] [0-9]
      + 4 [0-4]

Turning the parse tree into a regex yields:
  (1(0(1[3-9]|[2-9][0-9])|[1-9][0-9]{2})|[23][0-9]{3}|40([0-3][0-9]|4[0-4]))

We choose the shorter one as our result.

^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$

To reverse it, you can look at the character classes, and get the minimum and maximum for each alternative.

   ^(101[3-9]|10[2-9][0-9]|1[1-9][0-9]{2}|[23][0-9]{3}|40[0-3][0-9]|404[0-4])$
=>   1013     1020         1100            2000        4000         4040     lowers
        1019         1999        1199         3999            4039     4044  uppers

=> 1013 - 4044


来源:https://stackoverflow.com/questions/4101053/generate-a-regexp-from-a-numeric-range

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!