a regular expression generator for number ranges

长情又很酷 2021-02-04 05:56

I checked the Stack Exchange description, and algorithm questions are one of the allowed topics. So here goes.

Given an input of a range, where begin and ending number

9 answers
  •  半阙折子戏
    2021-02-04 06:08

    You cannot cover your requirement with character groups only. Imagine the range 129-131. The pattern 1[2-3][1-9] would also match 139, which is out of range, and it would fail to match 130.

    So in this example the allowed last digit depends on the tens digit, and one group per position is not enough: 1[2-3](1|9) still matches 121 and 139. You can find the same effect for the tens and hundreds, which leads to the problem that a pattern that basically represents each valid number as a fixed sequence of digits is the only solution that always works (if you don't want an algorithm that needs to track overflows in order to decide whether it should use [2-8] or (8|9|0|1|2)).
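
The overmatching described above can be checked by brute force. The following is a minimal sketch, not part of the original answer: the helper name `mismatches` and its `probe_max` parameter are made up for illustration, and `1(29|3[01])` is just one hand-written pattern that happens to be correct for 129-131.

```python
import re

def mismatches(pattern, lo, hi, probe_max=999):
    """Return (wrongly_matched, wrongly_missed) for pattern against lo..hi.

    Every integer from 0 to probe_max is tested as a whole string.
    """
    rx = re.compile(pattern)
    wrongly_matched = [n for n in range(probe_max + 1)
                       if rx.fullmatch(str(n)) and not lo <= n <= hi]
    wrongly_missed = [n for n in range(lo, hi + 1)
                      if not rx.fullmatch(str(n))]
    return wrongly_matched, wrongly_missed

# One character class per digit position: 139 is wrongly matched, 130 is missed.
wrong, missed = mismatches(r"1[2-3][1-9]", 129, 131)
print(139 in wrong, missed)

# A pattern whose last digit depends on the tens digit has no mismatches.
print(mismatches(r"1(29|3[01])", 129, 131))
```

The same helper can be pointed at any generated pattern before it is used.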

    If you autogenerate the pattern anyway, keep it simple:

    128-132
    

    can be written as (I left out the non-capturing group prefix ?: for better readability)

    (128|129|130|131|132)
    

    The algorithm should be obvious: a for loop, an array, string concatenation and join.
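
A minimal sketch of that loop-and-join approach in Python (the function name `enumerate_pattern` is hypothetical, and the non-capturing prefix is left out as in the example above):

```python
def enumerate_pattern(lo, hi):
    """Build a plain alternation that lists every number from lo to hi."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    alternatives = []
    for n in range(lo, hi + 1):
        alternatives.append(str(n))
    return "(" + "|".join(alternatives) + ")"

print(enumerate_pattern(128, 132))  # (128|129|130|131|132)
```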

    That would already work as expected, but you can also perform some "optimization" on this if you like it more compact:

    (128|129|130|131|132) <=>
    1(28|29|30|31|32) <=>
    1(2(8|9)|3(0|1|2))
    

    More optimization:

    1(2([8-9])|3([0-2]))
    

    Algorithms for the last steps are out there; look for factorization. An easy way would be to push all the numbers into a tree, keyed by character position:

    1
      2
        8
        9
      3
        0
        1
        2
    

    Finally, iterate over the tree and form the pattern 1(2(8|9)|3(0|1|2)). As a last step, replace any alternation of consecutive single digits, such as (a|b|c), with the character class [a-c].
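
The tree approach could be sketched as follows. This is an illustration under stated assumptions, not a complete implementation: all function names are hypothetical, the inputs are non-negative integers, and both ends of the range must have the same number of digits so that every path in the tree has the same depth. Redundant parentheses around character classes are omitted.

```python
def build_trie(numbers):
    """Insert each number digit by digit into a nested dict."""
    root = {}
    for n in numbers:
        node = root
        for digit in str(n):
            node = node.setdefault(digit, {})
    return root

def digit_class(digits):
    """Collapse a sorted run of consecutive digits into [a-c]."""
    if len(digits) == 1:
        return digits[0]
    if all(int(b) - int(a) == 1 for a, b in zip(digits, digits[1:])):
        return "[" + digits[0] + "-" + digits[-1] + "]"
    return "(" + "|".join(digits) + ")"

def to_pattern(node):
    """Turn a trie node into the regex fragment for its subtree."""
    if not node:
        return ""
    digits = sorted(node)
    # Last digit position: every child is a leaf, so use a character class.
    if all(not node[d] for d in digits):
        return digit_class(digits)
    branches = [d + to_pattern(node[d]) for d in digits]
    if len(branches) == 1:
        return branches[0]
    return "(" + "|".join(branches) + ")"

def range_pattern(lo, hi):
    """Pattern for lo..hi; assumes lo and hi have equally many digits."""
    if len(str(lo)) != len(str(hi)):
        raise ValueError("this sketch needs lo and hi of equal digit count")
    return to_pattern(build_trie(range(lo, hi + 1)))

print(range_pattern(128, 132))  # 1(2[8-9]|3[0-2])
```

Ranges that span different digit counts would need an end-of-number marker in the tree, which this sketch leaves out.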

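    Same goes for 11-29:

    11-29 <=>
    (11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29) <=>
    (1(1|2|3|4|5|6|7|8|9)|2(0|1|2|3|4|5|6|7|8|9)) <=>
    (1([1-9])|2([0-9]))
    

    In addition, when the branches share the same tail you can factor it out. That is not the case here, because 20 is in range and so the tail after 2 is [0-9] rather than [1-9]. For 10-29 it works:

    (1([0-9])|2([0-9])) <=>
    (1|2)[0-9] <=>
    [1-2][0-9]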
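
Factoring out a shared tail is only valid when the branches really end in the same sub-pattern. A minimal sketch of that step, assuming each branch is given as a leading digit mapped to a self-contained tail (a character class or a parenthesized group); the names `merge_common_tails` and `head_class` are hypothetical:

```python
def head_class(heads):
    """Combine sorted leading digits into a digit, a class or an alternation."""
    if len(heads) == 1:
        return heads[0]
    if all(int(b) - int(a) == 1 for a, b in zip(heads, heads[1:])):
        return "[" + heads[0] + "-" + heads[-1] + "]"
    return "(" + "|".join(heads) + ")"

def merge_common_tails(branches):
    """Merge branches whose tails are identical strings."""
    by_tail = {}
    for head in sorted(branches):
        by_tail.setdefault(branches[head], []).append(head)
    parts = [head_class(heads) + tail for tail, heads in by_tail.items()]
    if len(parts) == 1:
        return parts[0]
    return "(" + "|".join(parts) + ")"

# 10-29: both branches end in [0-9], so they merge.
print(merge_common_tails({"1": "[0-9]", "2": "[0-9]"}))  # [1-2][0-9]

# 11-29: the tails differ because 20 is in range, so nothing merges.
print(merge_common_tails({"1": "[1-9]", "2": "[0-9]"}))  # (1[1-9]|2[0-9])
```

Merging the 11-29 branches into [1-2][1-9] regardless would drop 20 from the matched set.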
    
