Regex: Workaround for PHP's look-behind fixed-length assert limitation

Submitted by 余生颓废 on 2020-01-05 04:36:05

Question


I'm trying to learn more about look-around assertions, and I found this thread, whose solution is supposed to work in some regex engines but not in PHP's, because PHP requires look-behind assertions to be fixed-length.

What I want is to make the same scenario work in PHP, or at least to know whether it's possible at all.

I simplified the rules, so they are not the same as in the thread mentioned above, but they follow the same principle.

I need to match a string made of three parts:

  • Starts with any number of alphanumeric characters
  • Does not contain "abc-" followed by a run of exactly 3 to 5 digits and/or hyphens (a longer run, as in xxabc-123-45.htm below, is allowed)
  • Ends with ".htm" or ".html"

So, these would match:

  • xxxyz-123.html
  • xx123-abc.htm
  • xxabc123.html
  • xxabc-123-45.htm

But these would NOT match:

  • xxabc-4324.htm
  • xxabc-1-2.html
  • xxac-12-34.txt
  • xxabc-12345.htm

I've been trying variations of the regex pattern below, but this one fails because of the fixed-length limitation:

.*(?<!abc-[\d-]{3,5})\.htm[^l]?$

I also tried different test strings and dropped the 3-5 range, focusing on exactly three digits and/or hyphens with the regex below. It still doesn't work, which is why I decided to ask for help:

.*(?<!abc-[\d-]{3})\.htm[^l]?$

Could any of you regex gurus help me out here?

Edit

This is my testing PHP code:

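<?php
$regex = '/^(?!.*abc-[\d-]{3,5})[a-zA-Z0-9-]+\.html?$/';

// Test strings from the examples above (assumed contents of $matching2)
$matching2 = array( 'xxxyz-123.html', 'xx123-abc.htm', 'xxabc123.html', 'xxabc-123-45.htm',
                    'xxabc-4324.htm', 'xxabc-1-2.html', 'xxac-12-34.txt', 'xxabc-12345.htm' );

foreach ( $matching2 as $v ) {
    $matches = preg_match( $regex, $v );

    echo '"', $v, '"', ( $matches === 1 ) ? ' matches' : ' doesn\'t match', '<br />';
}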

Answer 1:


Why do you need to look at that in reverse? Why not just use a lookahead?

^(?!.*abc-[\d-]{3,5}[^\d-])[a-zA-Z0-9-]+\.html?$

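This simply starts at the beginning of the string, and the lookahead tries to find the disallowed substring anywhere (.*) in the string. If it finds it, the lookahead makes the whole pattern fail. The trailing [^\d-] requires the run of digits and hyphens to end after 3 to 5 characters, so a longer run such as abc-123-45 is still allowed. The pattern also enforces the requirement that the string consist only of alphanumerics and hyphens before the extension.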
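A minimal PHP sketch of the pattern above, run against the sample strings from the question. The expected outcomes are the ones the question lists; the variable names are only for illustration.

```php
<?php
// Answer 1's pattern: a negative lookahead scans the whole string up front,
// so no look-behind (fixed-length or otherwise) is needed.
$regex = '/^(?!.*abc-[\d-]{3,5}[^\d-])[a-zA-Z0-9-]+\.html?$/';

// Sample inputs from the question, mapped to whether they should match.
$expected = [
    'xxxyz-123.html'   => true,
    'xx123-abc.htm'    => true,
    'xxabc123.html'    => true,
    'xxabc-123-45.htm' => true,
    'xxabc-4324.htm'   => false,
    'xxabc-1-2.html'   => false,
    'xxac-12-34.txt'   => false,
    'xxabc-12345.htm'  => false,
];

$results = [];
foreach ($expected as $input => $shouldMatch) {
    // preg_match returns 1 on a match, 0 on no match, false on error.
    $results[$input] = preg_match($regex, $input) === 1;
    echo $input, $results[$input] ? ' matches' : " doesn't match", PHP_EOL;
}
```

With every sample, the actual result lines up with the expected column, including xxabc-123-45.htm, whose six-character run is not followed by a non-digit, non-hyphen character within 3 to 5 positions.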

By the way, this is the same solution used for the question you linked. Perl cannot cope with variable-length lookbehinds either (at least not at the time of writing); .NET is one of the few engines that can.

Another note: if you ever encounter an example where you actually do need a variable-length lookbehind (but not a variable-length lookahead)... reverse the string (and the pattern, too, of course). ;)
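To illustrate the reversal trick, here is a rough PHP sketch applied to the asker's original look-behind idea, written as `(?<!abc-[\d-]{3,5})\.html?$` (using `\.html?$` for the extension, as both answers do). Reversing the string turns the trailing `.htm`/`.html` into a leading `mth.`/`lmth.`, and the look-behind becomes a look-ahead with the reversed body `[\d-]{3,5}-cba`, which PCRE accepts at variable length. The helper name `matchesViaReversal` is hypothetical, and `strrev` reverses bytes, so this assumes ASCII input.

```php
<?php
// Hypothetical helper: emulates /(?<!abc-[\d-]{3,5})\.html?$/ by reversing the
// subject and using a variable-length lookahead instead of a look-behind.
function matchesViaReversal(string $s): bool
{
    // Reversed pattern:
    //   \.html?$            ->  ^l?mth\.
    //   (?<!abc-[\d-]{3,5}) ->  (?![\d-]{3,5}-cba)
    $reversedPattern = '/^l?mth\.(?![\d-]{3,5}-cba)/';

    // strrev works byte by byte, which is fine for ASCII file names.
    return preg_match($reversedPattern, strrev($s)) === 1;
}

$tests = [
    'xxxyz-123.html', 'xx123-abc.htm', 'xxabc123.html', 'xxabc-123-45.htm',
    'xxabc-4324.htm', 'xxabc-1-2.html', 'xxac-12-34.txt', 'xxabc-12345.htm',
];

foreach ($tests as $s) {
    echo $s, matchesViaReversal($s) ? ' matches' : " doesn't match", PHP_EOL;
}
```

Like the original look-behind, this only checks the run sitting immediately before the extension, not anywhere in the string; the lookahead in the answer above is the more general check.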




Answer 2:


You are probably looking for this regex pattern:

^(?!.*abc-[\d-]{3,5}[^\d-])[A-Za-z0-9].*[.]html?$
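A small PHP sketch comparing this pattern with the one from Answer 1. On the question's samples they agree; the difference is that `.*` here accepts any character after the first alphanumeric, so a hypothetical input such as `xx_y z.htm` passes this pattern but not Answer 1's. The helper name `matches` is only for illustration.

```php
<?php
$answer1 = '/^(?!.*abc-[\d-]{3,5}[^\d-])[a-zA-Z0-9-]+\.html?$/';
$answer2 = '/^(?!.*abc-[\d-]{3,5}[^\d-])[A-Za-z0-9].*[.]html?$/';

// Hypothetical helper: true when $pattern matches $s.
function matches(string $pattern, string $s): bool
{
    return preg_match($pattern, $s) === 1;
}

// The question's samples: both patterns give the same answer on each.
$samples = [
    'xxxyz-123.html', 'xx123-abc.htm', 'xxabc123.html', 'xxabc-123-45.htm',
    'xxabc-4324.htm', 'xxabc-1-2.html', 'xxac-12-34.txt', 'xxabc-12345.htm',
];

foreach ($samples as $s) {
    printf("%-18s answer1=%d answer2=%d\n", $s, matches($answer1, $s), matches($answer2, $s));
}

// Hypothetical extra input with an underscore and a space:
// only Answer 2 accepts it, because its .* allows any character.
$extra = 'xx_y z.htm';
printf("%-18s answer1=%d answer2=%d\n", $extra, matches($answer1, $extra), matches($answer2, $extra));
```

Which one to prefer depends on whether characters other than alphanumerics and hyphens should be allowed before the extension.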


Source: https://stackoverflow.com/questions/13463723/regex-workaround-for-phps-look-behind-fixed-length-assert-limitation
