PHP: preg_match_all() - how to correctly find all occurrences of OR-separated substrings with a regex?

Submitted by 白昼怎懂夜的黑 on 2019-12-11 15:44:03

Question


My task is to find all runs of consecutive ascending digits in a string that contains only digits. However, I am not looking for a better regex to do this, but for a regex that correctly matches all of the substrings.

This is how I build my regex:

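$regex = "";

// For each $i, build the ascending run starting at digit $i+1
// and add every prefix of length >= 2 as an alternative
for($i=0;$i<10;$i++) {
    $str = "";
    for($a=0;$a<10;$a++) {
        if($a > $i) {
            $str .= $a;
            if(strlen($str)>1) {
              $regex .= "|".$str."";
            }
        }
    }
}

// Strip the leading "|" and add the pattern delimiters
$myregex = "/".ltrim($regex,"|")."/";
echo $myregex;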

Result:

/12|123|1234|12345|123456|1234567|12345678|123456789|23|234|2345|23456|234567|2345678|23456789|34|345|3456|34567|345678|3456789|45|456|4567|45678|456789|56|567|5678|56789|67|678|6789|78|789|89/

Then I do:

$literal = '234121678941251236544567812122345678';
$matches = [];
preg_match_all($myregex,$literal,$matches);
var_dump($matches);

Result:

array(1) {
  [0]=>
  array(13) {
    [0]=>
    string(2) "23"
    [1]=>
    string(2) "12"
    [2]=>
    string(2) "67"
    [3]=>
    string(2) "89"
    [4]=>
    string(2) "12"
    [5]=>
    string(2) "12"
    [6]=>
    string(2) "45"
    [7]=>
    string(2) "67"
    [8]=>
    string(2) "12"
    [9]=>
    string(2) "12"
    [10]=>
    string(2) "23"
    [11]=>
    string(2) "45"
    [12]=>
    string(2) "67"
  }
}

However, I want to find every substring that occurs (and not skip ahead past a match), like:

23,234,34,12,67,678,6789,78,789,89,12, ...

I have tried different variations with brackets, +, ... but could not figure out the correct regex to find all matches (sorry, still a bit of a regex noob). How do I need to modify the regular expression?


Answer 1:


The order of the alternatives in the regex is important. I'm not sure this fully solves the issue (the approach itself may be fundamentally flawed), but you can try this:

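$regex = [];

// Collect every ascending run of two or more digits (12, 123, ..., 89)
for($i=0;$i<10;$i++) {
    $str = "";
    for($a=0;$a<10;$a++) {
        if($a > $i) {
            $str .= $a;
            if(strlen($str)>1) {
              $regex[] = $str;
            }
        }
    }
}

// Sort longest first, so the longest run is tried first at each position
usort($regex, function($a,$b){
    return strlen($b) <=> strlen($a);
});

$myregex = '/'.implode('|', $regex).'/';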

This puts the number sequences into an array and sorts it by length, longest sequences first. The end result (after matching) is:

array(1) {
  [0]=>
  array(9) {
    [0]=>
    string(3) "234"
    [1]=>
    string(2) "12"
    [2]=>
    string(4) "6789"
    [3]=>
    string(2) "12"
    [4]=>
    string(3) "123"
    [5]=>
    string(5) "45678"
    [6]=>
    string(2) "12"
    [7]=>
    string(2) "12"
    [8]=>
    string(7) "2345678"
  }
}
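A minimal illustration of why the order matters, assuming standard PCRE behaviour: at each starting position the alternatives are tried from left to right and the first one that matches wins, even when a later alternative would match more characters. The inputs below are small made-up strings, not the question's data.

```php
<?php
// At each starting position PCRE tries the alternatives of a|b from left
// to right and keeps the first one that matches, not the longest one.
preg_match('/12|123/', '123', $m1);
preg_match('/123|12/', '123', $m2);

echo $m1[0], "\n"; // 12
echo $m2[0], "\n"; // 123

// With preg_match_all() the shorter match also consumes the characters,
// so the longer run is never seen: $short[0] is ['23'], $long[0] is ['2345'].
preg_match_all('/23|234|2345/', '2345', $short);
preg_match_all('/2345|234|23/', '2345', $long);
```

This is why sorting the sequences longest first changes the result from thirteen two-digit matches to the nine longer ones shown above.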

Also note that the spaceship operator <=> only works in PHP 7+.
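For older versions (PHP 5.3+, which is needed for closures), a sketch of an equivalent comparator, shown on a small hypothetical array: usort() only looks at the sign of the returned integer, so the difference of the two lengths can stand in for <=>.

```php
<?php
$regex = ['12', '123', '23', '2345']; // hypothetical sample input

// Pre-PHP 7 replacement for strlen($b) <=> strlen($a):
// a positive result puts $b before $a, so longer strings come first.
usort($regex, function ($a, $b) {
    return strlen($b) - strlen($a);
});

echo implode('|', $regex), "\n";
```

The relative order of equal-length entries ('12' and '23') is not guaranteed before PHP 8, but as the answer relies on, it does not affect matching: two different strings of the same length can never both match at the same position.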

Hope it helps.


"and not go to the next chars after a match"

If you mean that you want to find 23, 234 and 2345 all at once in 2345607, for example, I don't think that is possible with a single regex. However, if a long sequence matches, every shorter sequence starting at the same position must match too, logically. So you could just trim the right-hand digit off each match, one at a time, until its length is 2, and collect the remaining matches that way.
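A sketch that combines the trimming idea with a common PCRE technique that is not part of the answer above: wrapping the alternation in a capturing lookahead `(?=(...))`. The lookahead consumes no characters, so preg_match_all() advances by one position after each match and reports the longest run starting at every position; expanding each run into its prefixes then yields the overlapping substrings. The sequence list is built with substr() here instead of the nested loops, which is only an equivalent construction of the same 36 alternatives.

```php
<?php
// Same 36 sequences as the question's loops (12, 123, ..., 89), built
// from substrings of "123456789" and sorted longest first.
$digits = '123456789';
$parts = [];
for ($start = 0; $start < 8; $start++) {
    for ($len = 2; $start + $len <= 9; $len++) {
        $parts[] = substr($digits, $start, $len);
    }
}
usort($parts, function ($a, $b) {
    return strlen($b) - strlen($a);
});

// Zero-width lookahead: nothing is consumed, so matching resumes at the
// next position; group 1 captures the longest run starting at each one.
$pattern = '/(?=(' . implode('|', $parts) . '))/';

$literal = '234121678941251236544567812122345678';
preg_match_all($pattern, $literal, $matches);

// Every prefix of length >= 2 of such a run is also a match, so expand
// each run into its prefixes, shortest first ("trim the right-hand digit").
$all = [];
foreach ($matches[1] as $run) {
    for ($len = 2; $len <= strlen($run); $len++) {
        $all[] = substr($run, 0, $len);
    }
}

echo implode(',', $all), "\n";
```

For the question's string, $all starts with 23,234,34,12,67,678,6789,78,789,89,12, which is the order the question asks for, and contains 47 substrings in total.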



Source: https://stackoverflow.com/questions/52667735/php-preg-match-all-how-to-find-all-occurrences-of-or-seperated-substrings-w
