Get more backreferences from regexp than parenthesis

 ̄綄美尐妖づ 提交于 2020-01-03 02:22:14

问题


Ok this is really difficult to explain in English, so I'll just give an example.

I am going to have strings in the following format:

key-value;key1-value;key2-...

and I need to extract the data to be an array

array('key'=>'value','key1'=>'value1', ... )

I was planning to use regexp to achieve (most of) this functionality, and wrote this regular expression:

/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/

to work with preg_match and this code:

for ($l = count($matches),$i = 1;$i<$l;$i+=2) {
    $parameters[$matches[$i]] = $matches[$i+1];
}

However the regexp obviously returns only 4 backreferences - first and last key-value pairs of the input string. Is there a way around this? I know I can use regex just to test the correctness of the string and use PHP's explode in loops with perfect results, but I'm really curious whether it's possible with regular expressions.

In short, I need to capture an arbitrary number of these key-value; pairs in a string by means of regular expressions.


回答1:


You can use a lookahead to validate the input while you extract the matches:

/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/

(?=(?:\w++-[^;-]++;?)++$) is the validation part. If the input is invalid, matching will fail immediately, but the lookahead still gets evaluated every time the regex is applied. In order to keep it (along with the rest of the regex) in sync with the key-value pairs, I used \G to anchor each match to the spot where the previous match ended.

This way, if the lookahead succeeds the first time, it's guaranteed to succeed every subsequent time. Obviously it's not as efficient as it could be, but that probably won't be a problem--only your testing can tell for sure.

If the lookahead fails, preg_match_all() will return zero (false). If it succeeds, the matches will be returned in an array of arrays: one for the full key-value pairs, one for the keys, one for the values.




回答2:


regex is powerful tool, but sometimes, its not the best approach.

$string = "key-value;key1-value";
$s = explode(";",$string);
foreach($s as $k){
    $e = explode("-",$k);
    $array[$e[0]]=$e[1];
}
print_r($array);



回答3:


Use preg_match_all() instead. Maybe something like:

$matches = $parameters = array();
$input = 'key-value;key1-value1;key2-value2;key123-value123;';

preg_match_all("/(\w+)-([^-;]+)/", $input, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
   $parameters[$match[1]] = $match[2];
}

print_r($parameters);

EDIT:

to first validate if the input string conforms to the pattern, then just use:

if (preg_match("/^((\w+)-([^-;]+);)+$/", $input) > 0) {
    /* do the preg_match_all stuff */
}       

EDIT2: the final semicolon is optional

if (preg_match("/^(\w+-[^-;]+;)*\w+-[^-;]+$/", $input) > 0) {
    /* do the preg_match_all stuff */
}       



回答4:


No. Newer matches overwrite older matches. Perhaps the limit argument of explode() would be helpful when exploding.




回答5:


what about this solution:

$samples = array(
    "good" => "key-value;key1-value;key2-value;key5-value;key-value;",
    "bad1" => "key-value-value;key1-value;key2-value;key5-value;key-value;",
    "bad2" => "key;key1-value;key2-value;key5-value;key-value;",
    "bad3" => "k%ey;key1-value;key2-value;key5-value;key-value;"
);

foreach($samples as $name => $value) {
    if (preg_match("/^(\w+-\w+;)+$/", $value)) {
        printf("'%s' matches\n", $name);
    } else {
        printf("'%s' not matches\n", $name);
    }
}



回答6:


I don't think you can do both validation and extraction of data with one single regexp, as you need anchors (^ and $) for validation and preg_match_all() for the data, but if you use anchors with preg_match_all() it will only return the last set matched.



来源:https://stackoverflow.com/questions/2245334/get-more-backreferences-from-regexp-than-parenthesis

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!