How to parse colon-separated key-value text with possible multiline strings

Submitted by 怎甘沉沦 on 2019-12-30 07:22:41

Question


I need to parse the following text:

First: 1
Second: 2
Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
Fourth: value

A value is either a single-line or a multiline string, and it may itself contain a "key: blablabla" substring. Such a substring should be ignored (not parsed as a separate key-value pair).

Please help me with a regex or another algorithm.

The ideal result would be:

$regex = "/SOME REGEX/";
$matches = [];
preg_match_all($regex, $html, $matches);
// $matches has all parsed key-value pairs, including multiline values

Thank you.

I tried simple regexes, but the result is incorrect because I don't know how to handle multiline values:

$regex = "/(.+?): (.+?)/";
$regex = "/(.+?):(.+?)\n/";
...

Answer 1:


You can do it with this pattern:

$pattern = '~(?<key>[^:\s]+): (?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$)~';

preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);

print_r($matches);
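
With PREG_SET_ORDER, each element of $matches is one complete match carrying the named groups key and value. As a minimal sketch, assuming $txt holds the sample text from the question with "\n" line endings, the matches can be collapsed into a single key => value array ($pairs is a name introduced here only for illustration):

```php
<?php
// Sample input from the question (assumed to be stored in $txt).
$txt = "First: 1\nSecond: 2\nMultiline: blablablabla\nbla2bla2bla2\n"
     . "bla3b and key: value in the middle if strting\nFourth: value";

$pattern = '~(?<key>[^:\s]+): (?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$)~';

preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);

// $pairs is a hypothetical name: a key => value map built from the named groups.
$pairs = [];
foreach ($matches as $m) {
    $pairs[$m['key']] = $m['value'];
}

print_r($pairs);
```

Because the lookahead requires a line break followed by a run of non-whitespace characters and a colon, the "key: value" fragment in the middle of a line stays inside the Multiline value. Note that the same rule means any continuation line beginning with something like "word:" would be treated as a new key.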



Answer 2:


You can sort of do it, as long as you treat a single word followed by a colon at the start of a line as the start of a new key:

$data = 'First: 1
Second: 2
Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
Fourth: value';

preg_match_all('/^([a-z]+): (.*?)(?=(^[a-z]+:|\z))/ims', $data, $matches);

var_dump($matches);

This gives the following result:

array(4) {
  [0]=>
  array(4) {
    [0]=>
    string(10) "First: 1
"
    [1]=>
    string(11) "Second: 2
"
    [2]=>
    string(86) "Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
"
    [3]=>
    string(13) "Fourth: value"
  }
  [1]=>
  array(4) {
    [0]=>
    string(5) "First"
    [1]=>
    string(6) "Second"
    [2]=>
    string(9) "Multiline"
    [3]=>
    string(6) "Fourth"
  }
  [2]=>
  array(4) {
    [0]=>
    string(3) "1
"
    [1]=>
    string(3) "2
"
    [2]=>
    string(75) "blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
"
    [3]=>
    string(5) "value"
  }
  [3]=>
  array(4) {
    [0]=>
    string(7) "Second:"
    [1]=>
    string(10) "Multiline:"
    [2]=>
    string(7) "Fourth:"
    [3]=>
    string(0) ""
  }
}
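
In the result above, the values in group 2 still carry their trailing line breaks, and group 3 only holds what the lookahead captured. As a sketch under the same assumptions (the sample $data, here with "\n" line endings), the keys and values can be paired into one array; $result is a name introduced here for illustration, and trim() also removes any other leading or trailing whitespace from a value:

```php
<?php
$data = "First: 1\nSecond: 2\nMultiline: blablablabla\nbla2bla2bla2\n"
      . "bla3b and key: value in the middle if strting\nFourth: value";

// Same pattern as in the answer, but the lookahead no longer captures,
// so only the key (group 1) and the value (group 2) are returned.
preg_match_all('/^([a-z]+): (.*?)(?=^[a-z]+:|\z)/ims', $data, $matches);

// Pair each key with its value, stripping the trailing line break.
$result = array_combine($matches[1], array_map('trim', $matches[2]));

var_dump($result);
```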


Source: https://stackoverflow.com/questions/23012301/how-to-parse-column-separated-key-value-text-with-possible-multiline-strings
