How to write nested regex to find words below some string?

耗尽温柔 submitted on 2020-02-02 13:14:06

Question


I am converting a PDF to text with xpdf and then extracting some words from it with a regex and preg_match_all.

I am separating my words with a colon in pdftotext.

Below is my pdftotext output:

                                 In respect of Shareholders

Name:                                    xyx

Residential address:                     dublin

No of Shares:                            2

Name:                                    abc

Residential address:                     canada

No of Shares:                            2

So I wrote a regex that returns the words after the colon in the text:

$regex = '/(?<=: ).+/';
preg_match_all($regex, $string, $matches);

But now I want a regex that returns all the data after In respect of Shareholders.

So I wrote $regex = '/(?<=In respect of Shareholders).*?(?=\s)/';

But it shows me only:

Name:                                    xyx

I want to first find all the data after In respect of Shareholders and then use another regex to find the words after the colon.


Answer 1:


You may use

if (preg_match_all('~(?:\G(?!\A)|In respect of Shareholders)\s*[^:\r\n]+:\h*\K.*~', $string, $matches)) {
    print_r($matches[0]);
}


Details

  • (?:\G(?!\A)|In respect of Shareholders) - either the end of the previous successful match or In respect of Shareholders text
  • \s* - 0+ whitespaces
  • [^:\r\n]+ - 1 or more chars other than :, CR and LF
  • : - a colon
  • \h* - 0+ horizontal whitespaces
  • \K - match reset operator that discards all text matched so far
  • .* - the rest of the line (0 or more chars other than line break chars).
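To see how the \G anchor chains the matches together, here is a minimal sketch that runs the pattern above on text modelled on the pdftotext output from the question. The Company line is a made-up addition (not in the original output) that shows label/value pairs before the header being skipped, and the sample assumes \n line endings.

```php
<?php
// Sample text modelled on the pdftotext output in the question.
// The "Company" line is hypothetical; it is there only to show that
// pairs appearing before the header are not matched.
$string = implode("\n", [
    'Company:                                 Acme',
    '                                 In respect of Shareholders',
    '',
    'Name:                                    xyx',
    '',
    'Residential address:                     dublin',
    '',
    'No of Shares:                            2',
    '',
    'Name:                                    abc',
    '',
    'Residential address:                     canada',
    '',
    'No of Shares:                            2',
]);

// The first match has to start at the header text; every later match
// has to start exactly where the previous one ended (\G).
$regex = '~(?:\G(?!\A)|In respect of Shareholders)\s*[^:\r\n]+:\h*\K.*~';

$values = [];
if (preg_match_all($regex, $string, $matches)) {
    $values = $matches[0]; // only the text after \K is kept
}

print_r($values); // xyx, dublin, 2, abc, canada, 2
```

Because each match must continue from the end of the previous one, the chain stops at the first line that does not have the label: value shape.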



Answer 2:


In your regex (?<=: ).+ you match any character 1+ times after a colon and a space, so the match still includes the rest of the padding before the value. To capture only what follows the spaces or tabs in a group, you could use (?<=: )[\t ]*(.+)

Another way to match the texts using a capturing group could be:

^.*?:[ \t]+(\w+)

Explanation

  • ^ Assert the start of the line (this needs the m flag; without it, ^ only matches at the start of the string)
  • .*?: Match any character non greedy followed by a :
  • [ \t]+ Match 1+ times a space or a tab
  • (\w+) Capture in a group 1+ word characters
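As a rough sketch of how this pattern could be used from PHP, the snippet below applies it with the m modifier, which is an assumption here: the pattern as written does not show its flags, and ^ has to match at the start of every line for this input. The sample text is taken from the question; the "new york" value is a hypothetical addition that shows where \w+ stops.

```php
<?php
// Text modelled on the pdftotext output in the question.
$string = implode("\n", [
    '                                 In respect of Shareholders',
    '',
    'Name:                                    xyx',
    '',
    'Residential address:                     dublin',
    '',
    'No of Shares:                            2',
]);

// m modifier (assumed): ^ matches at the start of each line.
$regex = '/^.*?:[ \t]+(\w+)/m';

$values = [];
if (preg_match_all($regex, $string, $matches)) {
    $values = $matches[1]; // group 1 holds the word after the colon
}
print_r($values); // xyx, dublin, 2

// \w+ stops at the first non-word character, so only the first word
// of a multi-word value is captured (hypothetical value below).
$firstWord = null;
if (preg_match($regex, 'Residential address:      new york', $m)) {
    $firstWord = $m[1]; // "new"
}
```

If values can contain spaces, swapping (\w+) for (.+) would capture the rest of the line instead.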


Or use \K to forget what was matched if that is supported:

^.*?:\h*\K\w+
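Neither pattern in this answer looks for the In respect of Shareholders header, so one possible way to get the two-step behaviour asked for in the question is to cut the text at the header first and then run the \K pattern on the remainder. This is only a sketch: the strpos/substr step and the m modifier are assumptions, and the Company line is a made-up addition to show what gets excluded.

```php
<?php
// Text modelled on the pdftotext output in the question.
// The "Company" line is hypothetical.
$string = implode("\n", [
    'Company:                                 Acme',
    '                                 In respect of Shareholders',
    '',
    'Name:                                    xyx',
    '',
    'Residential address:                     dublin',
    '',
    'No of Shares:                            2',
    '',
    'Name:                                    abc',
    '',
    'Residential address:                     canada',
    '',
    'No of Shares:                            2',
]);

$header = 'In respect of Shareholders';
$values = [];

// Step 1: keep only the text that follows the header.
$pos = strpos($string, $header);
if ($pos !== false) {
    $section = substr($string, $pos + strlen($header));

    // Step 2: on each line, drop everything up to the colon and the
    // padding (\K), then keep the first word that follows.
    if (preg_match_all('/^.*?:\h*\K\w+/m', $section, $matches)) {
        $values = $matches[0];
    }
}

print_r($values); // xyx, dublin, 2, abc, canada, 2
```

Splitting the work this way keeps the second regex simple, at the cost of scanning the text twice.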




Source: https://stackoverflow.com/questions/53572572/how-to-write-nested-regex-to-find-words-below-some-string
