How to write nested regex to find words below some string?

Question

I am converting a PDF to text with xpdf (pdftotext) and then extracting some words with a regex and preg_match_all.

In the pdftotext output, each label is separated from its value by a colon.

Below is my pdftotext output:

                                 In respect of Shareholders

Name:                                    xyx

Residential address:                     dublin

No of Shares:                            2

Name:                                    abc

Residential address:                     canada

No of Shares:                            2

So I wrote a regex that returns the words after the colon in the text.

$regex = '/(?<=: ).+/';
preg_match_all($regex, $string, $matches);

But now I want a regex that returns all the data after In respect of Shareholders.

So I wrote $regex = '/(?<=In respect of Shareholders).*?(?=\s)';

But it shows me only:

Name:                                    xyx

I want first to find all the data after In respect of Shareholders, and then use another regex to find the words after each colon.

Answer 1:

You may use

if (preg_match_all('~(?:\G(?!\A)|In respect of Shareholders)\s*[^:\r\n]+:\h*\K.*~', $string, $matches)) {
    print_r($matches[0]);
}

Details

(?:\G(?!\A)|In respect of Shareholders) - either the end of the previous successful match or In respect of Shareholders text
\s* - 0+ whitespaces
[^:\r\n]+ - 1 or more chars other than :, CR and LF
: - a colon
\h* - 0+ horizontal whitespaces
\K - match reset operator that discards all text matched so far
.* - the rest of the line (0 or more chars other than line break chars).
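
To see the pattern at work, here is a minimal sketch that runs it on input shaped like the pdftotext output in the question. The "Company name" line is a made-up addition (it is not in the original output), included only to show that label lines before the heading are skipped.

```php
<?php
// Sample input modelled on the question's pdftotext output.
// The first line is hypothetical and only demonstrates that text
// before the heading is ignored.
$string = implode("\n", [
    'Company name:                            example ltd',
    '',
    '                                 In respect of Shareholders',
    '',
    'Name:                                    xyx',
    '',
    'Residential address:                     dublin',
    '',
    'No of Shares:                            2',
    '',
    'Name:                                    abc',
    '',
    'Residential address:                     canada',
    '',
    'No of Shares:                            2',
]);

// \G(?!\A) continues from the end of the previous match, so every
// match after the first one has to follow directly on the last value.
$regex = '~(?:\G(?!\A)|In respect of Shareholders)\s*[^:\r\n]+:\h*\K.*~';

$values = [];
if (preg_match_all($regex, $string, $matches)) {
    $values = $matches[0];
}

print_r($values); // xyx, dublin, 2, abc, canada, 2
```

Note that the \G chain only continues while each following non-blank line contains a colon, so the first line without one ends the matching.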

Answer 2:

In your regex (?<=: ).+ you match any character 1+ times after a colon and a space, so the match still includes the padding before the value. To capture in a group only what follows the spaces or tabs, you could use (?<=: )[\t ]*(.+)
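
As a rough sketch of the difference between the whole match and the group, assuming the intended pattern is (?<=: )[\t ]*(.+) with a quantifier on the whitespace class, and using two lines from the question's output:

```php
<?php
$string = implode("\n", [
    'Name:                                    xyx',
    '',
    'Residential address:                     dublin',
]);

// Assumed pattern: [\t ]* consumes the remaining padding after ": ",
// so group 1 starts at the value itself.
preg_match_all('/(?<=: )[\t ]*(.+)/', $string, $matches);

$whole = $matches[0]; // whole matches still carry the leading padding
$group = $matches[1]; // group 1 holds only the values

print_r($group); // xyx, dublin
```

So with this pattern the values are read from $matches[1] rather than $matches[0].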

Another way to match the texts using a capturing group could be:

^.*?:[ \t]+(\w+)

Explanation

^ Assert the start of the string (or of each line when the m flag is set)
.*?: Match any character except a newline, as few times as possible, followed by a :
[ \t]+ Match 1+ times a space or a tab
(\w+) Capture in a group 1+ word characters
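
A short sketch of that pattern applied to several lines at once. The m flag is an assumption added here (it is not part of the pattern above) so that ^ anchors at the start of every line rather than only at the start of the string:

```php
<?php
$string = implode("\n", [
    'Name:                                    xyx',
    '',
    'Residential address:                     dublin',
    '',
    'No of Shares:                            2',
]);

// m flag (assumed): ^ matches at the beginning of each line.
preg_match_all('/^.*?:[ \t]+(\w+)/m', $string, $matches);

$captured = $matches[1]; // group 1: the first word after each colon

print_r($captured); // xyx, dublin, 2
```

Because \w+ stops at the first non-word character, a value made of several words would be cut off after its first word.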

Or use \K to forget what was matched if that is supported:

^.*?:\h*\K\w+
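
A minimal sketch of the \K variant, again with the m flag added as an assumption so that every line is tried. Since \K drops the label from the reported match, the values come back in $matches[0] and no capturing group is needed:

```php
<?php
$string = implode("\n", [
    'Name:                                    abc',
    '',
    'Residential address:                     canada',
    '',
    'No of Shares:                            2',
]);

// \K resets the start of the reported match to just before the value.
preg_match_all('/^.*?:\h*\K\w+/m', $string, $matches);

$values = $matches[0];

print_r($values); // abc, canada, 2
```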

Source: https://stackoverflow.com/questions/53572572/how-to-write-nested-regex-to-find-words-below-some-string

Tags

regex

preg-match-all