Question:
I am converting a PDF to text with xpdf (pdftotext) and then finding some words with the help of a regex and preg_match_all.
In the pdftotext output, each label is separated from its value by a colon.
Below is my pdftotext output:
In respect of Shareholders
Name: xyx
Residential address: dublin
No of Shares: 2
Name: abc
Residential address: canada
No of Shares: 2
So I wrote a regex that shows me the words after each colon in the text:
$regex = '/(?<=: ).+/';
preg_match_all($regex, $string, $matches);
But now I want a regex that displays all the data after "In respect of Shareholders".
So I wrote $regex = '/(?<=In respect of Shareholders).*?(?=\s)/';
But it only shows me:
Name: xyx
I want to first find all the data after "In respect of Shareholders" and then use another regex to find the words after each colon.
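The two-step approach described above can be sketched in PHP as follows. This is a minimal sketch under stated assumptions, not code from the question or the answers: the heading is assumed to appear once, lines are joined with \n, and the "Company: Foo Ltd" line is a hypothetical preamble added only to show what the first step filters out. Step 1 uses preg_match with the s modifier so that . also matches newlines; step 2 reuses the question's lookbehind regex.

```php
<?php
// pdftotext-style sample from the question. "Company: Foo Ltd" is a
// hypothetical line placed before the heading for illustration only.
$string = implode("\n", [
    'Company: Foo Ltd',
    'In respect of Shareholders',
    'Name: xyx',
    'Residential address: dublin',
    'No of Shares: 2',
    'Name: abc',
    'Residential address: canada',
    'No of Shares: 2',
]);

// Step 1: keep only the text after the heading. The s modifier lets .
// match newlines, so group 1 captures everything to the end of the string.
$section = '';
if (preg_match('/In respect of Shareholders\s*(.*)/s', $string, $m)) {
    $section = $m[1];
}

// Step 2: apply the question's lookbehind regex to that section only.
preg_match_all('/(?<=: ).+/', $section, $matches);
$values = $matches[0];

print_r($values);
```

Running the lookbehind regex on the full $string instead would also return "Foo Ltd", which is why the text is cut at the heading first.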
Answer 1:
You may use
if (preg_match_all('~(?:\G(?!\A)|In respect of Shareholders)\s*[^:\r\n]+:\h*\K.*~', $string, $matches)) {
print_r($matches[0]);
}
Details
- (?:\G(?!\A)|In respect of Shareholders) - either the end of the previous successful match or the In respect of Shareholders text
- \s* - 0+ whitespaces
- [^:\r\n]+ - 1 or more chars other than :, CR and LF
- : - a colon
- \h* - 0+ horizontal whitespaces
- \K - match reset operator that discards all text matched so far
- .* - the rest of the line (0 or more chars other than line break chars)
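A self-contained PHP run of Answer 1's pattern on the question's sample output might look like this; the sample is assumed to use \n line endings.

```php
<?php
// Sample output from the question, joined with \n line endings.
$string = implode("\n", [
    'In respect of Shareholders',
    'Name: xyx',
    'Residential address: dublin',
    'No of Shares: 2',
    'Name: abc',
    'Residential address: canada',
    'No of Shares: 2',
]);

// First match: starts at the heading. Later matches: \G(?!\A) forces each one
// to start exactly where the previous match ended. \K drops "key: " from the
// reported match, so $matches[0] holds only the values.
$regex = '~(?:\G(?!\A)|In respect of Shareholders)\s*[^:\r\n]+:\h*\K.*~';

$values = [];
if (preg_match_all($regex, $string, $matches)) {
    $values = $matches[0];
}

print_r($values);
```

Because every match after the first has to begin where the previous one ended, the chain stops at the first non-blank line without a colon, so colon lines elsewhere in a longer document are not picked up.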
Answer 2:
In your regex (?<=: ).+ you will match any character 1+ times after a colon and a space. To capture everything that follows the spaces or tabs in a group, you could use (?<=:)[\t ]+(.+)
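A minimal PHP sketch of the capture-group idea, using (?<=:)[\t ]+(.+) so that the lookbehind needs only the colon and the spaces or tabs are consumed outside the group. Group 1 then holds just the value, while the full match still starts with the blank:

```php
<?php
// Sample output from the question, joined with \n line endings.
$string = implode("\n", [
    'In respect of Shareholders',
    'Name: xyx',
    'Residential address: dublin',
    'No of Shares: 2',
    'Name: abc',
    'Residential address: canada',
    'No of Shares: 2',
]);

// The lookbehind checks for the colon only; [\t ]+ consumes the blanks
// outside the group, so group 1 captures just the value.
preg_match_all('/(?<=:)[\t ]+(.+)/', $string, $matches);

$full   = $matches[0]; // e.g. " xyx": whole match, leading blank included
$values = $matches[1]; // e.g. "xyx": capture group 1 only

print_r($values);
```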
Another way to match the values using a capturing group could be:
^.*?:[ \t]+(\w+)
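Explanation
- ^ - Assert start of the line (this needs the m modifier; without it, ^ only matches at the start of the string)
- .*?: - Match any character non-greedy, followed by a :
- [ \t]+ - Match 1+ times a space or a tab
- (\w+) - Capture 1+ word characters in a group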
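Answer 2's pattern relies on ^ matching at every line start. In PHP that takes the m modifier, which the pattern as written does not show, so adding it here is an assumption. The sketch below runs the pattern with and without it on the question's sample:

```php
<?php
// Sample output from the question, joined with \n line endings.
$string = implode("\n", [
    'In respect of Shareholders',
    'Name: xyx',
    'Residential address: dublin',
    'No of Shares: 2',
    'Name: abc',
    'Residential address: canada',
    'No of Shares: 2',
]);

$regex = '/^.*?:[ \t]+(\w+)/';

// Without m, ^ only matches at the start of the string, and the first line
// ("In respect of Shareholders") has no colon, so nothing is found.
$countWithoutM = preg_match_all($regex, $string, $unused);

// With m (assumed modifier), ^ matches at the start of every line.
$countWithM = preg_match_all($regex . 'm', $string, $matches);
$values = $matches[1];

print_r($values);
```

This pattern does not look for the heading at all; it returns the value of every colon line in the text, which only lines up with the question's goal when the shareholder lines are the only colon lines.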
Or use \K to forget what was matched so far, if your regex flavor supports it:
^.*?:\h*\K\w+
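The \K variant in PHP, again assuming the m modifier so that ^ applies to each line. Since \K discards the "key: " part, the values come back in $matches[0] and no capturing group is needed:

```php
<?php
// Sample output from the question, joined with \n line endings.
$string = implode("\n", [
    'In respect of Shareholders',
    'Name: xyx',
    'Residential address: dublin',
    'No of Shares: 2',
    'Name: abc',
    'Residential address: canada',
    'No of Shares: 2',
]);

// \h* eats the blanks after the colon and \K resets the match start there,
// so the whole match is just the value (m modifier assumed for ^).
preg_match_all('/^.*?:\h*\K\w+/m', $string, $matches);

$values = $matches[0];

print_r($values);
```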
Source: https://stackoverflow.com/questions/53572572/how-to-write-nested-regex-to-find-words-below-some-string