Question
I have a regex that cuts a string down to a keyword (the keyword follows a pattern, like {query:ABCD:1234}), the 5 words before it, and the 5 words after it. I then show three dots before and after the excerpt. For example, given:
Lorem ipsum dolor sit amet, consectetur {query:ABCD:1234} adipiscing elit. Mauris consequat, quam id feugiat varius.
And I expect:
... ipsum dolor sit amet, consectetur {query:ABCD:1234} adipiscing elit. Mauris consequat, quam ...
Here is the regex:
preg_match("/((?:\w+\W+){5})" . preg_quote($keyword, "/") . "((?:\W+\w+){5})/", $text, $matches);
The problem is that the regex drops the punctuation when the final word is followed by a period, question mark, or exclamation mark, for example:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris consequat, quam id feugiat varius {query:ABCD:1234}.
I expect:
... quam id feugiat varius {query:ABCD:1234}.
But it returns:
... quam id feugiat varius {query:ABCD:1234}
(No dot at the end).
Same thing when the last word is not the keyword:
Original: {query:ABCD:1234} Lorem ipsum dolor sit amet!
Returns: {query:ABCD:1234} Lorem ipsum dolor sit amet ...
Expected: {query:ABCD:1234} Lorem ipsum dolor sit amet!
How can this be fixed?
Update:
Here is my code:
function cutMessage($text, $search)
{
    $pieces = explode(' ', $text);
    $firstWord = $pieces[0];
    $lastWord = array_pop($pieces);
    preg_match("/((?:\w+\W+){0,5})" . preg_quote($search, "/") . "((?:\W+\w+){0,5})/", $text, $matches);
    $returnText = '';
    $pieces = explode(' ', $matches[1]);
    if (!empty($matches[1]) && $pieces[0] != $firstWord) {
        $returnText .= '... ' . $matches[1];
    } elseif (!empty($matches[1])) {
        $returnText .= $matches[1];
    }
    $returnText .= $search;
    $pieces = explode(' ', $matches[2]);
    if (!empty($matches[2]) && array_pop($pieces) != $lastWord) {
        $returnText .= $matches[2] . ' ...';
    } elseif (!empty($matches[2])) {
        $returnText .= $matches[2];
    }
    return $returnText;
}
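Answer 1:
If you echo your current pattern with the example keyword, you can see why it fails: each repetition of the final part, (?:\W+\w+){0,5}, must end in \w+, which needs one or more word characters. A trailing dot or exclamation mark that is not followed by another word therefore cannot be matched.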
((?:\w+\W+){0,5})\{query\:ABCD\:1234\}((?:\W+\w+){0,5})
                                             ^^^
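To see this concretely, here is a minimal sketch that runs the question's pattern against a shortened version of its second example. The variable names and the shortened input are chosen here for illustration only.

```php
<?php
// Illustrative values based on the question's second example.
$keyword = '{query:ABCD:1234}';
$text    = 'quam id feugiat varius {query:ABCD:1234}.';

// Same pattern as in the question, with the {0,5} quantifiers.
$pattern = '/((?:\w+\W+){0,5})' . preg_quote($keyword, '/') . '((?:\W+\w+){0,5})/';
preg_match($pattern, $text, $matches);

// The match stops right after the keyword: the trailing "." is a non-word
// character with no word after it, so (?:\W+\w+) cannot consume it.
echo $matches[0], PHP_EOL; // quam id feugiat varius {query:ABCD:1234}
var_dump($matches[2]);     // string(0) ""
```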
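One option is to optionally match a single punctuation character that you want to allow after the last word, using a third capturing group ([!.]?). Add ? to the character class if a trailing question mark should be kept as well.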
((?:\w+\W+){0,5})\{query\:ABCD\:1234\}((?:\W+\w+){0,5})([!.]?)
                                                       ^^^^^^^
As you are checking if the values of the captured groups are not empty, you might add another check for the third capturing group.
If that group is not empty, then concatenate group 2 and group 3.
if (!empty($matches[3])) {
    $returnText .= $matches[2] . $matches[3];
} elseif (!empty($matches[2]) && array_pop($pieces) != $lastWord) {
    $returnText .= $matches[2] . ' ...';
} elseif (!empty($matches[2])) {
    $returnText .= $matches[2];
}
return $returnText;
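Putting the pieces together, the following is a sketch of the question's cutMessage() with the third group added, not a definitive implementation. Two details are assumptions made here rather than part of the answer: the character class is widened to [!.?] so that question marks are kept, and the function returns the text unchanged when the keyword is not found.

```php
<?php
function cutMessage(string $text, string $search): string
{
    $pieces    = explode(' ', $text);
    $firstWord = $pieces[0];
    $lastWord  = array_pop($pieces);

    // Third group ([!.?]?) optionally captures one trailing punctuation mark.
    $pattern = '/((?:\w+\W+){0,5})' . preg_quote($search, '/')
             . '((?:\W+\w+){0,5})([!.?]?)/';
    if (!preg_match($pattern, $text, $matches)) {
        return $text; // assumption: no keyword, so nothing to cut
    }

    $returnText = '';
    $pieces = explode(' ', $matches[1]);
    if (!empty($matches[1]) && $pieces[0] != $firstWord) {
        $returnText .= '... ' . $matches[1];
    } elseif (!empty($matches[1])) {
        $returnText .= $matches[1];
    }

    $returnText .= $search;

    $pieces = explode(' ', $matches[2]);
    if (!empty($matches[3])) {
        // Punctuation directly follows the excerpt: keep it, no dots.
        $returnText .= $matches[2] . $matches[3];
    } elseif (!empty($matches[2]) && array_pop($pieces) != $lastWord) {
        $returnText .= $matches[2] . ' ...';
    } elseif (!empty($matches[2])) {
        $returnText .= $matches[2];
    }
    return $returnText;
}

echo cutMessage('{query:ABCD:1234} Lorem ipsum dolor sit amet!', '{query:ABCD:1234}'), PHP_EOL;
// {query:ABCD:1234} Lorem ipsum dolor sit amet!
```

One limitation of this approach: whenever the third group is non-empty, no ' ...' is appended, even if more text follows the punctuation mark.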
Source: https://stackoverflow.com/questions/59002335/preg-match-considers-special-characters-as-a-separate-word