Hi, I am trying to create a regex that will do the following:
grab 5 words before the search phrase (or x, if there are only x words there) and 5 words after the search phrase.
You can do the following (it is somewhat computation-heavy, so it wouldn't be efficient for very long strings):
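<?php
$phrase = "Welcome to Stack Overflow! Visit your user page to set your name and email.";
$keyword = "Visit";
$wordCount = 5;
// Split on runs of whitespace; the lowercased copy allows a case-insensitive lookup.
$lcWords = preg_split("/\s+/", strtolower($phrase));
$words = preg_split("/\s+/", $phrase);
$position = array_search(strtolower($keyword), $lcWords);
if ($position === false) {
    // array_search() returns false when the keyword is not one of the words.
    exit("Keyword not found");
}
// Clamp the window so it does not start before the first word or run past the last.
$indexBegin = max($position - $wordCount, 0);
$len = min(count($words), $position - $indexBegin + $wordCount + 1);
echo join(" ", array_slice($words, $indexBegin, $len));
//prints: Welcome to Stack Overflow! Visit your user page to set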
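If you want to reuse this, here is a rough sketch of the same split-and-slice idea wrapped in a function. The name contextWords() and its parameters are my own invention, not anything built in. It also strips common punctuation before comparing, which is an assumption about what you want ("Visit," would count as a hit for "visit"), and it returns null when the keyword is not found.

```php
<?php
// Hypothetical helper: returns up to $wordCount words on either side of the
// first occurrence of $keyword, or null when the keyword is not present.
function contextWords(string $phrase, string $keyword, int $wordCount = 5): ?string
{
    // Split on runs of whitespace so repeated spaces do not create empty "words".
    $words = preg_split('/\s+/', trim($phrase));

    // Compare case-insensitively, ignoring punctuation attached to a word
    // (assumption: "Visit," should still match the keyword "visit").
    $needle = strtolower($keyword);
    $position = false;
    foreach ($words as $i => $word) {
        if (strtolower(trim($word, '.,;:!?"\'()')) === $needle) {
            $position = $i;
            break;
        }
    }
    if ($position === false) {
        return null;
    }

    // Clamp the start of the window; array_slice() clamps the end by itself.
    $indexBegin = max($position - $wordCount, 0);
    $len = $position - $indexBegin + $wordCount + 1;

    return implode(' ', array_slice($words, $indexBegin, $len));
}

echo contextWords("Welcome to Stack Overflow! Visit your user page to set your name and email.", "visit");
// prints: Welcome to Stack Overflow! Visit your user page to set
```

Like the snippet above, this only looks at the first occurrence of the keyword.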
How about this:
(\S+\s+){0,5}\S*\bvisit\b\S*(\s+\S+){0,5}
This will match up to five "words" (fewer if the text is shorter) before and after your search word (in this case, visit).
preg_match_all(
'/(\S+\s+){0,5} # Match up to five "words" before the search term
\S* # Match (if present) punctuation before the search term
\b # Assert position at the start of a word
visit # Match the search term
\b # Assert position at the end of a word
\S* # Match (if present) punctuation after the search term
(\s+\S+){0,5} # Match up to five "words" after the search term
/ix',
$subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
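I'm defining a "word" as a sequence of non-whitespace characters, separated from its neighbors by at least one whitespace character.
The search term itself should be an actual word (starting and ending with an alphanumeric character), since \b only matches between a word character and a non-word character.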
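If the search term comes from a variable rather than being hard-coded, something along these lines might work. It is only a sketch: contextMatches() is a made-up name, I swapped the capturing groups for non-capturing ones since only the full match is used, and preg_quote() is there so that regex metacharacters in the term are taken literally.

```php
<?php
// Hypothetical helper: returns every match of $keyword together with up to
// $wordCount "words" of context on each side.
function contextMatches(string $subject, string $keyword, int $wordCount = 5): array
{
    // Escape regex metacharacters in the keyword; the second argument
    // also escapes the "/" delimiter.
    $quoted = preg_quote($keyword, '/');

    $pattern = '/(?:\S+\s+){0,' . $wordCount . '}\S*\b' . $quoted
             . '\b\S*(?:\s+\S+){0,' . $wordCount . '}/i';

    preg_match_all($pattern, $subject, $result, PREG_PATTERN_ORDER);

    // With PREG_PATTERN_ORDER, $result[0] holds the full-pattern matches.
    return $result[0];
}

$subject = "Welcome to Stack Overflow! Visit your user page to set your name and email.";
print_r(contextMatches($subject, "visit"));
```

For the sample sentence this should give a single match, "Welcome to Stack Overflow! Visit your user page to set". Matches cannot overlap, so two occurrences that are close together may end up sharing one context window.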