PHP regex for a word collection around a search phrase

忘掉有多难 2021-01-06 07:08

Hi, I am trying to create a regex that will do the following:

Grab the 5 words before the search phrase (or x, if there are only x words there) and the 5 words after the search phrase.

2 answers
  • 2021-01-06 07:43

    You can do the following (it is a bit computation-heavy, so it wouldn't be efficient for very long strings):

    <?php
    $phrase = "Welcome to Stack Overflow! Visit your user page to set your name and email.";
    $keyword = "Visit";
    $wordCount = 5;

    // Split on runs of whitespace; keep a lowercased copy for case-insensitive lookup.
    $words = preg_split("/\s+/", $phrase, -1, PREG_SPLIT_NO_EMPTY);
    $lcWords = array_map('strtolower', $words);

    // array_search() returns false when the keyword is not one of the words.
    $position = array_search(strtolower($keyword), $lcWords, true);
    if ($position !== false) {
        $indexBegin = max($position - $wordCount, 0);
        $len = $position - $indexBegin + $wordCount + 1;
        // array_slice() stops at the end of the array if $len overshoots.
        echo implode(" ", array_slice($words, $indexBegin, $len));
        // prints: Welcome to Stack Overflow! Visit your user page to set
    }
    

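
    The snippet above only finds the keyword when it stands alone as a token, so "Visit," or "visit." would be missed. A minimal sketch of one way around that, assuming ASCII text: the function name `wordsAround` and its parameters are hypothetical, not part of the answer, and it returns null when nothing matches.

    ```php
    <?php
    // Hypothetical helper (not from the answer): returns up to $wordCount words
    // on each side of the first token that equals $keyword once surrounding
    // punctuation is stripped, or null if no token matches. Assumes ASCII text.
    function wordsAround(string $text, string $keyword, int $wordCount = 5): ?string
    {
        $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
        $needle = strtolower($keyword);

        foreach ($words as $i => $word) {
            // Strip leading/trailing non-word characters so "visit," compares as "visit".
            $bare = strtolower(preg_replace('/^\W+|\W+$/', '', $word));
            if ($bare === $needle) {
                $begin = max($i - $wordCount, 0);
                $length = $i - $begin + $wordCount + 1;
                // array_slice() clamps at the end of the array.
                return implode(' ', array_slice($words, $begin, $length));
            }
        }
        return null;
    }

    echo wordsAround("Please visit, then set your name and email.", "Visit", 2);
    // prints: Please visit, then set
    ```

    Like the original, this only handles a single-word keyword; a multi-word phrase would need a different comparison.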

  • 2021-01-06 07:49

    How about this:

    (\S+\s+){0,5}\S*\bvisit\b\S*(\s+\S+){0,5}
    

    This matches up to five "words" (fewer if the text is shorter) before and after your search word (in this case, visit).

    preg_match_all(
        '/(\S+\s+){0,5} # Match up to five "words" before the search term
        \S*             # Match (if present) punctuation before the search term
        \b              # Assert position at the start of a word
        visit           # Match the search term
        \b              # Assert position at the end of a word
        \S*             # Match (if present) punctuation after the search term
        (\s+\S+){0,5}   # Match up to five "words" after the search term
        /ix',
        $subject, $result, PREG_PATTERN_ORDER);
    $result = $result[0]; // full matches only, one string per occurrence
    

    I'm defining a "word" as a sequence of non-whitespace characters, separated by at least one whitespace.

    The search words should be actual words (starting and ending with an alphanumeric character); otherwise the \b anchors will not match where you expect.
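
    To use the pattern with an arbitrary search phrase instead of a hard-coded "visit", the phrase has to be escaped before it is spliced in. A minimal sketch, assuming a non-negative word count: the function name `contextMatches` and its parameters are hypothetical, and the groups are made non-capturing, which does not change what is matched.

    ```php
    <?php
    // Hypothetical helper (not from the answer): builds the pattern for an
    // arbitrary phrase and returns each match with its surrounding words.
    function contextMatches(string $subject, string $phrase, int $wordCount = 5): array
    {
        // Split the phrase into words so any whitespace run in the text can separate them.
        $parts = preg_split('/\s+/', $phrase, -1, PREG_SPLIT_NO_EMPTY);
        if ($parts === []) {
            return [];
        }
        // preg_quote() escapes regex metacharacters and the "/" delimiter.
        $quoted = implode('\s+', array_map(function ($part) {
            return preg_quote($part, '/');
        }, $parts));

        $pattern = '/(?:\S+\s+){0,' . $wordCount . '}\S*\b' . $quoted
                 . '\b\S*(?:\s+\S+){0,' . $wordCount . '}/i';

        preg_match_all($pattern, $subject, $matches);
        return $matches[0];
    }

    $subject = "Welcome to Stack Overflow! Visit your user page to set your name and email.";
    print_r(contextMatches($subject, "user page", 1));
    // prints an array with one element: "your user page to"
    ```

    Matches do not overlap, so two occurrences closer together than the word count end up in a single result string or share fewer context words.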
