Most used words in text with PHP

Submitted by 霸气de小男生 on 2019-11-30 16:04:39

This function extracts the most common words from a string. It takes three parameters: the string, an array of stop words, and the number of keywords to return. Load the stop words from a text file with PHP's file() function, which reads a file into an array with one element per line:

$stop_words = file('stop_words.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$this->extract_common_words($text, $stop_words);

You can use the file stop_words.txt as your primary stop-word list, or create your own file.
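As a rough sketch of the loading step, the snippet below writes a small throwaway stop-word list to a temporary file and reads it back with file(). The file contents and the extra trimming and lowercasing are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical stop-word list, one word per line; the blank line is deliberate.
$path = tempnam(sys_get_temp_dir(), 'stop');
file_put_contents($path, "The\nand\n\nof\n");

// FILE_IGNORE_NEW_LINES strips the trailing newline from each entry;
// FILE_SKIP_EMPTY_LINES then drops the blank line.
$stop_words = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// Assumption: normalize the list so it matches text that is lowercased before comparison.
$stop_words = array_map('strtolower', array_map('trim', $stop_words));

unlink($path); // remove the temporary file
print_r($stop_words);
```

Lowercasing the list matters because the function below lowercases the input text before comparing it against the stop words.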

function extract_common_words($string, $stop_words, $max_count = 5) {
      $string = preg_replace('/\s+/', ' ', $string); // collapse every run of whitespace into a single space
      $string = trim($string); // trim the string
      $string = preg_replace('/[^a-zA-Z -]/', '', $string); // only take alphabet characters, but keep the spaces and dashes too
      $string = strtolower($string); // make it lowercase

      preg_match_all('/\b.*?\b/i', $string, $match_words); // split on word boundaries
      $match_words = $match_words[0];

      // drop empty matches, stop words, and words of three characters or fewer
      foreach ( $match_words as $key => $item ) {
          if ( $item == '' || in_array(strtolower($item), $stop_words) || strlen($item) <= 3 ) {
              unset($match_words[$key]);
          }
      }

      $word_count = str_word_count( implode(" ", $match_words), 1 ); // rebuild a clean list of words
      $frequency = array_count_values($word_count); // word => number of occurrences
      arsort($frequency); // most frequent first

      $keywords = array_slice($frequency, 0, $max_count);
      return $keywords;
}

There are no additional parameters, and no native PHP function, that let you pass a list of words to exclude. As such, I would keep what you have and filter a custom set of words out of the array returned by str_word_count().

You can do this easily by using array_diff():

$words = array("if", "you", "do", "this", 'I', 'do', 'that');
$stopwords = array("a", "you", "if");

print_r(array_diff($words, $stopwords));

gives

Array
(
    [2] => do
    [3] => this
    [4] => I
    [5] => do
    [6] => that
)

But you have to handle upper and lower case yourself, because array_diff() compares values case-sensitively. The easiest way is to convert the text to lowercase beforehand.
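A minimal sketch of that lowercase step, reusing the arrays from the example above but with "If" capitalized to show the effect. Lowercasing the stop-word list as well is an assumption, added in case the list itself contains capitals.

```php
<?php
$words = array("If", "you", "do", "this", "I", "do", "that");
$stopwords = array("a", "you", "if");

// Lowercase both sides so that "If" matches the stop word "if".
$filtered = array_diff(
    array_map('strtolower', $words),
    array_map('strtolower', $stopwords)
);

// array_diff() preserves the original keys; array_values() reindexes from 0.
$filtered = array_values($filtered);
print_r($filtered);
```

Note that the surviving "I" comes back as "i", which is the price of normalizing the whole text.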

Here is my solution, using built-in PHP functions:

most_frequent_words: finds the most frequent word(s) in a string

function most_frequent_words($string, $stop_words = [], $limit = 5) {
    $string = strtolower($string); // Make string lowercase

    $words = str_word_count($string, 1); // Returns an array containing all the words found inside the string
    $words = array_diff($words, $stop_words); // Remove the stop words from the array
    $words = array_count_values($words); // Count the occurrences of each word (word => count)

    arsort($words); // Sort by count, highest first

    return array_slice($words, 0, $limit); // Keep only the first $limit words and return them
}

Returns an array of the word(s) that appear most frequently in the string, keyed by word, with the number of occurrences as the value.

Parameters :

string $string - The input string.

array $stop_words (optional) - List of words to filter out of the result. Defaults to an empty array.

int $limit (optional) - Maximum number of words to return. Defaults to 5.
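A short usage sketch, with the function repeated so the snippet runs on its own. The sample sentence and stop-word list are made up for illustration.

```php
<?php
// Same function as above, repeated here so the example is self-contained.
function most_frequent_words($string, $stop_words = [], $limit = 5) {
    $string = strtolower($string);
    $words = str_word_count($string, 1);
    $words = array_diff($words, $stop_words);
    $words = array_count_values($words);
    arsort($words);
    return array_slice($words, 0, $limit);
}

// Hypothetical input: "apple" appears 3 times, "banana" twice, everything else once.
$text = "The apple fell. The banana and the apple rolled; the apple, the banana, the cherry.";
$top = most_frequent_words($text, ['the', 'and'], 2);

// The result maps each word to its count, most frequent first.
foreach ($top as $word => $count) {
    echo $word . ': ' . $count . PHP_EOL;
}
```

This prints "apple: 3" followed by "banana: 2". When several words share the same count, their relative order depends on the PHP version's sort behavior, so avoid relying on it.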
