Finding repeated words in PHP without specifying the word itself

Submitted by 本小妞迷上赌 on 2019-12-01 06:37:41

Question


I've been thinking about something for a project I want to do. I'm not an advanced user and I'm just learning, so I don't know if this is possible:

Suppose we have 100 HTML documents containing many tables and text.

Question one is: is it possible to analyze all this text, find the repeated words, and count them?

Yes, it's possible with some functions, but here's the problem: what if we don't know in advance which words we are going to find? That is, we would have to tell the code what a word is.

Suppose, for example, that a word is a run of seven characters; the idea would be to find other runs matching that pattern and report them. What would be the best way to do this?

Thank you very much in advance.

Example:

Search: five-character patterns in the following phrases:

Text one:

"It takes an ocean not to break"

Text two:

"An ocean is a body of saline water"

Result

Takes 1 
Break 1
water 1
Ocean 2

Thanks in advance for your help.


Answer 1:


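function get_word_counts($phrases) {
    $counts = array();
    foreach ($phrases as $phrase) {
        // Split each phrase on spaces to get candidate words.
        $words = explode(' ', $phrase);
        foreach ($words as $word) {
            // Strip everything except letters and hyphens (basic punctuation handling).
            $word = preg_replace("#[^a-zA-Z\-]#", "", $word);
            if ($word === '') {
                continue; // nothing left after stripping, e.g. a lone number or symbol
            }
            // Initialize the counter on first sight to avoid an undefined-index notice.
            if (!isset($counts[$word])) {
                $counts[$word] = 0;
            }
            $counts[$word] += 1;
        }
    }
    return $counts;
}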

$phrases = array("It takes an ocean of water not to break!", "An ocean is a body of saline water, or so I am told.");

$counts = get_word_counts($phrases);
arsort($counts);
print_r($counts);

OUTPUT

Array
(
    [of] => 2
    [ocean] => 2
    [water] => 2
    [or] => 1
    [saline] => 1
    [body] => 1
    [so] => 1
    [I] => 1
    [told] => 1
    [a] => 1
    [am] => 1
    [An] => 1
    [an] => 1
    [takes] => 1
    [not] => 1
    [to] => 1
    [It] => 1
    [break] => 1
    [is] => 1
)
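
Note that the output above counts "An" and "an" separately and includes words of every length, while the question asks for five-character words only, with "Ocean" counted twice. A minimal sketch of that variant follows, assuming PHP 7 or later; count_words_of_length is a hypothetical helper name introduced here for illustration and is not part of the original answer.

```php
<?php
// Hypothetical helper (not part of the original answer): counts words of an
// exact length, ignoring letter case.
function count_words_of_length(array $phrases, int $length): array {
    $counts = array();
    foreach ($phrases as $phrase) {
        // Lowercase first so "Ocean" and "ocean" share one counter.
        $words = explode(' ', strtolower($phrase));
        foreach ($words as $word) {
            // Same stripping rule as the answer: keep letters and hyphens only.
            $word = preg_replace("#[^a-z\-]#", "", $word);
            if (strlen($word) !== $length) {
                continue; // skip words that are shorter or longer than requested
            }
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    arsort($counts); // highest counts first
    return $counts;
}

// The two phrases from the question.
$phrases = array("It takes an ocean not to break", "An ocean is a body of saline water");
print_r(count_words_of_length($phrases, 5));
```

For the question's two phrases this yields ocean with a count of 2, and takes, break and water with a count of 1 each. The words are reported in lowercase because of the case folding.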

EDIT
Updated to deal with basic punctuation, based on @Jack's comment.
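
The question mentions HTML documents full of tables, whereas the function above takes plain phrases. One possible way to bridge that gap, sketched here under assumptions, is to strip the markup before counting. html_to_words is a hypothetical helper name; strip_tags, html_entity_decode, str_word_count and array_count_values are standard PHP functions.

```php
<?php
// Hypothetical helper (not from the original answer): turns an HTML string
// into a list of lowercase words.
function html_to_words(string $html): array {
    // Put a space before every tag so "<td>a</td><td>b</td>" does not fuse into "ab".
    $spaced = str_replace('<', ' <', $html);
    // Remove the tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($spaced), ENT_QUOTES, 'UTF-8');
    // Format 1 makes str_word_count return the words as an array
    // (letters, apostrophes and hyphens; ASCII-oriented by default).
    return str_word_count(strtolower($text), 1);
}

$html = '<table><tr><td>An ocean</td><td>is a body</td></tr></table><p>of saline water</p>';
$counts = array_count_values(html_to_words($html));
arsort($counts);
print_r($counts);
```

This is only a sketch: strip_tags keeps the text inside script and style elements, so real pages may need those removed first, and non-ASCII words would need a different tokenizer. For files on disk, each document could be read with file_get_contents and the resulting word lists merged before counting.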



Source: https://stackoverflow.com/questions/14035945/finding-repeated-words-in-php-without-specifying-the-word-itself
