How to know what word appears most in a paragraph? (Matlab)

Submitted by 牧云@^-^@ on 2019-12-07 12:53:39

Question


I have a huge paragraph and want to know what word appears most in it. Could anyone please point me in the right direction with this? Any examples and explanations would be helpful. Thanks!


Answer 1:


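Here is a simple solution that should be quite fast.

example_paragraph = 'This is an example corpus. Is is a verb?';

% Lowercase first so that 'Is' and 'is' become the same vocabulary entry,
% then split on single spaces (punctuation stays attached to the words).
words = regexp(lower(example_paragraph), ' ', 'split');
vocabulary = unique(words);
n = length(vocabulary);
counts = zeros(n, 1);
% Count how many times each vocabulary entry occurs in the word list.
for i = 1:n
    counts(i) = sum(strcmp(words, vocabulary{i}));
end

% max returns the highest count and the index of the first word that reaches it.
[frequency_of_the_most_frequent_word, idx] = max(counts);
most_frequent_word = vocabulary{idx};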

You can also check out answers here for getting the most frequent word out of the array of words.
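
Splitting on a single space keeps punctuation glued to tokens such as 'corpus.' and 'verb?'. If that matters for your text, here is a minimal sketch of one way around it: match the words directly instead of splitting, and let accumarray do the counting in place of the loop. The character class used for "a word" (lowercase letters, digits and apostrophes) is an assumption, not part of the answer above, so adjust it to your data.

```matlab
example_paragraph = 'This is an example corpus. Is is a verb?';

% Lowercase, then extract runs of letters, digits and apostrophes as words.
% The doubled '' inside the pattern is a literal apostrophe.
words = regexp(lower(example_paragraph), '[a-z0-9'']+', 'match');

% idx holds, for every word, its position in the sorted vocabulary.
[vocabulary, ~, idx] = unique(words);

% Add 1 for every occurrence of each vocabulary index.
counts = accumarray(idx(:), 1);

% Highest count and the first word that reaches it.
[frequency_of_the_most_frequent_word, k] = max(counts);
most_frequent_word = vocabulary{k};
```

For the example sentence this gives most_frequent_word = 'is' with a count of 3. If several words tie for the top count, max only reports the first one in sorted vocabulary order.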




Answer 2:


Here's a very MATLAB-y way to do it. I tried to name the variables clearly. Play with each line and examine the results to understand how it works. The workhorse functions are unique and hist.

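% paragraph is assumed to be a char vector holding the text to analyze.
% First produce a cell array of words to be analyzed:
% turn all whitespace into spaces, lowercase, and drop punctuation.
paragraph_cleaned_up_whitespace = regexprep(lower(paragraph), '\s', ' ');
paragraph_cleaned_up = regexprep(paragraph_cleaned_up_whitespace, '[^a-z0-9 ]', '');
% strtrim avoids empty "words" caused by leading or trailing spaces.
words = regexp(strtrim(paragraph_cleaned_up), ' +', 'split');

% j maps each word to its position in unique_words.
[unique_words, ~, j] = unique(words);
% Count how many times each index occurs.
frequency_count = hist(j, 1:max(j));
% Sort from most to least frequent.
[~, sorted_locations] = sort(frequency_count, 'descend');
words_sorted_by_frequency = unique_words(sorted_locations).';
frequency_of_those_words = frequency_count(sorted_locations).';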
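
To see the whole pipeline run end to end, here is a small self-contained sketch that prints the most frequent words of a made-up paragraph. The sample text and the top_n cutoff are illustrative assumptions. It also swaps hist for accumarray, which counts the same indices; that is a substitution on my part, not the approach described above.

```matlab
% Hypothetical input; replace with your own text.
paragraph = sprintf('The cat saw the dog.\nThe dog saw nothing.');

% Lowercase and pull out runs of letters and digits as words.
words = regexp(lower(paragraph), '[a-z0-9]+', 'match');

% j maps each word to its position in unique_words.
[unique_words, ~, j] = unique(words);

% Count occurrences of each index (one count per unique word).
frequency_count = accumarray(j(:), 1);

% Sort counts from highest to lowest and reorder the words to match.
[sorted_counts, order] = sort(frequency_count, 'descend');
sorted_words = unique_words(order);

% Print the top few entries; top_n is an arbitrary choice for the demo.
top_n = min(3, numel(sorted_words));
for k = 1:top_n
    fprintf('%-10s %d\n', sorted_words{k}, sorted_counts(k));
end
```

Here 'the' comes out on top with 3 occurrences, while 'dog' and 'saw' are tied at 2 each.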


Source: https://stackoverflow.com/questions/13592390/how-to-know-what-word-appears-most-in-a-paragraph-matlab
