How to know what word appears most in a paragraph? (Matlab)

橙三吉。 submitted on 2019-12-05 18:32:43

Here is a simple solution that should be quite fast.

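example_paragraph = 'This is an example corpus. Is is a verb?';

% Lower-case the text and split on runs of non-alphanumeric characters,
% so that 'corpus.' and 'verb?' lose their punctuation and 'Is' matches 'is'
words = regexp(lower(example_paragraph), '[^a-z0-9]+', 'split');
words = words(~cellfun(@isempty, words));  % drop empty tokens left at the ends

vocabulary = unique(words);
n = length(vocabulary);
counts = zeros(n, 1);
for i = 1:n
    % Count how many times the i-th vocabulary word occurs
    counts(i) = sum(strcmp(words, vocabulary{i}));
end

[frequency_of_the_most_frequent_word, idx] = max(counts);
most_frequent_word = vocabulary{idx};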

You can also check out answers here for getting the most frequent word out of the array of words.
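
As a rough illustration of the array-of-words case (a sketch, not taken from the linked answers), the counting loop can be replaced by the third output of unique combined with accumarray. The sample cell array and the variable names below are assumptions made for the example.

```matlab
% Hypothetical input: a cell array of words, already split and lower-cased
words = {'this', 'is', 'an', 'example', 'corpus', 'is', 'is', 'a', 'verb'};

% unique returns the sorted vocabulary and, for each word, its index into it
[vocabulary, ~, word_index] = unique(words);

% accumarray adds 1 for every occurrence of each index, giving per-word counts
counts = accumarray(word_index(:), 1);

% max returns the first maximum, so ties go to the alphabetically first word
[top_count, idx] = max(counts);
top_word = vocabulary{idx};

fprintf('%s appears %d times\n', top_word, top_count);  % prints: is appears 3 times
```

This avoids calling strcmp once per vocabulary word, which may matter for large vocabularies.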

Here's a very MATLAB-y way to do it. I tried to name the variables clearly. Play with each line and examine the results to understand how it works. Workhorse functions: unique and hist.

% First produce a cell array of words to be analyzed.
% paragraph is assumed to be a char row vector holding the text.
paragraph_cleaned_up_whitespace = regexprep(paragraph, '\s', ' ');
paragraph_cleaned_up = regexprep(paragraph_cleaned_up_whitespace, '[^a-zA-Z0-9 ]', '');
% strtrim avoids empty tokens when the text starts or ends with whitespace
words = regexp(strtrim(paragraph_cleaned_up), '\s+', 'split');

% Map each word to its index in the sorted list of unique words, then count
[unique_words, ~, word_index] = unique(words);
frequency_count = hist(word_index, 1:max(word_index));

% Sort from most to least frequent
[~, sorted_locations] = sort(frequency_count, 'descend');
words_sorted_by_frequency = unique_words(sorted_locations).';
frequency_of_those_words = frequency_count(sorted_locations).';
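
To try the approach end to end, here is a minimal self-contained sketch that prints the most frequent words of a small sample text. The sample paragraph, the lower-casing step, the use of accumarray in place of hist, and the top_n parameter are all assumptions added for illustration and are not part of the answer above.

```matlab
% Hypothetical sample text; any char row vector should work here
paragraph = sprintf('The cat saw the dog.\nThe dog saw a bird.');

% Same clean-up idea as above, plus lower-casing so 'The' and 'the' merge
cleaned = regexprep(lower(paragraph), '\s', ' ');
cleaned = regexprep(cleaned, '[^a-z0-9 ]', '');
words = regexp(strtrim(cleaned), '\s+', 'split');

% Count occurrences of each unique word
[unique_words, ~, word_index] = unique(words);
frequency_count = accumarray(word_index(:), 1);
[sorted_counts, order] = sort(frequency_count, 'descend');

% Print the top_n most frequent words (top_n is an illustrative parameter)
top_n = min(3, numel(order));
for k = 1:top_n
    fprintf('%-10s %d\n', unique_words{order(k)}, sorted_counts(k));
end
```

With this sample, the first row printed is the word 'the' with a count of 3.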