Question
I have 400 files, each containing about 500,000 characters, and those characters are drawn from only about 20 distinct letters. I want to make a histogram showing the 10 most frequently used letters (x-axis) and the number of times each one is used (y-axis). I wrote the code below, but one thing is missing: I cannot tell which letter each bar corresponds to. What should I add to the code? You can change the whole code, but keeping it is better for me. Please provide the whole code so I can copy it directly into a script and run it.
i = 1;
z = zeros(1, 10);
for i=1:400
j = num2str(i);
file_name = strcat('part',j,'txt');
file_id = fopen(file_name);
part = fread(file_id, inf, 'uchar');
h = hist(part,10);
z = z + h;
fclose(file_id);
end
Answer 1:
First of all, your use of hist is wrong. hist(data,10) creates a histogram with 10 equally spaced bins, so each bin covers more than one of the characters in your files.
A way to solve this is to call hist with predefined bin centers, one per character code:
bins = 1:255; % define the bins for hist, one per character code
histSum = zeros(numel(bins),1);
for file = 1:10
    data = randi([0 25],100) + 'a'; % generate random data: letter codes between 'a' and 'z'
    data = reshape(data,numel(data),1); % make it a vector
    histSum = histSum + hist(data,bins)'; % accumulate the counts per character code
end
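Note that the bins have to cover every possible value, which is why they range from 1 to 255. After the loop, histSum(k) holds the number of occurrences of the character with code k, so char(k) tells you which letter each bin belongs to.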
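Putting this together with the loop from the question, a complete script might look like the sketch below. It is not a definitive implementation: the file naming is copied unchanged from the question, the variable names (counts, top_counts, top_labels) are made up for illustration, and files that cannot be opened are simply skipped. It uses bins 0:255 so that every byte value fread can return gets its own bin.

```matlab
% Sketch: count every byte value over all files, then plot the 10 most
% frequent characters, using the character itself as the bar label.
num_files = 400;
bins = 0:255;                        % one bin per possible byte value
counts = zeros(numel(bins), 1);      % running total per byte value

for i = 1:num_files
    % File naming copied from the question; adjust it if your files
    % are named differently (for example with a '.txt' extension).
    file_name = strcat('part', num2str(i), 'txt');
    file_id = fopen(file_name);
    if file_id == -1
        warning('Could not open %s, skipping.', file_name);
        continue;
    end
    part = fread(file_id, inf, 'uchar');
    fclose(file_id);
    if ~isempty(part)
        h = hist(part, bins);        % counts per byte value in this file
        counts = counts + h(:);
    end
end

% Sort in descending order and keep the 10 most frequent byte values.
[sorted_counts, order] = sort(counts, 'descend');
top_counts = sorted_counts(1:10);
top_codes = bins(order(1:10));               % byte values of the top 10
top_labels = num2cell(char(top_codes(:)));   % one single-character label per bar

% One bar per character, labelled with the character it stands for.
bar(top_counts);
set(gca, 'XTick', 1:10, 'XTickLabel', top_labels);
xlabel('Letter');
ylabel('Number of occurrences');
```

If the files also contain line breaks or spaces, those characters are counted as well and may show up among the ten bars; setting the corresponding entries of counts to zero before sorting would exclude them.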
Source: https://stackoverflow.com/questions/30812003/matlab-each-bar-in-histogram-correspond-to-which-letter