So I'm writing a k-means script in MATLAB, since the native function doesn't seem to be very efficient, and it seems to be fully operational. It appears to work on the small…
Profiling will help, but the place to rework your code is to avoid the loop over the number of data points (`for point = 1:size(data,1)`). Vectorize that.
Inside your `for iteration` loop, here is a quick partial example:
[nPoints,nDims] = size(data);

% Calculate all high-dimensional distances at once
kdiffs = bsxfun(@minus,data,permute(mu_k,[3 2 1])); % NxDx1 - 1xDxK => NxDxK
distances = sum(kdiffs.^2,2);   % NxDxK => Nx1xK (no need to do sqrt)
distances = squeeze(distances); % Nx1xK => NxK

% Find closest cluster center for each point
[~,ik] = min(distances,[],2); % Nx1

% Calculate the new cluster centers (mean of the points assigned to each cluster)
mu_k_new = zeros(c,nDims);  % c = number of clusters
clustersizes = zeros(c,1);  % preallocate cluster sizes
for i = 1:c
    indk = ik==i;
    clustersizes(i) = nnz(indk);
    mu_k_new(i,:) = mean(data(indk,:),1); % dim argument keeps a 1xD row even for a single-point cluster
end
This isn't the only (or the best) way to do it, but it should be a decent example.
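As a side note, if you are on MATLAB R2016b or newer, implicit expansion lets you drop the `bsxfun` call entirely; a minimal equivalent sketch:

kdiffs = data - permute(mu_k,[3 2 1]);  % NxDx1 - 1xDxK => NxDxK via implicit expansion
distances = squeeze(sum(kdiffs.^2,2));  % NxK squared distances
[~,ik] = min(distances,[],2);           % Nx1 index of the nearest center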
Some other comments:
- Rather than prompting with `input`, make this script into a function so it can handle input arguments efficiently; `uigetfile` is another option for picking the data file interactively. (A minimal sketch of such a function follows this list.)
- With `max`, `min`, `sum`, `mean`, etc., you can specify a dimension over which the function should operate. This way you can run it on a matrix and compute values for multiple conditions/dimensions at the same time. (A short example also follows below.)
- There is no need to take the square root of the distances: the index of the minimum, `ik`, will be the same with squared Euclidean distance.
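For the first point, here is a minimal sketch of what the function interface could look like, reusing the vectorized steps from above. The name `kmeans_custom`, its argument list, and the random initialization are placeholders, not taken from your script:

function [mu_k, ik] = kmeans_custom(data, c, maxIter)
% KMEANS_CUSTOM  Basic k-means on an NxD data matrix with c clusters.
% (Hypothetical name and signature -- adapt it to your own script.)
if nargin < 3, maxIter = 100; end       % default iteration cap
[nPoints, nDims] = size(data);
mu_k = data(randperm(nPoints, c), :);   % random initial centers
for iteration = 1:maxIter
    % Assignment step (vectorized, as in the example above)
    kdiffs = bsxfun(@minus, data, permute(mu_k, [3 2 1]));
    [~, ik] = min(squeeze(sum(kdiffs.^2, 2)), [], 2);
    % Update step (note: empty clusters are not handled here)
    mu_k_new = zeros(c, nDims);
    for i = 1:c
        mu_k_new(i, :) = mean(data(ik == i, :), 1);
    end
    if isequal(mu_k_new, mu_k), break; end  % converged
    mu_k = mu_k_new;
end
end

Called as, e.g., [centers, labels] = kmeans_custom(mydata, 3), so the data and the number of clusters arrive as arguments instead of through interactive prompts.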
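And to illustrate the dimension argument from the second point (`mydata` here is just a made-up matrix):

mydata = rand(100, 3);           % 100 observations of 3 variables
colMeans = mean(mydata, 1);      % 1x3: mean of each column (per variable)
rowMax   = max(mydata, [], 2);   % 100x1: max of each row (per observation)
totalSum = sum(mydata(:));       % grand total over all elements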