Given a MATLAB uint32 to be interpreted as a bit string, what is an efficient and concise way of counting how many nonzero bits are in the string?
I have a working
Unless this is a MATLAB implementation exercise, you might want to just take your fast C++ implementation and compile it as a mex function, once per target platform.
I'm reviving an old thread here, but I ran across this problem and I wrote this little bit of code for it:
distance = sum(bitget(bits, 1:32));
Looks pretty concise, but I'm scared that bitget is implemented with O(n) bitshift operations. The code works for what I'm doing, but my problem set doesn't rely on Hamming weight.
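A minimal sketch of the bitget approach, first on a scalar and then on a column vector. The vector loop and the variable names (v, weights) are my own assumptions, not part of the answer above; the accumulation is done in double to sidestep any integer-class questions.

```matlab
% Hamming weight of a single uint32 via bitget.
bits = uint32(181);                  % binary 10110101, five bits set
weight = sum(bitget(bits, 1:32));    % one 0/1 value per bit position

% Assumed extension to a column vector: accumulate one bit position at a time.
v = uint32([0; 1; 255; 4294967295]);
weights = zeros(size(v));            % double accumulator
for k = 1:32
    weights = weights + double(bitget(v, k));
end
```

This still touches every element 32 times, so it is mainly useful as a reference to check faster versions against.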
I'd be interested to see how fast this solution is:
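function r = count_bits(n)
% Parallel bit count: sum adjacent 1-, 2-, 4-, 8-, then 16-bit fields.
shifts = [-1, -2, -4, -8, -16];
% Masks are 0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF.
masks = [1431655765, 858993459, 252645135, 16711935, 65535];
r = n;
for i = 1:5
    r = bitand(bitshift(r, shifts(i)), masks(i)) + ...
        bitand(r, masks(i));
end
end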
Going back, I see that this is the 'parallel' solution given on the bithacks page.
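To make the magic numbers less opaque, here is a sketch that builds the same masks from their hex patterns and runs the five steps on a few known values. The test vector r is my own choice for illustration; the shifts and masks are the ones from the function above.

```matlab
% The decimal masks are the usual alternating-bit patterns written in hex.
masks = uint32(hex2dec({'55555555', '33333333', '0F0F0F0F', '00FF00FF', '0000FFFF'}))';
shifts = [-1, -2, -4, -8, -16];

% Works elementwise, so a whole vector can be counted at once.
r = uint32([0, 1, 181, 255, 4294967295]);
for i = 1:5
    % Add the upper half of each field to the lower half; fields double in width each pass.
    r = bitand(bitshift(r, shifts(i)), masks(i)) + bitand(r, masks(i));
end
```

After the loop, r holds the bit counts of the original values. Each pass costs a fixed number of whole-array operations regardless of how many bits are set.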
I implemented the "Best 32 bit Algorithm" from the Stanford link at the top. The improved algorithm reduced processing time by 6%. I also tuned the segment size and found that 32K is stable and improves time by 15% over 4K. I expect the 4Kx4K time to be 40% of that of the Vectorized Scheiner Algorithm.
function w = Ham(w)
% Input uint32
% Output vector of Ham wts
for i=1:32768:length(w)
j=min(i+32767,length(w)); % clamp the last segment to the vector length
w(i:j)=Ham_seg(w(i:j));
end
end
% Segmentation reduced time by 50%
function w=Ham_seg(w)
%speed
b1=uint32(1431655765);
b2=uint32(858993459);
b3=uint32(252645135);
b7=uint32(63); % working orig binary mask
w = bitand(bitshift(w, -1), b1) + bitand(w, b1);
w = bitand(bitshift(w, -2), b2) + bitand(w, b2);
w = bitand(w + bitshift(w, -4), b3);
w = bitand(bitshift(w, -24) + bitshift(w, -16) + bitshift(w, -8) + w, b7);
end
I did some timing comparisons on MATLAB Cody and determined that a Segmented Modified Vectorized Scheiner gives optimum performance.
I get a >50% time reduction, based on the Cody time changing from 1.30 sec to 0.60 sec for a vector of length L=4096*4096.
function w = Ham(w)
% Input uint32
% Output vector of Ham wts
b1=uint32(1431655765); % evaluating the masks once here saves 15% of time (1.30 to 1.1 sec)
b2=uint32(858993459);
b3=uint32(252645135);
b4=uint32(16711935);
b5=uint32(65535);
for i=1:4096:length(w)
j=min(i+4095,length(w)); % clamp the last segment to the vector length
w(i:j)=Ham_seg(w(i:j),b1,b2,b3,b4,b5);
end
end
% Segmentation reduced time by 50%
function w=Ham_seg(w,b1,b2,b3,b4,b5)
% Passing variables or could evaluate b1:b5 here
w = bitand(bitshift(w, -1), b1) + bitand(w, b1);
w = bitand(bitshift(w, -2), b2) + bitand(w, b2);
w = bitand(bitshift(w, -4), b3) + bitand(w, b3);
w = bitand(bitshift(w, -8), b4) + bitand(w, b4);
w = bitand(bitshift(w, -16), b5) + bitand(w, b5);
end
vt=randi(2^32,[4096*4096,1])-1;
% if vt is uint32, vt/65536 rounds to nearest, so floor gives unexpected values
tic
v=num_ones(mod(vt,65536)+1)+num_ones(floor(vt/65536)+1); % 0.85 sec
toc
% a corrected method converts to double before dividing
tic
v=num_ones(mod(vt,65536)+1)+num_ones(floor(double(vt)/65536)+1);
toc
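The snippet above never shows how num_ones is built. A plausible reading, and this is only my assumption, is a 65536-entry lookup table where num_ones(k+1) holds the bit count of k. A sketch under that assumption, using bitand and bitshift to split each word so the uint32 rounding issue with division never comes up:

```matlab
% Assumed table: num_ones(k+1) = number of set bits in k, for k = 0..65535.
num_ones = zeros(65536, 1);
for k = 1:16
    num_ones = num_ones + bitget((0:65535)', k);
end

% Small illustrative input instead of the 4096*4096 vector.
vt = uint32([0; 1; 65535; 65536; 4294967295]);
lo = bitand(vt, uint32(65535));      % low 16 bits
hi = bitshift(vt, -16);              % high 16 bits, no division involved
v = num_ones(double(lo) + 1) + num_ones(double(hi) + 1);
```

The table costs 65536 doubles of memory; whether the two lookups beat the shift-and-mask version would need to be timed on the target machine.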
Try splitting the job into smaller parts. My guess is that when you process all the data at once, MATLAB applies each operation to every integer before moving on to the next step, so the data is pushed out of the processor's cache between steps.
for i=1:4096
% process bits(i,:) here
end
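A sketch of what that splitting could look like on a flat vector, using the shift-and-mask steps from earlier in the thread as the per-segment work. The segment size seg, the test data, and the variable names are assumptions for illustration; the last segment is clamped so the length need not be a multiple of seg.

```matlab
% Hypothetical input whose length is deliberately not a multiple of the segment size.
w = uint32(randi([0, 4294967295], 10000, 1));
seg = 4096;                                   % assumed segment size; tune per machine
m = uint32([1431655765, 858993459, 252645135, 16711935, 65535]);
s = [-1, -2, -4, -8, -16];

counts = zeros(size(w), 'uint32');
for i = 1:seg:numel(w)
    j = min(i + seg - 1, numel(w));           % clamp the final, shorter segment
    x = w(i:j);
    for k = 1:5
        x = bitand(bitshift(x, s(k)), m(k)) + bitand(x, m(k));
    end
    counts(i:j) = x;                          % all five steps run while x is still small
end
```

Whether this actually helps depends on cache size and MATLAB version, so it is worth timing a few values of seg with tic/toc before settling on one.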