Calculating Hamming weight efficiently in MATLAB

你的背包 2020-12-09 11:46

Given a MATLAB uint32 to be interpreted as a bit string, what is an efficient and concise way of counting how many nonzero bits are in the string?

I have a working

9 answers
  • 2020-12-09 11:48

    Unless this is a MATLAB implementation exercise, you might want to just take your fast C++ implementation and compile it as a mex function, once per target platform.

  • 2020-12-09 11:56

    I'm reviving an old thread here, but I ran across this problem and I wrote this little bit of code for it:

    distance = sum(bitget(bits, 1:32));
    

    It looks pretty concise, but I'm worried that bitget is implemented as O(n) bitshift operations. The code works for what I'm doing, but my problem set doesn't rely heavily on Hamming weight.
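
    For an array of values rather than a single scalar, the same bitget idea can be applied one bit position at a time. This is only a minimal sketch, not a benchmarked solution; the names x and w are illustrative and do not come from the post above.

    ```matlab
    % Hypothetical input: a few uint32 values whose weights are easy to check.
    x = uint32([0; 1; 255; 4294967295]);

    % Accumulate the weight bit by bit. bitget with a scalar bit position
    % returns that bit (0 or 1) for every element of x.
    w = zeros(size(x));
    for b = 1:32
        w = w + double(bitget(x, b));
    end

    disp(w.')   % weights of the four inputs: 0 1 8 32
    ```

    This makes 32 passes over the data, so it is better suited as a reference for checking faster methods than as the fast path itself.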

  • 2020-12-09 12:02

    I'd be interested to see how fast this solution is:

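    function r = count_bits(n)
    % Count the set bits of a uint32 n by summing adjacent bit fields.
    
    shifts = [-1, -2, -4, -8, -16];   % right shifts by 1, 2, 4, 8, 16 bits
    % Masks 0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, 0x0000FFFF
    masks = [1431655765, 858993459, 252645135, 16711935, 65535];
    
    r = n;
    for i=1:5
       % Add each field of width 2^(i-1) bits to its neighbour
       r = bitand(bitshift(r, shifts(i)), masks(i)) + ...
          bitand(r, masks(i));
    end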
    

    Going back, I see that this is the 'parallel' solution given on the bithacks page.
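
    To make the decimal constants less opaque, the sketch below rebuilds them from their hexadecimal bit patterns and traces the five steps for one value. The variable names hexMasks and m are illustrative; the constants and shifts are the ones used in the function above.

    ```matlab
    % The decimal masks are these hexadecimal bit patterns:
    %   55555555 -> 0101... (every other bit)
    %   33333333 -> 0011... (2-bit fields)
    %   0F0F0F0F -> nibbles, 00FF00FF -> bytes, 0000FFFF -> 16-bit halves
    hexMasks = {'55555555', '33333333', '0F0F0F0F', '00FF00FF', '0000FFFF'};
    masks = hex2dec(hexMasks).';   % row vector of doubles

    % Same decimal constants as in count_bits.
    assert(isequal(masks, [1431655765, 858993459, 252645135, 16711935, 65535]));

    % Trace the five steps for one value: 255 has eight set bits.
    r = uint32(255);
    shifts = [-1, -2, -4, -8, -16];
    for i = 1:5
        m = uint32(masks(i));
        r = bitand(bitshift(r, shifts(i)), m) + bitand(r, m);
    end
    disp(r)   % 8
    ```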

  • 2020-12-09 12:03

    I implemented the "Best 32 bit Algorithm" from the Stanford link at the top. The improved algorithm reduced processing time by 6%. I also tuned the segment size and found that 32K is stable and improves time by 15% over 4K. I expect the 4Kx4K time to be 40% of that of the Vectorized Scheiner Algorithm.

    function w = Ham(w)
    % Input:  vector of uint32 values
    % Output: vector of Hamming weights, one per input element
     n = length(w);
     for i=1:32768:n
      j = min(i+32767, n);   % clamp so the last segment stays in bounds
      w(i:j)=Ham_seg(w(i:j));
     end
    end
    
    % Segmentation reduced time by 50%
    
    function w=Ham_seg(w)
     % Masks as uint32 constants
     b1=uint32(1431655765); % 0x55555555
     b2=uint32(858993459);  % 0x33333333
     b3=uint32(252645135);  % 0x0F0F0F0F
     b7=uint32(63);         % keeps the low 6 bits, enough for counts 0..32
    
     w = bitand(bitshift(w, -1), b1) + bitand(w, b1);  % 2-bit sums
     w = bitand(bitshift(w, -2), b2) + bitand(w, b2);  % 4-bit sums
     w = bitand(w+bitshift(w, -4),b3);                 % per-byte sums
     w = bitand(bitshift(w,-24)+bitshift(w,-16)+bitshift(w,-8)+w,b7); % add the four bytes
    
    end
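
    Before timing a variant like this, it may be worth checking it against a slow but obvious reference. The sketch below inlines the four steps of Ham_seg and compares the result with a bitget loop; the test vector x and the name ref are assumptions made for illustration.

    ```matlab
    % Hypothetical test data: edge cases plus random 32-bit values.
    x = uint32([0; 1; 4294967295; 2147483648; randi([0, 2^32 - 1], 1000, 1)]);

    b1 = uint32(1431655765);  % 0x55555555
    b2 = uint32(858993459);   % 0x33333333
    b3 = uint32(252645135);   % 0x0F0F0F0F
    b7 = uint32(63);          % low 6 bits

    % The four steps of Ham_seg, applied to the whole test vector.
    w = x;
    w = bitand(bitshift(w, -1), b1) + bitand(w, b1);
    w = bitand(bitshift(w, -2), b2) + bitand(w, b2);
    w = bitand(w + bitshift(w, -4), b3);
    w = bitand(bitshift(w, -24) + bitshift(w, -16) + bitshift(w, -8) + w, b7);

    % Reference: add up the 32 individual bits of every element.
    ref = zeros(size(x), 'uint32');
    for b = 1:32
        ref = ref + bitget(x, b);
    end

    assert(isequal(w, ref));   % errors if the fast path disagrees
    ```

    None of the intermediate sums can reach the uint32 limit, so MATLAB's saturating integer arithmetic does not affect the result.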
    
  • 2020-12-09 12:04

    I did some timing comparisons on MATLAB Cody and determined that a Segmented Modified Vectorized Scheiner gives optimum performance.

    This gives a >50% time reduction, based on a change from 1.30 sec to 0.60 sec on Cody for a vector of length L=4096*4096.

    function w = Ham(w)
    % Input uint32
    % Output vector of Ham wts
    
     b1=uint32(1431655765); % evaluating the masks once here saves 15% of time (1.30 to 1.1 sec)
     b2=uint32(858993459);
     b3=uint32(252645135);
     b4=uint32(16711935);
     b5=uint32(65535);
    
     for i=1:4096:length(w)
      j=min(i+4095,length(w)); % clamp so the last segment stays in bounds
      w(i:j)=Ham_seg(w(i:j),b1,b2,b3,b4,b5);
     end
    end
    
    % Segmentation reduced time by 50%
    
    function w=Ham_seg(w,b1,b2,b3,b4,b5)
     % Passing variables or could evaluate b1:b5 here
    
    
     w = bitand(bitshift(w, -1), b1) + bitand(w, b1);
     w = bitand(bitshift(w, -2), b2) + bitand(w, b2);
     w = bitand(bitshift(w, -4), b3) + bitand(w, b3);
     w = bitand(bitshift(w, -8), b4) + bitand(w, b4);
     w = bitand(bitshift(w, -16), b5) + bitand(w, b5);
    
    end
    
    
    
    
    
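    % Timing of a table-lookup method for comparison.
    % num_ones is not defined in this snippet; it is presumably a 65536-entry
    % table of bit counts indexed by value+1.
    vt=randi(2^32,[4096*4096,1])-1;
    tic
    v=num_ones(mod(vt,65536)+1)+num_ones(floor(vt/65536)+1); % 0.85 sec
    toc
    % If vt is uint32, vt/65536 rounds to the nearest integer, so floor
    % gives unexpected values. A corrected method is
    tic
    v=num_ones(mod(vt,65536)+1)+num_ones(floor(double(vt)/65536)+1);
    toc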
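
    The table num_ones is not shown in this answer. Assuming it holds the bit count of every 16-bit value, it could be built as sketched below, and for uint32 input the two halves can be split with bitand and bitshift instead of mod and floor, which avoids the integer-division rounding issue. The names lo and hi and the test values are illustrative.

    ```matlab
    % ASSUMPTION: num_ones(k+1) is the number of set bits in k, k = 0..65535.
    num_ones = uint8(sum(dec2bin(0:65535) == '1', 2));

    % Hypothetical uint32 test values.
    vt = uint32([0; 1; 65535; 65536; 4294967295]);

    lo = bitand(vt, uint32(65535));   % low 16 bits
    hi = bitshift(vt, -16);           % high 16 bits, exact (no rounding)

    % Convert to double before adding 1 so the index arithmetic is plain.
    v = num_ones(double(lo) + 1) + num_ones(double(hi) + 1);

    disp(v.')   % 0 1 16 1 32
    ```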
    
  • 2020-12-09 12:07

    Try splitting the job into smaller parts. My guess is that when you process all the data at once, MATLAB applies each operation to every integer before moving on to the next step, so the data is evicted from the processor's cache between steps.

    for i=1:4096
        % process bits(i,:) here
    end
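
    A block-wise version of that idea might look like the sketch below. The block size blk, the random test data, and the choice of the five-step mask method as the per-block work are all assumptions for illustration; whether segmentation actually helps depends on the machine and the MATLAB version.

    ```matlab
    % Hypothetical data and block size; tune blk for the target machine.
    x = uint32(randi([0, 2^32 - 1], 10000, 1));
    blk = 4096;

    shifts = [-1, -2, -4, -8, -16];
    masks = uint32([1431655765, 858993459, 252645135, 16711935, 65535]);

    w = x;
    for s = 1:blk:numel(w)
        e = min(s + blk - 1, numel(w));   % clamp the last, shorter block
        seg = w(s:e);
        % Run all five steps on this block before touching the next one.
        for i = 1:5
            seg = bitand(bitshift(seg, shifts(i)), masks(i)) + bitand(seg, masks(i));
        end
        w(s:e) = seg;
    end
    ```

    Clamping the end index keeps the loop valid when the vector length is not a multiple of the block size.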
    