Convert a binary (0|1) NumPy array to an integer or binary string?

渐次进展 2021-02-19 20:29

Is there a shortcut to convert a binary (0|1) NumPy array to an integer or a binary string? For example:

b = np.array([0,0,0,0,0,1,0,1])   
  => b is 5

np.packbits(b)
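For reference, here is a minimal sketch of three common ways to turn the example array into an integer or a binary string. Note that `np.packbits` packs bits into bytes (`uint8`), so it yields the single value 5 only because this array has exactly 8 elements. The dot product with powers of two and `int(s, 2)` work for any length, although the NumPy route assumes the result fits in a 64-bit integer.

```python
import numpy as np

b = np.array([0, 0, 0, 0, 0, 1, 0, 1])

# 1) np.packbits groups bits into bytes, most significant bit first (the default order).
#    With exactly 8 bits this produces a single uint8 element.
packed = np.packbits(b)                  # array([5], dtype=uint8)
as_int_packbits = int(packed[0])

# 2) Dot product with powers of two, most significant bit first.
weights = 2 ** np.arange(b.size)[::-1]   # [128, 64, 32, 16, 8, 4, 2, 1]
as_int_dot = int(b.dot(weights))

# 3) Build the binary string, then parse it with Python's built-in int(..., 2).
bit_string = "".join(str(x) for x in b)  # '00000101'
as_int_str = int(bit_string, 2)

print(as_int_packbits, as_int_dot, as_int_str, bit_string)
```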
         


        
  •  别那么骄傲
    2021-02-19 20:50

    I extended @Divakar's good dot-product solution to run ~180x faster on my host by using vectorized matrix multiplication. The original code, which processes one row at a time, took ~3 minutes for 100K rows of 18 columns in my pandas DataFrame. Next week I need to scale from 100K rows to 20M rows, so ~10 hours of running time was not going to be fast enough. First of all, the new code is vectorized; that is the real change in the Python code. Secondly, depending on your host configuration, matrix multiplication can run in parallel on many-core processors without you noticing, especially when NumPy is linked against OpenBLAS or another BLAS library, so it can use many cores if you have them. (NumPy hands matmul to BLAS only for floating-point dtypes, though; with integer 0/1 columns the speedup comes mainly from vectorization.)
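The difference between the two styles can be sketched as follows. This is a minimal sketch, not the author's actual code: `bits_to_int_rowwise` is a hypothetical stand-in for a one-row-at-a-time dot product, and the random 1,000 x 18 array is made-up test data. Both functions produce the same integers; the vectorized version replaces the Python-level loop with a single matrix-vector product.

```python
import numpy as np

def bits_to_int_rowwise(bits):
    """Hypothetical row-at-a-time version: one dot product per row in a Python loop."""
    n = bits.shape[1]
    weights = 2 ** np.arange(n)[::-1]  # powers of 2, most significant bit first
    return np.array([row.dot(weights) for row in bits])

def bits_to_int_vectorized(bits):
    """Vectorized version: one matrix-vector product over all rows at once."""
    n = bits.shape[1]
    weights = 2 ** np.arange(n)[::-1]
    return bits @ weights

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(1000, 18))  # made-up 0/1 data with 18 columns

slow = bits_to_int_rowwise(bits)
fast = bits_to_int_vectorized(bits)
print(np.array_equal(slow, fast))
```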

    The new, quite simple code runs 100K rows x 18 binary columns in ~1 second of elapsed time on my host, which is "mission accomplished" for me:

    import numpy as np

    # Fast way: vectorized matmul. Pass in all rows and columns in one shot.
    def BitsToIntAFast(bits):
        _, n = bits.shape           # the number of columns is needed, not bits.size
        a = 2**np.arange(n)[::-1]   # powers of 2, most significant bit first: 2**(n-1), ..., 2, 1
        return bits @ a             # this matmul is the key line of code

    # I use it like this:
    bits = d.iloc[:, 4:(4+18)]      # read the 18 bit columns from my pandas DataFrame
    gs = BitsToIntAFast(bits)       # a pandas Series when bits is a DataFrame
    print(gs[:5])
    gs.shape
    ...
    d['genre'] = np.array(gs)       # add the newly computed column to the DataFrame
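Since `d` and its column layout are specific to the author's data, here is a self-contained sketch of the same usage on a small made-up DataFrame. The column names `id` and `b0` to `b3` are placeholders, and the sketch assumes the bit columns sit next to each other. Selecting the bit columns with `iloc` gives a DataFrame, and `DataFrame @ ndarray` returns a Series indexed like the DataFrame, so it can be assigned back directly.

```python
import numpy as np
import pandas as pd

def bits_to_int(bits):
    _, n = bits.shape
    weights = 2 ** np.arange(n)[::-1]  # powers of 2, most significant bit first
    return bits @ weights              # DataFrame @ ndarray -> Series aligned with bits.index

# Made-up data: one id column followed by 4 bit columns (placeholder names).
d = pd.DataFrame({
    "id": [10, 11, 12],
    "b0": [0, 1, 1],
    "b1": [1, 0, 1],
    "b2": [0, 1, 1],
    "b3": [1, 1, 1],
})

bits = d.iloc[:, 1:5]   # select the 4 bit columns by position
gs = bits_to_int(bits)  # pandas Series sharing d's index
d["genre"] = gs         # index-aligned assignment; np.array(gs) would assign positionally
print(d)
```

Assigning the Series directly relies on index alignment, while the answer's `np.array(gs)` assigns by position; both give the same column here because `gs` shares `d`'s index.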
    

    Hope this helps.
