Convert binary (0|1) numpy to integer or binary-string?

前端未结

关注

 5  1204

Is there a shortcut to Convert binary (0|1) numpy array to integer or binary-string ? F.e.

b = np.array([0,0,0,0,0,1,0,1])   
  => b is 5

np.packbits(b)


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  北荒        
                
              
                            
                2021-02-19 20:43
              
            
            
                                                                       
One way would be using dot-product with 2-powered range array -

b.dot(2**np.arange(b.size)[::-1])


Sample run -

In [95]: b = np.array([1,0,1,0,0,0,0,0,1,0,1])

In [96]: b.dot(2**np.arange(b.size)[::-1])
Out[96]: 1285


Alternatively, we could use bitwise left-shift operator to create the range array and thus get the desired output, like so -

b.dot(1 << np.arange(b.size)[::-1])


If timings are of interest -

In [148]: b = np.random.randint(0,2,(50))

In [149]: %timeit b.dot(2**np.arange(b.size)[::-1])
100000 loops, best of 3: 13.1 µs per loop

In [150]: %timeit b.dot(1 << np.arange(b.size)[::-1])
100000 loops, best of 3: 7.92 µs per loop




Reverse process

To retrieve back the binary array, use np.binary_repr alongwith np.fromstring -

In [96]: b = np.array([1,0,1,0,0,0,0,0,1,0,1])

In [97]: num = b.dot(2**np.arange(b.size)[::-1]) # integer

In [98]: np.fromstring(np.binary_repr(num), dtype='S1').astype(int)
Out[98]: array([1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1])

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  别那么骄傲        
                
              
                            
                2021-02-19 20:50
              
            
            
                                                                       
I extended the good dot product solution of @Divikar to run ~180x faster on my host, by using vectorized matrix multiplication code. The original code that runs one-row-at-a-time took ~3 minutes to run 100K rows of 18 columns in my pandas dataframe. Well, next week I need to upgrade from 100K rows to 20M rows, so ~10 hours of running time was not going to be fast enough for me. The new code is vectorized, first of all. That's the real change in the python code.  Secondly, matmult often runs in parallel without you seeing it, on many-core processors depending on your host configuration, especially when OpenBLAS or other BLAS is present for numpy to use on matrix algebra like this matmult. So it can use a lot of processors and cores, if you have it.

The new -- quite simple -- code runs 100K rows x 18 binary columns in ~1 sec ET on my host which is "mission accomplished" for me:

'''
Fast way is vectorized matmult. Pass in all rows and cols in one shot.
'''
def BitsToIntAFast(bits):
  m,n = bits.shape # number of columns is needed, not bits.size
  a = 2**np.arange(n)[::-1]  # -1 reverses array of powers of 2 of same length as bits
  return bits @ a  # this matmult is the key line of code

'''I use it like this:'''
bits = d.iloc[:,4:(4+18)] # read bits from my pandas dataframe
gs = BitsToIntAFast(bits)
print(gs[:5])
gs.shape
...
d['genre'] = np.array(gs)  # add the newly computed column to pandas


Hope this helps.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  没有蜡笔的小新        
                
              
                            
                2021-02-19 20:52
              
            
            
                                                                       
def binary_converter(arr):
    total = 0
    for index, val in enumerate(reversed(arr)):
        total += (val * 2**index)
    print total


In [14]: b = np.array([1,0,1,0,0,0,0,0,1,0,1])
In [15]: binary_converter(b)
1285
In [9]: b = np.array([0,0,0,0,0,1,0,1])
In [10]: binary_converter(b)
5


or

b = np.array([1,0,1,0,0,0,0,0,1,0,1])
sum(val * 2**index for index, val in enumerate(reversed(b)))

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  太阳男子        
                
              
                            
                2021-02-19 20:58
              
            
            
                                                                       
Using numpy for conversion limits you to 64-bit signed binary results. If you really want to use numpy and the 64-bit limit works for you a faster implementation using numpy is:

import numpy as np
def bin2int(bits):
    return np.right_shift(np.packbits(bits, -1), bits.size).squeeze()


Since normally if you are using numpy you care about speed then the fastest implementation for > 64-bit results is:

import gmpy2
def bin2int(bits):
    return gmpy2.pack(list(bits[::-1]), 1)


If you don't want to grab a dependency on gmpy2 this is a little slower but has no dependencies and supports > 64-bit results:

def bin2int(bits):
    total = 0
    for shift, j in enumerate(bits[::-1]):
        if j:
            total += 1 << shift
    return total


The observant will note some similarities in the last version to other Answers to this question with the main difference being the use of the << operator instead of **, in my testing this led to a significant improvement in speed.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  终归单人心        
                
              
                            
                2021-02-19 20:59
              
            
            
                                                                       
My timeit results:
b.dot(2**np.arange(b.size)[::-1])
100000 loops, best of 3: 2.48 usec per loop

b.dot(1 << np.arange(b.size)[::-1])
100000 loops, best of 3: 2.24 usec per loop

# Precompute powers-of-2 array with a = 1 << np.arange(b.size)[::-1]
b.dot(a)
100000 loops, best of 3: 0.553 usec per loop

# using gmpy2 is slower
gmpy2.pack(list(map(int,b[::-1])), 1)
100000 loops, best of 3: 10.6 usec per loop

So if you know the size ahead of time, it's significantly faster to precompute the powers-of-2 array. But if possible, you should do all computations simultaneously using matrix multiplication like in Geoffrey Anderson's answer.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复