Python representation of negative integers


>>> x = -4
>>> print("{} {:b}".format(x, x))
-4 -100
>>> mask = 0xFFFFFFFF
>>> print("{} {:b}".format(x & mask, x & m


                      
1 answer

天命终不由人
2020-12-31 09:59

TL;DR: CPython's integer type stores the sign in a dedicated field of a structure, separately from the magnitude. When performing a bitwise operation, CPython replaces negative operands by their two's complement and sometimes (!) performs the reverse operation on the result (i.e. replaces the two's complement by a negative number).

Bitwise operations

The internal representation of an integer is a PyLongObject struct, which contains a PyVarObject struct. (When CPython creates a new PyLong object, it allocates memory for the structure plus trailing space for the digits.) What matters here is that the PyLong is sized: the ob_size field of the embedded PyVarObject struct contains the size of the integer in digits (each digit holds either 15 or 30 bits).
If the integer is negative, ob_size is minus the number of digits, so the sign is carried by ob_size while the digits store only the magnitude.

(References: https://github.com/python/cpython/blob/master/Include/object.h and https://github.com/python/cpython/blob/master/Include/longobject.h)
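
To make this layout concrete, here is a minimal Python sketch of the sign-magnitude scheme described above. It is only a model: the helper name to_sign_and_digits is hypothetical, the real storage is handled in C, and the digit width is assumed to be the one reported by sys.int_info.bits_per_digit.

```python
import sys


def to_sign_and_digits(n, bits_per_digit=sys.int_info.bits_per_digit):
    """Model an int as (signed_size, digits), least significant digit first.

    Hypothetical helper: signed_size plays the role of ob_size, i.e. the
    number of digits, negated when n is negative.
    """
    digit_mask = (1 << bits_per_digit) - 1
    magnitude = abs(n)  # the digits only ever store the magnitude
    digits = []
    while magnitude:
        digits.append(magnitude & digit_mask)
        magnitude >>= bits_per_digit
    signed_size = -len(digits) if n < 0 else len(digits)
    return signed_size, digits


print(to_sign_and_digits(4))
print(to_sign_and_digits(-4))
```

In this model, 4 and -4 share the same digit list and differ only in the sign of the size, which is why no two's complement bit pattern is stored anywhere.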

As you can see, CPython's internal representation of an integer is quite far from the usual binary representation. Yet CPython has to provide bitwise operations for various purposes. Let's take a look at the comments in the code:

static PyObject *
long_bitwise(PyLongObject *a,
             char op,  /* '&', '|', '^' */
             PyLongObject *b)
{
    /* Bitwise operations for negative numbers operate as though
       on a two's complement representation.  So convert arguments
       from sign-magnitude to two's complement, and convert the
       result back to sign-magnitude at the end. */

    /* If a is negative, replace it by its two's complement. */
    /* Same for b. */
    /* Complement result if negative. */
}
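
The strategy in these comments can be imitated in pure Python. The sketch below is an illustration under assumptions, not CPython's algorithm: it works on one fixed width chosen from the operands rather than digit by digit, and the function name bitwise_via_twos_complement is hypothetical.

```python
import operator

OPS = {'&': operator.and_, '|': operator.or_, '^': operator.xor}


def bitwise_via_twos_complement(a, op, b):
    """Apply op to a and b through an explicit two's complement encoding."""
    # One extra bit beyond the larger magnitude leaves room for a sign bit.
    width = max(a.bit_length(), b.bit_length()) + 1
    modulus = 1 << width
    # A negative operand becomes its two's complement on `width` bits;
    # a non-negative operand is unchanged.
    ua = a % modulus
    ub = b % modulus
    # Both encoded operands are non-negative here.
    ur = OPS[op](ua, ub)
    # If the top bit of the result is set, convert back to a negative int.
    if ur >> (width - 1):
        return ur - modulus
    return ur


print(bitwise_via_twos_complement(-4, '&', 0xFFFFFFFF))
```

For -4 and 0xFFFFFFFF the width is 33 bits, the top bit of the AND result is clear, and the value stays positive, which matches what the built-in operator returns.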


To handle negative integers in bitwise operations, CPython uses the two's complement (actually, it is a two's complement computed digit by digit, but I won't go into the details). But note the "Sign Rule" (the name is mine): the sign of the result is the bitwise operator applied to the signs of the operands. More precisely, the result is negative if nega <op> negb == 1 (negx = 1 for negative, 0 for non-negative). Simplified code:

switch (op) {
    case '^': negz = nega ^ negb; break;
    case '&': negz = nega & negb; break;
    case '|': negz = nega | negb; break;
    default: ...
}
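
The "Sign Rule" can be checked from Python itself. This is a small sketch: the helper name predicted_negative is hypothetical and simply mirrors the switch statement above.

```python
import operator

OPS = {'&': operator.and_, '|': operator.or_, '^': operator.xor}


def predicted_negative(a, op, b):
    """Apply the operator to the sign flags, as in the simplified switch."""
    nega = int(a < 0)
    negb = int(b < 0)
    return OPS[op](nega, negb) == 1


for op in OPS:
    for a in (-4, 4):
        for b in (-0xFFFFFFFF, 0xFFFFFFFF):
            result = OPS[op](a, b)
            # The actual sign always agrees with the prediction.
            assert (result < 0) == predicted_negative(a, op, b)
            print(a, op, b, '->', result)
```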


Binary formatting

On the other hand, the formatter does not perform the two's complement, even for the binary representation: [format_long_internal](https://github.com/python/cpython/blob/master/Python/formatter_unicode.c#L839) calls [long_format_binary](https://github.com/python/cpython/blob/master/Objects/longobject.c#L1934) and removes the two leading characters (the 0b prefix), but keeps the sign. See the code:

        /* Is a sign character present in the output?  If so, remember it
           and skip it */
        if (PyUnicode_READ_CHAR(tmp, inumeric_chars) == '-') {
            sign_char = '-';
            ++prefix;
            ++leading_chars_to_skip;
        }


The long_format_binary function does not perform any two's complement either: it just outputs the number in base 2, preceded by the sign.

    if (negative)                                                   \
        *--p = '-';                                                 \
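
The difference between the two views can be shown side by side. The sketch below is illustrative only: both helper names are hypothetical, and the fixed width used for the two's complement view is an assumption, since a Python int has no intrinsic width.

```python
def format_sign_magnitude(n):
    """Imitate the formatter: base-2 digits of the magnitude, plus a sign."""
    digits = bin(abs(n))[2:]  # drop the '0b' prefix
    return '-' + digits if n < 0 else digits


def format_twos_complement(n, width=32):
    """Show n as a two's complement bit pattern on an assumed fixed width."""
    return format(n & ((1 << width) - 1), '0{}b'.format(width))


print(format_sign_magnitude(-4))      # same text as format(-4, 'b')
print(format_twos_complement(-4, 8))  # 11111100
```

Masking before formatting is what produces the familiar bit pattern; the formatter alone never does.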


Your question

I will follow your REPL sequence:

>>> x = -4
>>> print("{} {:b}".format(x, x))
-4 -100


Nothing surprising here: the formatter does not use two's complement, it prints the magnitude in base 2 preceded by the sign.

>>> mask = 0xFFFFFFFF
>>> print("{} {:b}".format(x & mask, x & mask))
4294967292 11111111111111111111111111111100


The number -4 is negative. Hence, it is replaced by its two's complement before the bitwise AND, digit by digit. You expected the result to be turned back into a negative number, but remember the "Sign Rule":

>>> nega=1; negb=0
>>> nega & negb
0


Hence: 1. the result does not get the negative sign; 2. the result is not converted back from two's complement. Your result complies with the "Sign Rule", even if this rule doesn't seem very intuitive.
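
If the goal is to get a signed value back after masking, the conversion has to be done explicitly. This is a minimal sketch, assuming the masked value is meant to be read as a fixed-width two's complement number; the name to_signed and the default width of 32 are assumptions, not something Python provides.

```python
def to_signed(value, width=32):
    """Reinterpret the low `width` bits of value as a signed integer."""
    value &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    # Flipping the sign bit and subtracting it maps patterns with the
    # top bit set to negative numbers and leaves the others unchanged.
    return (value ^ sign_bit) - sign_bit


mask = 0xFFFFFFFF
print(to_signed(-4 & mask))  # -4
print(to_signed(3))          # 3
```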

Now, the last part:

>>> x = 0b11111111111111111111111111111100
>>> print("{} {:b}".format(x, x))
4294967292 11111111111111111111111111111100
>>> print("{} {:b}".format(~(x ^ mask), ~(x ^ mask)))
-4 -100


Here x is the positive integer 4294967292 (0b11111111111111111111111111111100), not -4. Both x and mask are non-negative, so no two's complement conversion takes place and x ^ mask is simply 0b11 (3); the "Sign Rule" gives a non-negative result:

>>> nega=0; negb=0
>>> nega ^ negb
0


The unary ~ is then applied to 3. It does not go through long_bitwise: for arbitrary-precision integers, ~n is computed as -(n + 1), so ~3 is -4. Therefore, the result gets the negative sign, as you expected, and the formatter prints it as -100.
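
To check the last REPL line step by step, the sketch below evaluates x ^ mask and the unary ~ separately. It relies on the identity ~n == -(n + 1), which holds for Python integers; the helper name invert is hypothetical.

```python
mask = 0xFFFFFFFF
x = 0b11111111111111111111111111111100

xored = x ^ mask   # both operands are non-negative
inverted = ~xored  # unary inversion of the intermediate result


def invert(n):
    """Inversion expressed arithmetically, without touching any bits."""
    return -(n + 1)


print(xored, inverted)                       # 3 -4
print(invert(xored) == inverted)             # True
print("{} {:b}".format(inverted, inverted))  # -4 -100
```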

Conclusion: I guess there was no perfect way to support arbitrarily long signed numbers and provide bitwise operations at the same time, but the documentation is not very verbose about the choices that were made.
                                                                        
                                                        
            
            
              
                