Parsing binary data into ctypes Structure object via readinto()

南旧 2020-12-30 10:47

I'm trying to handle a binary format, following the example here:

http://dabeaz.blogspot.jp/2009/08/python-binary-io-handling.html

>>> from

1 answer

半阙折子戏 (OP), 2020-12-30 11:22

This field definition actually declares a bitfield:

...
("more_funky_numbers_7bytes", c_uint, 56),
...


which is wrong here. The width of a bitfield must be less than or equal to the size of its type in bits, so for c_uint (32 bits) it can be at most 32; one extra bit raises the exception:

ValueError: number of bits invalid for bit field
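The limit is easy to check directly. A minimal sketch, assuming c_uint is 32 bits wide (true on mainstream platforms); make_bitfield_struct is a made-up helper for this example, and the exact wording of the error message may differ between Python versions:

```python
from ctypes import Structure, c_uint, sizeof

def make_bitfield_struct(width):
    """Hypothetical helper: declare a Structure with one c_uint bitfield of `width` bits."""
    class S(Structure):
        _fields_ = [("x", c_uint, width)]
    return S

max_bits = sizeof(c_uint) * 8   # 32 on mainstream platforms

make_bitfield_struct(max_bits)  # accepted: the field fills the whole c_uint

try:
    make_bitfield_struct(max_bits + 1)  # one bit more than the type can hold
except ValueError as exc:
    print("ValueError:", exc)
```

The error is raised as soon as the class is created, because ctypes lays out `_fields_` at class-definition time.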


Example of using the bitfield:

from ctypes import Structure, byref, c_uint16, c_uint8, memmove, sizeof

class MyStructure(Structure):
    _fields_ = [
        # c_uint8 is 8 bits long
        ('a', c_uint8, 4),  # first 4 bits of the first byte
        ('b', c_uint8, 2),  # next 2 bits of the same byte
        ('c', c_uint8, 2),  # last 2 bits of the same byte
        ('d', c_uint8, 2),  # the first byte is full, so a new byte is
                            # allocated and `d` takes its first two bits
    ]

mystruct = MyStructure()

mystruct.a = 0b0000
mystruct.b = 0b11
mystruct.c = 0b00
mystruct.d = 0b11

v = c_uint16()

# copy the raw bytes of `mystruct` into `v`; ctypes.memmove works on every
# platform, unlike calling memcpy through msvcrt, which is Windows-only
memmove(byref(v), byref(mystruct), sizeof(v))

print(sizeof(mystruct))  # 2 bytes: 10 bits are used, the other 6 are unused
                         # (ctypes zero-initializes new instances)
print(bin(v.value))      # 0b1100110000 on a little-endian machine
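The printed value can be checked by hand. This sketch assumes the usual GCC/MSVC behavior on a little-endian machine, where bitfields fill each byte starting from its least significant bit; it also swaps the memmove call for c_uint16.from_buffer_copy, which copies the raw bytes without any pointer handling:

```python
from ctypes import Structure, c_uint16, c_uint8, sizeof

class MyStructure(Structure):
    _fields_ = [
        ('a', c_uint8, 4),
        ('b', c_uint8, 2),
        ('c', c_uint8, 2),
        ('d', c_uint8, 2),
    ]

s = MyStructure(a=0b0000, b=0b11, c=0b00, d=0b11)

# Hand-computed value, assuming bitfields fill each byte from its least
# significant bit and byte 0 is the low byte (little-endian):
#   byte 0 = a | b << 4 | c << 6,   byte 1 = d
expected = (s.a | s.b << 4 | s.c << 6) | (s.d << 8)

# from_buffer_copy copies the first sizeof(c_uint16) raw bytes of `s`
v = c_uint16.from_buffer_copy(s)

print(sizeof(s), bin(expected), bin(v.value))
```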


What you need is 7 bytes, so what you ended up doing is correct:

...
("more_funky_numbers_7bytes", c_byte * 7),
...
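Once the field is a 7-byte array, its contents can still be turned into a number when needed. This is only a sketch: it assumes (hypothetically) that the 7 bytes hold one big-endian unsigned 56-bit integer, the Funky class exists only for the example, and it uses c_ubyte instead of c_byte so that bytes above 0x7F do not come back negative:

```python
from ctypes import BigEndianStructure, c_ubyte, sizeof

class Funky(BigEndianStructure):
    _fields_ = [("more_funky_numbers_7bytes", c_ubyte * 7)]

raw = bytes([0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x02])
f = Funky.from_buffer_copy(raw)

# Assumption: the 7 bytes encode a single big-endian unsigned integer.
value = int.from_bytes(bytes(f.more_funky_numbers_7bytes), "big")
print(sizeof(f), value)  # 7 65538
```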


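As for the size of the structure, it is going to be 52 rather than 51: one padding byte is inserted so that payload_length_2bytes, a c_ushort, starts on a 2-byte boundary. The structure as a whole is aligned to 4 bytes (the alignment of c_uint), and 52 is already a multiple of 4. Here: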

from ctypes import BigEndianStructure, c_byte, c_char, c_uint, c_ushort, sizeof

class BinaryHeader(BigEndianStructure):
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

mystruct = BinaryHeader(
    0x11111111,
    b'\x22' * 32,                 # c_char arrays take bytes in Python 3
    0x33333333,
    (c_byte * 7)(*([0x44] * 7)),
    0x55,
    0x66,
    0x7777
)

print(sizeof(mystruct))  # 52

# write the raw bytes of the structure, padding included
with open('data.txt', 'wb') as f:
    f.write(bytes(mystruct))


The padding byte sits between other_flags_1byte and payload_length_2bytes in the file:

00000000 11 11 11 11 ....
00000004 22 22 22 22 """"
00000008 22 22 22 22 """"
0000000C 22 22 22 22 """"
00000010 22 22 22 22 """"
00000014 22 22 22 22 """"
00000018 22 22 22 22 """"
0000001C 22 22 22 22 """"
00000020 22 22 22 22 """"
00000024 33 33 33 33 3333
00000028 44 44 44 44 DDDD
0000002C 44 44 44 55 DDDU
00000030 66 00 77 77 f.ww
            ^
         extra byte
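The padding can also be located without a hex dump: each field descriptor on a ctypes Structure class exposes `offset` and `size`. A sketch that reuses the unpacked BinaryHeader layout and reports every gap between consecutive fields; find_padding is a hypothetical helper, and the result assumes c_uint is 4 bytes and c_ushort 2 bytes, as on mainstream platforms:

```python
from ctypes import BigEndianStructure, c_byte, c_char, c_uint, c_ushort, sizeof

class BinaryHeader(BigEndianStructure):
    # same fields as above, default alignment (no _pack_)
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

def find_padding(struct_type):
    """Return (field_name, pad_bytes) for each gap before a field, plus trailing padding."""
    gaps = []
    end = 0  # first byte after the previous field
    for name, *_ in struct_type._fields_:
        field = getattr(struct_type, name)
        if field.offset > end:
            gaps.append((name, field.offset - end))
        end = field.offset + field.size
    if sizeof(struct_type) > end:
        gaps.append(("<end>", sizeof(struct_type) - end))
    return gaps

print(sizeof(BinaryHeader), find_padding(BinaryHeader))
# 52 [('payload_length_2bytes', 1)]
```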


This is a problem for file formats and network protocols, which typically define their layout byte for byte with no implicit padding. To remove the padding, set the packing to 1:

 ...
class BinaryHeader(BigEndianStructure):
    _pack_ = 1
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
...


With that, the file will be 51 bytes:

00000000 11 11 11 11 ....
00000004 22 22 22 22 """"
00000008 22 22 22 22 """"
0000000C 22 22 22 22 """"
00000010 22 22 22 22 """"
00000014 22 22 22 22 """"
00000018 22 22 22 22 """"
0000001C 22 22 22 22 """"
00000020 22 22 22 22 """"
00000024 33 33 33 33 3333
00000028 44 44 44 44 DDDD
0000002C 44 44 44 55 DDDU
00000030 66 77 77    fww 
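Since the question is about readinto(): with `_pack_ = 1` the structure's memory matches the file byte for byte, so a binary file object can fill it in place. A minimal sketch, assuming the packed layout above; io.BytesIO stands in for open('data.txt', 'rb') so the example runs on its own, and the short-read check is an extra safeguard rather than something the format requires:

```python
import io
from ctypes import BigEndianStructure, c_byte, c_char, c_uint, c_ushort, sizeof

class BinaryHeader(BigEndianStructure):
    _pack_ = 1  # no padding: the in-memory layout matches the file exactly
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

# Stand-in for the packed file contents (51 bytes)
raw = b'\x11' * 4 + b'\x22' * 32 + b'\x33' * 4 + b'\x44' * 7 + b'\x55\x66\x77\x77'

header = BinaryHeader()
with io.BytesIO(raw) as f:
    n = f.readinto(header)  # fills the structure's buffer in place
if n != sizeof(header):
    raise EOFError(f"short read: {n} of {sizeof(header)} bytes")

print(hex(header.sequence_number_4bytes), hex(header.payload_length_2bytes))
```

The same readinto() call works on a real file opened in 'rb' mode, since ctypes structures expose a writable buffer.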


As for the struct module, it won't make things easier in your case. Unfortunately, its format strings do not support nested tuples. For example:

>>> from struct import Struct
>>>
>>> data = b'\x11' * 4 + b'\x22' * 32 + b'\x33' * 4 + b'\x44' * 7 + b'\x55\x66\x77\x77'
>>>
>>> BinaryHeader = Struct('>I32cI7BBBH')
>>>
>>> BinaryHeader.unpack(data)
(286331153, b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"',
 b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"', b'"',
 b'"', b'"', b'"', b'"', b'"', b'"', 858993459, 68, 68, 68, 68, 68, 68, 68, 85, 102, 30583)
>>>


Because 32c and 7B expand into individual items, this result cannot be mapped onto a namedtuple directly; you still have to parse it by index. It would work if you could write something like '>I(32c)(I)(7B)(B)(B)H'. This feature was requested back in 2003 ("Extend struct.unpack to produce nested tuples"), but nothing has been done since.
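The index-based regrouping can at least be kept in one place. A sketch, not the only way: the Header namedtuple, its shortened field names, and parse_header are made up for illustration, and the slice boundaries follow from the format string (1 + 32 + 1 + 7 + 1 + 1 + 1 = 44 flat items):

```python
from collections import namedtuple
from struct import Struct

# Illustrative field names, shortened from the ctypes example
Header = namedtuple(
    "Header",
    "sequence_number ascii_text timestamp more_funky_numbers "
    "some_flags other_flags payload_length",
)

FLAT = Struct('>I32cI7BBBH')  # the flat format from the session above

def parse_header(data):
    """Unpack `data` and regroup the 44 flat items into the 7 logical fields."""
    f = FLAT.unpack(data)
    # flat indices: [0] I, [1:33] 32 x c, [33] I, [34:41] 7 x B, [41] B, [42] B, [43] H
    return Header(
        f[0],
        b"".join(f[1:33]),  # 32 one-byte bytes objects -> one bytes object
        f[33],
        f[34:41],           # tuple of 7 ints
        f[41],
        f[42],
        f[43],
    )

data = b'\x11' * 4 + b'\x22' * 32 + b'\x33' * 4 + b'\x44' * 7 + b'\x55\x66\x77\x77'
print(parse_header(data))
```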
    
             
                                                        
            
            
              
                