Searching/reading binary data in Python

情歌与酒 2020-12-25 12:15

I'm reading in a binary file (a jpg in this case), and need to find some values in that file. For those interested, the binary file is a jpg and I'm attempting to pick out

8 Answers
  • 2020-12-25 12:53

    The bitstring module was designed for pretty much this purpose. For your case the following code (which I haven't tested) should help illustrate:

    from bitstring import ConstBitStream

    # Can initialise from files, bytes, etc.
    s = ConstBitStream(filename='your_file')
    # Search for the Start of Frame 0 code on a byte boundary
    found = s.find('0xffc0', bytealigned=True)
    if found:
        # find() returns a tuple holding the bit position, so divide by 8 for bytes
        print("Found start code at byte offset %d." % (found[0] // 8))
        s0f0, length, bitdepth, height, width = s.readlist(
            'hex:16, uint:16, uint:8, 2*uint:16')
        print("Width %d, Height %d" % (width, height))
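
    If installing bitstring is not an option, roughly the same read can be sketched with the standard library alone. The function name read_sof0 and the sample bytes are made up for illustration, and a plain search for the two marker bytes may also match the same byte pair elsewhere in a real file, so treat this as a starting point rather than a robust parser:

```python
import struct


def read_sof0(data: bytes):
    """Return (height, width) from the first SOF0 segment, or None if absent."""
    pos = data.find(b"\xff\xc0")
    if pos == -1 or pos + 9 > len(data):
        return None
    # Marker (2 bytes), segment length (2), bit depth (1), height (2), width (2),
    # all big-endian, 9 bytes in total.
    _marker, _length, _bitdepth, height, width = struct.unpack_from(">HHBHH", data, pos)
    return height, width


# Hypothetical hand-made bytes standing in for part of a JPEG file.
sample = b"\xff\xd8" + b"\xff\xc0\x00\x11\x08\x01\xe0\x02\x80" + b"\xff\xd9"
print(read_sof0(sample))
```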
    
  • 2020-12-25 12:53

    Instead of reading the entire file into memory, searching it, and then writing a new file out to disk, you can use the mmap module. mmap does not load the entire file into memory, and it allows in-place modification.

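    #!/usr/bin/env python3

    import mmap

    # "r+b" opens the file for binary reading and writing without truncating it
    with open("hugefile", "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)
        # mmap.find() takes bytes and returns the offset, or -1 if not found
        print(mm.find(b'\x00\x09\x03\x03'))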
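
    The in-place modification mentioned above could look roughly like the sketch below. It builds a small temporary file so it can run on its own; the file contents and the replacement bytes are invented for the example, and the replacement has to be the same length as the bytes it overwrites because a slice assignment cannot resize the mapping:

```python
import mmap
import os
import tempfile

# Build a small throwaway file so the sketch is self-contained.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"header\x00\x09\x03\x03trailer")

needle = b"\x00\x09\x03\x03"
patch = b"\xde\xad\xbe\xef"  # hypothetical replacement, same length as needle

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:
        pos = mm.find(needle)
        if pos != -1:
            # Slice assignment writes through to the file; the length must not change.
            mm[pos:pos + len(needle)] = patch
            mm.flush()

# Read the file back to confirm the change reached the disk, then clean up.
with open(path, "rb") as f:
    patched = f.read()
os.remove(path)
print(pos, patched)
```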
    
  • 2020-12-25 12:59

    For Python >=3.2:

    import re

    with open("filename.jpg", "rb") as f:
        data = f.read()

    # SOI marker, then the SOF0 marker, 3 bytes (length, bit depth), height, width, then the EOI marker
    match_obj = re.match(b'\xff\xd8.*\xff\xc0...(..)(..).*\xff\xd9', data, re.DOTALL)
    if match_obj:
        # https://stackoverflow.com/q/444591
        print(int.from_bytes(match_obj.group(1), 'big'))  # height
        print(int.from_bytes(match_obj.group(2), 'big'))  # width
    
  • 2020-12-25 13:03

    In Python 3.x you can search a byte string for another byte string like this:

    >>> byte_array = b'this is a byte array\r\n\r\nXYZ\x80\x04\x95 \x00\x00\x00\x00\x00'
    >>> byte_array.find('\r\n\r\n'.encode())
    20
    >>>
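
    find() reports only the first match. If every offset is needed, one possible approach is to call it repeatedly with a moving start position. The helper name find_all is made up for this sketch, and it counts overlapping matches:

```python
def find_all(data: bytes, needle: bytes):
    """Yield the offset of every (possibly overlapping) occurrence of needle."""
    if not needle:
        return  # an empty needle would match at every position
    pos = data.find(needle)
    while pos != -1:
        yield pos
        # Resume one byte past the last hit so overlapping matches are found too.
        pos = data.find(needle, pos + 1)


byte_array = b'this is a byte array\r\n\r\nXYZ\x80\x04\x95 \x00\x00\x00\x00\x00'
print(list(find_all(byte_array, b'\x00\x00')))
```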
    
  • 2020-12-25 13:10

    The re module does work with both string and binary data (str in Python 2 and bytes in Python 3), so you can use it as well as str.find for your task.
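
    As a rough illustration for Python 3, a bytes pattern can be used with re.search to get both the offset and the captured bytes. The sample data here is invented, and the pattern assumes the height and width sit three bytes after the marker:

```python
import re

# Hypothetical sample bytes, not a real JPEG.
data = b"\xff\xd8junk\xff\xc0\x00\x11\x08\x01\xe0\x02\x80more\xff\xd9"

# The pattern must be bytes when the data is bytes; re.DOTALL lets "." match b"\n" too.
pattern = re.compile(rb"\xff\xc0...(..)(..)", re.DOTALL)
match = pattern.search(data)
if match:
    offset = match.start()
    height = int.from_bytes(match.group(1), "big")
    width = int.from_bytes(match.group(2), "big")
    print(offset, height, width)
```

    Mixing a str pattern with bytes data raises a TypeError in Python 3, so the b or rb prefix on the pattern matters.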

  • 2020-12-25 13:14

    Use the find() method only if you need to know the position of the subsequence; if not, you can use the in operator, for example:

    with open("foo.bin", 'rb') as f:
        if b'\x00' in f.read():
            print('The file is binary!')
        else:
            print('The file is not binary!')
    