I'm reading in a binary file (a jpg in this case), and need to find some values in that file. For those interested, the binary file is a jpg and I'm attempting to pick out
The bitstring module was designed for pretty much this purpose. For your case the following code (which I haven't tested) should help illustrate:
from bitstring import ConstBitStream
# Can initialise from files, bytes, etc.
s = ConstBitStream(filename='your_file')
# Search to Start of Frame 0 code on byte boundary
found = s.find('0xffc0', bytealigned=True)
if found:
    # find() returns a tuple holding the bit position, so divide by 8 to get bytes
    print("Found start code at byte offset %d." % (found[0] // 8))
    s0f0, length, bitdepth, height, width = s.readlist('hex:16, uint:16, uint:8, 2*uint:16')
    print("Width %d, Height %d" % (width, height))
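If installing bitstring is not an option, the same SOF0 fields can be read with only the standard library. This is a minimal sketch, not a full JPEG parser: it assumes the first FF C0 byte pair in the data really is the Start of Frame 0 marker, which may not hold for files with an embedded thumbnail. The names sof0_dimensions and sample are made up for illustration.

```python
import struct

def sof0_dimensions(data):
    """Return (width, height) from the first FF C0 segment, or None if absent."""
    pos = data.find(b'\xff\xc0')
    # Need the 2-byte marker plus 7 bytes of header fields after it.
    if pos == -1 or pos + 9 > len(data):
        return None
    # Big-endian: length (2 bytes), bit depth (1), height (2), width (2).
    length, bitdepth, height, width = struct.unpack('>HBHH', data[pos + 2:pos + 9])
    return width, height

# Synthetic data standing in for a real file: SOI marker, then an SOF0 header.
sample = b'\xff\xd8' + b'\xff\xc0' + struct.pack('>HBHH', 17, 8, 480, 640)
print(sof0_dimensions(sample))
```

For a real file, pass in the result of open(path, 'rb').read() instead of sample.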
Instead of reading the entire file into memory, searching it, and then writing a new file out to disk, you can use the mmap module. mmap does not load the entire file into memory, and it allows in-place modification.
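#!/usr/bin/python
import mmap
# "r+b" opens the existing file for reading and writing in binary mode
with open("hugefile", "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
    # find() returns the offset of the first match, or -1 if there is none
    print(mm.find(b'\x00\x09\x03\x03'))
    mm.close()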
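To illustrate the in-place modification part, here is a small sketch that finds a byte sequence through mmap and overwrites it where it sits. It assumes Python 3 and uses a temporary scratch file in place of the real "hugefile"; the replacement bytes are arbitrary. Note that slice assignment on an mmap object cannot change the length of the file, so the replacement has to be the same size as the bytes it overwrites.

```python
import mmap
import os
import tempfile

# Create a small scratch file to stand in for the large file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"header" + b"\x00\x09\x03\x03" + b"trailer")

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
    try:
        offset = mm.find(b"\x00\x09\x03\x03")
        if offset != -1:
            # Overwrite the 4 matched bytes in place (same length required).
            mm[offset:offset + 4] = b"\xff\xff\xff\xff"
            mm.flush()
    finally:
        mm.close()

# Read the file back to confirm the change reached the disk, then clean up.
with open(path, "rb") as f:
    patched = f.read()
os.remove(path)
```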
For Python >=3.2:
import re
with open("filename.jpg", "rb") as f:
    byte = f.read()
matchObj = re.match(b'\xff\xd8.*\xff\xc0...(..)(..).*\xff\xd9', byte, re.MULTILINE|re.DOTALL)
if matchObj:
    # https://stackoverflow.com/q/444591
    print(int.from_bytes(matchObj.group(1), 'big'))  # height
    print(int.from_bytes(matchObj.group(2), 'big'))  # width
In Python 3.x you can search a byte string by another byte string like this:
>>> byte_array = b'this is a byte array\r\n\r\nXYZ\x80\x04\x95 \x00\x00\x00\x00\x00'
>>> byte_array.find('\r\n\r\n'.encode())
20
>>>
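find() only reports the first match. If you need every position, its optional start argument lets you resume the search after the previous hit. A minimal sketch, where find_all is a hypothetical helper name and the sample data is made up:

```python
def find_all(haystack, needle):
    """Yield every offset at which needle occurs in haystack (overlaps included)."""
    start = haystack.find(needle)
    while start != -1:
        yield start
        # Resume one byte past the last hit so overlapping matches are found too.
        start = haystack.find(needle, start + 1)

data = b'ab\r\n\r\ncd\r\n\r\nef'
offsets = list(find_all(data, b'\r\n\r\n'))
print(offsets)
```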
The re module does work with both string and binary data (str in Python 2 and bytes in Python 3), so you can use it as well as str.find for your task.
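As a rough illustration of re on binary data in Python 3, the sketch below searches a bytes object with a bytes pattern and collects the offset of each match. The data and the pattern are invented for the example; re.DOTALL is set so that "." also matches a newline byte.

```python
import re

data = b'abc\x00\x09\x03\x03def\x00\x09\x04\x03'

# Bytes pattern: 00 09, then any single byte, then 03.
pattern = re.compile(rb'\x00\x09.\x03', re.DOTALL)

# m.start() is the byte offset of the match, m.group() the matched bytes.
hits = [(m.start(), m.group()) for m in pattern.finditer(data)]
print(hits)
```

If the bytes you are looking for may contain regex metacharacters, passing them through re.escape() first keeps them literal.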
The find() method should be used only if you need to know the position of sub. If not, you can use the in operator, for example:
with open("foo.bin", 'rb') as f:
    if b'\x00' in f.read():
        print('The file is binary!')
    else:
        print('The file is not binary!')
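For files too large to read in one go, the same membership test can be done chunk by chunk. This is only a sketch: stream_contains is a hypothetical helper and the default chunk size is arbitrary. It carries the last len(needle) - 1 bytes over to the next chunk so that a match straddling a chunk boundary is not missed.

```python
import io

def stream_contains(stream, needle, chunk_size=64 * 1024):
    """Return True if needle occurs anywhere in the binary stream."""
    tail = b''
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return False
        window = tail + chunk
        if needle in window:
            return True
        # Keep just enough trailing bytes to catch a match split across chunks.
        keep = len(needle) - 1
        tail = window[-keep:] if keep > 0 else b''

# io.BytesIO stands in for a file opened with open(path, 'rb').
print(stream_contains(io.BytesIO(b'abcdef'), b'cd', chunk_size=3))
```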