I'm trying to write a script which will extract strings from an executable binary and save them in a file. Having this file be newline-separated isn't an option since the
Here's a generator that yields all the strings of printable characters >= min (4 by default) in length that it finds in filename:
import string

def strings(filename, min=4):
    with open(filename, errors="ignore") as f:  # Python 3.x
    # with open(filename, "rb") as f:           # Python 2.x
        result = ""
        for c in f.read():
            if c in string.printable:
                result += c
                continue
            if len(result) >= min:
                yield result
            result = ""
        if len(result) >= min:  # catch result at EOF
            yield result
Which you can iterate over:
for s in strings("something.bin"):
    print(s)  # or do something else with s
... or store in a list:
sl = list(strings("something.bin"))
I've tested this very briefly, and it seems to give the same output as the Unix strings command for the arbitrary binary file I chose. However, it's pretty naïve (for a start, it reads the whole file into memory at once, which might be expensive for large files), and is very unlikely to approach the performance of the Unix strings command.
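If memory use on large files is a concern, one possible variation is to read the file in fixed-size chunks in binary mode. The sketch below is untested against the Unix strings command; the names strings_chunked, min_len and chunk_size are my own and not part of the code above. It also treats every non-ASCII byte as a separator, so its output may differ from the text-mode version, which silently drops undecodable bytes.

```python
import string

# Byte values treated as printable; mirrors string.printable used above.
PRINTABLE = frozenset(string.printable.encode("ascii"))

def strings_chunked(filename, min_len=4, chunk_size=64 * 1024):
    """Yield runs of printable ASCII at least min_len long, reading in chunks."""
    run = bytearray()  # current run; persists across chunk boundaries
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means EOF
                break
            for b in chunk:  # iterating over bytes yields ints in Python 3
                if b in PRINTABLE:
                    run.append(b)
                    continue
                if len(run) >= min_len:
                    yield run.decode("ascii")
                run.clear()
    if len(run) >= min_len:  # catch a run that ends at EOF
        yield run.decode("ascii")
```

It is used the same way as the generator above, e.g. list(strings_chunked("something.bin")). Only one chunk plus the current run is held in memory at a time, though the per-byte Python loop is still slow compared with a native tool.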