Python equivalent of unix “strings” utility

Submitted by a 夏天 on 2019-11-27 04:38:15

Question


I'm trying to write a script which will extract strings from an executable binary and save them in a file. Having this file be newline-separated isn't an option since the strings could have newlines themselves. This also means, however, that using the unix "strings" utility isn't an option, since it just prints out all the strings newline-separated, meaning there's no way to tell which strings have newlines included just by looking at the output of "strings". Thus, I was hoping to find a python function or library which implements the same functionality of "strings", but which will give me those strings as variables so that I can avoid the newline issue.

Thanks!


Answer 1:


Here's a generator that yields all the strings of printable characters >= min (4 by default) in length that it finds in filename:

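import string

# Byte values of the ASCII printable characters (includes whitespace such as "\n").
PRINTABLE = set(string.printable.encode("ascii"))

def strings(filename, min=4):
    # Python 3.x: read raw bytes. Text mode with errors="ignore" silently
    # drops undecodable bytes, which can merge two separate strings into one.
    with open(filename, "rb") as f:
        result = ""
        for b in f.read():  # iterating over bytes yields ints
            if b in PRINTABLE:
                result += chr(b)
                continue
            if len(result) >= min:
                yield result
            result = ""
        if len(result) >= min:  # catch result at EOF
            yield result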

Which you can iterate over:

for s in strings("something.bin"):
    print(repr(s))  # do something with s; repr() makes embedded newlines visible

... or store in a list:

sl = list(strings("something.bin"))

I've tested this very briefly, and it seems to give the same output as the Unix strings command for the arbitrary binary file I chose. However, it's pretty naïve (for a start, it reads the whole file into memory at once, which might be expensive for large files), and is very unlikely to approach the performance of the Unix strings command.
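One way to address the memory concern is to read the file in fixed-size chunks and carry the current run of printable bytes across chunk boundaries. The following is only a sketch of that idea, not a tested replacement: the names strings_chunked, min_len and chunk_size are made up for illustration, and "printable" is assumed to mean the same set as string.printable used above.

```python
import string

# Byte values treated as printable. This mirrors string.printable, as in the
# generator above, so it includes "\n" and other whitespace (an assumption).
PRINTABLE = frozenset(string.printable.encode("ascii"))

def strings_chunked(filename, min_len=4, chunk_size=64 * 1024):
    """Yield printable runs of at least min_len characters, reading in chunks."""
    run = bytearray()  # current run; survives across chunk boundaries
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means EOF
                break
            for b in chunk:
                if b in PRINTABLE:
                    run.append(b)
                else:
                    if len(run) >= min_len:
                        yield run.decode("ascii")
                    run.clear()
    if len(run) >= min_len:  # run that extends to EOF
        yield run.decode("ascii")
```

Only one chunk plus the current run is held in memory at a time, so peak memory depends on chunk_size and the longest printable run rather than on the file size.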




Answer 2:


To quote man strings:

STRINGS(1)                   GNU Development Tools                  STRINGS(1)

NAME
       strings - print the strings of printable characters in files.

[...]
DESCRIPTION
       For each file given, GNU strings prints the printable character
       sequences that are at least 4 characters long (or the number given with
       the options below) and are followed by an unprintable character.  By
       default, it only prints the strings from the initialized and loaded
       sections of object files; for other types of files, it prints the
       strings from the whole file.

You could achieve a similar result with a regex that matches runs of at least 4 printable characters, something like this:

>>> import re

>>> content = "hello,\x02World\x88!"
>>> re.findall("[^\x00-\x1F\x7F-\xFF]{4,}", content)
['hello,', 'World']

Please note that this solution requires the entire file content to be loaded into memory.
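If that is a problem, one possible workaround is to memory-map the file and run a bytes version of the same pattern over the mapping, so the operating system pages the data in as needed. This is a rough sketch rather than a definitive implementation: regex_strings and PRINTABLE_RUN are names invented here, and it assumes the file is not empty, since an empty file cannot be mapped.

```python
import mmap
import re

# Same character class as the regex above, written as a bytes pattern:
# everything outside 0x00-0x1F and 0x7F-0xFF, i.e. ASCII 0x20-0x7E.
PRINTABLE_RUN = re.compile(rb"[^\x00-\x1F\x7F-\xFF]{4,}")

def regex_strings(filename):
    """Yield runs of at least 4 printable ASCII characters found in filename."""
    with open(filename, "rb") as f:
        # Assumption: the file is non-empty (mmap raises ValueError otherwise).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for match in PRINTABLE_RUN.finditer(mm):
                yield match.group().decode("ascii")
```

Unlike string.printable, this character class excludes newlines and tabs, so a string containing a newline is split at that point. Widening the class would be needed if such strings have to be kept whole.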



Source: https://stackoverflow.com/questions/17195924/python-equivalent-of-unix-strings-utility
