Read EXE, MSI, and ZIP file metadata in Python in Linux

生来就可爱ヽ(ⅴ<●) 提交于 2019-12-03 07:18:00

Take a look at this library: http://bitbucket.org/haypo/hachoir/wiki/Home and this example program that uses the library: http://pypi.python.org/pypi/hachoir-metadata/1.3.3. The second link is an example program which uses the Hachoir binary file manipulation library (first link) to parse the metadata.

The library can handle these formats:

  • Archives: bzip2, gzip, zip, tar
  • Audio: MPEG audio ("MP3"), WAV, Sun/NeXT audio, Ogg/Vorbis (OGG), MIDI, AIFF, AIFC, Real audio (RA)
  • Image: BMP, CUR, EMF, ICO, GIF, JPEG, PCX, PNG, TGA, TIFF, WMF, XCF
  • Misc: Torrent
  • Program: EXE
  • Video: ASF format (WMV video), AVI, Matroska (MKV), Quicktime (MOV), Ogg/Theora, Real media (RM)

Additionally, Hachoir can do some file manipulation operations which I would assume includes some primitive metadata manipulation.

Glaudiston

The hachoir-metadata get the "Product Version" but the compilers changes the "File Version". Then the version returned is not the we need.

I found a small a well working soluction:

http://pev.sourceforge.net/

I've tested with success. It's simple, fast and stable.

To answer one of your questions, you can use the zipfile module, specifically the ZipInfo object to get the metadata for zip files.

As for hashing only the data of the file, you can only to that if you know which parts are data and which are metadata. There can be no general method as many file formats store their metadata differently.

To answer your second question: no, there is no way to hash a PE file or ZIP file, ignoring the metadata, without locating and reading the metadata. This is because the metadata you're interested in is stored at variable locations in the file.

In the case of PE files (EXE, DLL, etc), it's stored in a resource block, typically towards the end of the file, and a series of pointers and tables at the start of the file gives the location.

In the case of ZIP files, it's scattered throughout the archive -- each included file is preceded by its own metadata, and then there's a table at the end giving the locations of each metadata block. But it sounds like you might actually be wanting to read the files within the ZIP archive and look for EXEs in there if you're after program metadata; the ZIP archive itself does not store company names or version numbers.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!