Reading binary file defined by a struct

轻奢々 2021-02-11 01:23

Could somebody point me in the right direction on how to read a binary file whose format is defined by a C struct? It has a few #defines inside of the struct, which makes me think t…

5 Answers
  •  轮回少年
    2021-02-11 01:52

    There are some bad ideas and good ideas:

    It's a bad idea to:

    • Typecast a raw buffer into a struct
      • There are endianness issues (little-endian vs. big-endian) when parsing integers wider than one byte, or floats.
      • There are byte alignment (padding) issues in structures, which are very compiler-dependent. One can try to disable alignment (or enforce some manual alignment), but that's generally a bad idea too. At the very least, you'll hurt performance by making the CPU access unaligned integers: an internal RISC core may have to do 3-4 operations instead of 1 (i.e. "read part 1 from the first word", "read part 2 from the second word", "merge the result") on every access. Or worse, the compiler may ignore your alignment pragmas and your code will break.
      • There are no exact size guarantees for the regular int, long, short, etc. types in C/C++. You can use fixed-width types such as int16_t from <stdint.h> (C99; <cstdint> in C++11), but these are only available on reasonably modern compilers.
      • Of course, this approach breaks down completely for structures that reference other structures: you have to unroll them all manually.
    • Write parsers manually: it's much harder than it seems at first glance.
      • A good parser needs to do lots of sanity checking at every stage. It's easy to miss something, and even easier if you don't use exceptions.
      • Using exceptions, in turn, makes you prone to failure if your parsing code is not exception-safe (i.e. written so that it can be interrupted at some points without leaking memory or forgetting to finalize some objects).
      • There could be performance issues, e.g. doing lots of unbuffered I/O instead of one OS read syscall followed by parsing the buffer, or vice versa, reading the whole thing at once where more granular, lazy reads would be appropriate.
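    For illustration, here is a minimal C sketch of what careful manual parsing looks like compared to typecasting a buffer. The question's actual struct is cut off, so the 12-byte little-endian layout, the struct record type, the RECORD_MAGIC value and the parse_record/read_record helpers are all hypothetical. Integers are assembled byte by byte with shifts, so the result does not depend on host endianness, struct padding, or buffer alignment, and every field is bounds-checked and sanity-checked before use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* HYPOTHETICAL on-disk layout (the real struct is not shown):
 *   offset 0: uint32 magic,  offset 4: uint16 version,
 *   offset 6: uint16 count,  offset 8: int32 value.
 * All fields little-endian, no padding, 12 bytes in total. */
#define RECORD_SIZE    12u
#define RECORD_MAGIC   0x31434552u /* bytes 'R' 'E' 'C' '1' read as LE */
#define RECORD_VERSION 1u

/* In-memory representation; its padding never matters because the
 * file bytes are never cast or memcpy'd into it. */
struct record {
    uint32_t magic;
    uint16_t version;
    uint16_t count;
    int32_t  value;
};

/* Assemble integers from individual bytes: independent of host
 * endianness and of how buf happens to be aligned. */
static uint16_t read_u16_le(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Two's-complement reinterpretation without an implementation-defined cast. */
static int32_t u32_to_i32(uint32_t u) {
    return u <= (uint32_t)INT32_MAX ? (int32_t)u
                                    : -(int32_t)(UINT32_MAX - u) - 1;
}

/* Returns 0 on success, -1 if the data is truncated or fails a check. */
static int parse_record(const unsigned char *buf, size_t len,
                        struct record *out) {
    assert(buf != NULL && out != NULL); /* caller bug, not bad input */
    if (len < RECORD_SIZE)
        return -1;                      /* truncated input */
    out->magic = read_u32_le(buf);
    if (out->magic != RECORD_MAGIC)
        return -1;                      /* not this format */
    out->version = read_u16_le(buf + 4);
    if (out->version != RECORD_VERSION)
        return -1;                      /* unknown version */
    out->count = read_u16_le(buf + 6);
    out->value = u32_to_i32(read_u32_le(buf + 8));
    return 0;
}

/* Reads one record from a stream opened in binary mode ("rb"). */
static int read_record(FILE *f, struct record *out) {
    unsigned char buf[RECORD_SIZE];
    if (fread(buf, 1, sizeof buf, f) != sizeof buf)
        return -1;                      /* EOF or I/O error mid-record */
    return parse_record(buf, sizeof buf, out);
}
```

    Since this sketch is plain C, failures are reported through return codes rather than exceptions, which sidesteps the exception-safety concern but means every caller must check the result.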

    It's a good idea to:

    • Go cross-platform. Pretty much self-explanatory, with all the mobile devices, routers and IoT devices booming in recent years.
    • Go declarative. Consider using a declarative spec to describe your structure, and then use a parser generator to generate the parser from it.
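    The "go declarative" idea can be illustrated without depending on any particular tool by a minimal hand-rolled C sketch: the layout lives in a table of field descriptors, and one generic loop decodes whatever the table describes. The struct field type, the RECORD_FIELDS table (describing a hypothetical 12-byte little-endian record) and decode_fields are names invented for this example; this is not the output or API of any parser generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field descriptor: the record layout is data, not code. */
struct field {
    const char *name;  /* kept for diagnostics / lookup by name */
    size_t offset;     /* byte offset within the record */
    size_t size;       /* 1 to 4 bytes in this sketch */
    int is_signed;     /* nonzero: two's-complement field */
};

/* HYPOTHETICAL 12-byte little-endian layout, described declaratively. */
static const struct field RECORD_FIELDS[] = {
    {"magic",   0, 4, 0},
    {"version", 4, 2, 0},
    {"count",   6, 2, 0},
    {"value",   8, 4, 1},
};
#define NUM_RECORD_FIELDS (sizeof RECORD_FIELDS / sizeof RECORD_FIELDS[0])

/* One generic loop decodes whatever the table describes.
 * Writes one int64_t per field into out[]; returns 0 on success, or
 * -1 for a malformed descriptor or a field that overruns the buffer. */
static int decode_fields(const struct field *fields, size_t n,
                         const unsigned char *buf, size_t len,
                         int64_t *out) {
    assert(fields != NULL && buf != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct field *f = &fields[i];
        if (f->size == 0 || f->size > 4 ||
            f->offset > len || f->size > len - f->offset)
            return -1;                          /* bad descriptor or truncated */
        uint32_t u = 0;
        for (size_t b = 0; b < f->size; b++)    /* little-endian byte order */
            u |= (uint32_t)buf[f->offset + b] << (8 * b);
        int64_t v = u;
        if (f->is_signed && ((u >> (8 * f->size - 1)) & 1u))
            v -= (int64_t)1 << (8 * f->size);   /* apply the sign bit */
        out[i] = v;
    }
    return 0;
}
```

    Changing the format then means editing the table rather than the parsing logic. Real generators such as Kaitai Struct go much further (nested types, repetitions, validation, typed accessors in many target languages), which is the argument for using one instead of maintaining tables like this by hand.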

    There are several tools available to do that:

    • Kaitai Struct: my favorite so far; cross-platform and cross-language, i.e. you describe your structure once and then compile it into a parser in C++, C#, Java, Python, Ruby, PHP, etc.
    • binpac: pretty dated but still usable, C++-only; similar to Kaitai in ideology, but unsupported since 2013.
    • Spicy: said to be a "modern rewrite" of binpac, AKA "binpac++", but still in early stages of development; it can be used for smaller tasks, and is C++-only too.
