There are some bad ideas and good ideas:
It's a bad idea to:
- Typecast a raw buffer into a struct:
  - There are endianness issues (little-endian vs big-endian) when parsing floats or integers longer than 1 byte.
  - There are byte alignment issues in structures, which are very compiler-dependent. One can try to disable alignment (or enforce some manual alignment), but that's generally a bad idea too. At the very least, you'll hurt performance by making the CPU access unaligned integers: the internal RISC core would have to do 3-4 operations instead of 1 (e.g. "do part 1 in the first word", "do part 2 in the second word", "merge the result") on every access. Or worse, the compiler pragmas that control alignment will be ignored and your code will break.
  - There are no exact size guarantees for the regular `int`, `long`, `short`, etc. types in C/C++. You can use types like `int16_t`, but these are available only on modern compilers.
  - Of course, this approach breaks completely when using structures that reference other structures: one has to unroll them all manually.
- Write parsers manually: it's much harder than it seems at first glance.
  - A good parser needs to do lots of sanity checking at every stage. It's easy to miss something, and even easier if you don't use exceptions.
  - Using exceptions makes you prone to failure if your parsing code is not exception-safe (i.e. written in such a way that it can be interrupted at certain points without leaking memory or forgetting to finalize some objects).
  - There could be performance issues (e.g. doing lots of unbuffered I/O instead of making one OS `read` syscall and then parsing the buffer, or vice versa: reading the whole thing at once instead of doing more granular, lazy reads where applicable).
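
To make the pitfalls above concrete, here is a minimal sketch of what avoiding the struct cast looks like: each field is read byte by byte with explicit endianness, and every read is bounds-checked first. `ByteReader`, `Header`, and the field layout are hypothetical names invented for this example, and the code assumes a C++11 compiler that provides `<cstdint>`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical cursor over a byte buffer (illustrative, not from any library).
class ByteReader {
public:
    ByteReader(const std::uint8_t* data, std::size_t size)
        : data_(data), size_(size), pos_(0) {}

    std::uint8_t read_u8() {
        require(1);
        return data_[pos_++];
    }

    // Little-endian: the least significant byte comes first.
    std::uint16_t read_u16_le() {
        require(2);
        std::uint16_t v = static_cast<std::uint16_t>(
            static_cast<unsigned>(data_[pos_]) |
            (static_cast<unsigned>(data_[pos_ + 1]) << 8));
        pos_ += 2;
        return v;
    }

    // Big-endian: the most significant byte comes first.
    std::uint32_t read_u32_be() {
        require(4);
        std::uint32_t v =
            (static_cast<std::uint32_t>(data_[pos_]) << 24) |
            (static_cast<std::uint32_t>(data_[pos_ + 1]) << 16) |
            (static_cast<std::uint32_t>(data_[pos_ + 2]) << 8) |
            static_cast<std::uint32_t>(data_[pos_ + 3]);
        pos_ += 4;
        return v;
    }

    std::size_t remaining() const { return size_ - pos_; }

private:
    // Sanity check performed before every read.
    void require(std::size_t n) const {
        if (n > size_ - pos_) throw std::out_of_range("truncated input");
    }

    const std::uint8_t* data_;
    std::size_t size_;
    std::size_t pos_;
};

// Hypothetical on-disk layout: u8 version, u16 LE flags, u32 BE length.
struct Header {
    std::uint8_t version;
    std::uint16_t flags;
    std::uint32_t length;
};

Header parse_header(ByteReader& r) {
    Header h;
    h.version = r.read_u8();
    h.flags = r.read_u16_le();
    h.length = r.read_u32_be();
    return h;
}
```

Because every multi-byte value is assembled with shifts, the result does not depend on the host byte order or on struct padding. Truncated input raises `std::out_of_range`; the reader owns no resources, so nothing leaks when that happens. Even this toy version hints at how much boilerplate a real format would need.
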
It's a good idea to:
- Go cross-platform. Pretty much self-explanatory, with all the mobile devices, routers and IoT stuff booming in recent years.
- Go declarative. Consider describing your structure in one of the declarative specification languages, and then use a parser generator to generate a parser from that description.
There are several tools available to do that:
- Kaitai Struct: my favorite so far. It is cross-platform and cross-language, i.e. you describe your structure once and can then compile it into a parser in C++, C#, Java, Python, Ruby, PHP, etc.
- binpac: pretty dated, but still usable. It is C++-only and similar to Kaitai in philosophy, but unsupported since 2013.
- Spicy: said to be a "modern rewrite" of binpac (AKA "binpac++"), but still in the early stages of development. It can be used for smaller tasks and is C++-only too.
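
To give a feel for the declarative idea in plain C++, here is a toy sketch: the format is described as data (a table of fields), and one generic routine interprets that table. This is not how Kaitai Struct, binpac, or Spicy work internally (those generate code from their own specification languages). `FieldSpec`, `Endian`, and `parse_fields` are hypothetical names made up for this illustration, and the code assumes C++11.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

enum class Endian { Little, Big };

// One entry of the hypothetical format description.
struct FieldSpec {
    std::string name;
    std::size_t size;  // width in bytes, 1..8
    Endian endian;
};

// Generic interpreter: walks the description instead of hard-coding one format.
std::map<std::string, std::uint64_t> parse_fields(
        const std::vector<FieldSpec>& spec,
        const std::uint8_t* data, std::size_t size) {
    std::map<std::string, std::uint64_t> out;
    std::size_t pos = 0;
    for (const FieldSpec& f : spec) {
        if (f.size == 0 || f.size > 8)
            throw std::invalid_argument("bad field width");
        if (f.size > size - pos)
            throw std::out_of_range("truncated input");
        std::uint64_t v = 0;
        // Always consume bytes from most significant to least significant.
        for (std::size_t i = 0; i < f.size; ++i) {
            std::size_t idx = (f.endian == Endian::Big) ? i : f.size - 1 - i;
            v = (v << 8) | data[pos + idx];
        }
        out[f.name] = v;
        pos += f.size;
    }
    return out;
}
```

With this in place, a format becomes a short table such as `{{"version", 1, Endian::Little}, {"flags", 2, Endian::Little}, {"length", 4, Endian::Big}}`, and the endianness handling and bounds checks live in one place. The tools listed above take the same idea much further: nested structures, conditionals, repetitions, and output in several target languages.
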