What to put in a binary data file's header

隐瞒了意图╮ 2021-02-06 01:52

I have a simulation that reads large binary data files that we create (10s to 100s of GB). We use binary for speed reasons. These files are system dependent, converted from te

12 Answers
  •  悲哀的现实
    2021-02-06 02:16

    In my experience, trying to guess in advance which metadata you'll need is invariably wasted time. What matters is structuring your metadata so that it is extensible. For XML files that's straightforward, but binary files require a bit more thought.

    I tend to store metadata in a structure at the END of the file, not the beginning. This has two advantages:

    • Truncated/unterminated files are easily detected.
    • Metadata footers can often be appended to existing files without impacting their reading code.

    The simplest metadata footer I use looks something like this:

    #include <stdint.h>

    struct MetadataFooter {
      char creatorVersion[40];
      char creatorApplication[40];
      /* ... or whatever else you need */
    };

    struct FileFooter {
      int64_t metadataFooterSize;  // = sizeof(MetadataFooter)
      char magicString[10];        // a unique identifier for the format: maybe "MYFILEFMT"
    };
    

    After the raw data, the metadata footer and THEN the file footer are written.

    When reading the file, seek to sizeof(FileFooter) bytes before the end. Read the file footer and verify the magicString. Then seek back a further metadataFooterSize bytes and read the metadata. If the size stored in the file is smaller than the current sizeof(MetadataFooter), the file was written by an older version, and you can use default values for the missing fields.
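    The write and read steps above can be sketched in C roughly as follows. This is only a minimal sketch under some assumptions: the structs are written as raw bytes (so padding and byte order are those of the writing machine, which matches the system-dependent files in the question), the helper names append_footers and read_metadata and the MAX_METADATA_BYTES limit are made up for illustration, and the platform supports fseek relative to SEEK_END on binary streams.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAGIC "MYFILEFMT"             /* 9 chars + NUL fills magicString[10] */
#define MAX_METADATA_BYTES (1 << 20)  /* hypothetical sanity limit */

typedef struct {
    char creatorVersion[40];
    char creatorApplication[40];
} MetadataFooter;

typedef struct {
    int64_t metadataFooterSize;  /* bytes of metadata stored just before this footer */
    char    magicString[10];
} FileFooter;

/* Append the metadata footer, then the file footer, to a file that
   already holds the raw data. Returns 0 on success, -1 on failure. */
int append_footers(const char *path, const MetadataFooter *meta)
{
    FileFooter footer;
    memset(&footer, 0, sizeof footer);            /* zero the padding bytes too */
    footer.metadataFooterSize = (int64_t)sizeof *meta;
    memcpy(footer.magicString, MAGIC, sizeof MAGIC);

    FILE *f = fopen(path, "ab");
    if (!f) return -1;
    int ok = fwrite(meta, sizeof *meta, 1, f) == 1 &&
             fwrite(&footer, sizeof footer, 1, f) == 1;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Read the metadata by walking backwards from the end of the file.
   Fields that an older, shorter footer does not contain stay zeroed.
   Returns 0 on success, -1 if the file is truncated or not in this format. */
int read_metadata(const char *path, MetadataFooter *out)
{
    FileFooter footer;
    memset(out, 0, sizeof *out);                  /* defaults for missing fields */

    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    /* The offsets are small and relative to the end, so a long is enough
       even when the file itself is hundreds of GB. */
    int ok = fseek(f, -(long)sizeof footer, SEEK_END) == 0 &&
             fread(&footer, sizeof footer, 1, f) == 1 &&
             memcmp(footer.magicString, MAGIC, sizeof MAGIC) == 0 &&
             footer.metadataFooterSize >= 0 &&
             footer.metadataFooterSize <= MAX_METADATA_BYTES;
    if (ok) {
        size_t stored = (size_t)footer.metadataFooterSize;
        size_t n = stored < sizeof *out ? stored : sizeof *out;
        ok = fseek(f, -(long)(sizeof footer + stored), SEEK_END) == 0 &&
             fread(out, 1, n, f) == n;
    }
    fclose(f);
    return ok ? 0 : -1;
}
```

    A writer would call append_footers once after the raw data has been written; a reader calls read_metadata first and treats a -1 result as a truncated or foreign file. New fields should only ever be added at the end of MetadataFooter, so that older files simply leave them at their defaults.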

    As KeithB points out, you could even use this technique to store the metadata as an XML string, which gives you both fully extensible metadata and the compactness and speed of binary data.
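    For the XML variant, the same file footer can describe a variable-length text block instead of a fixed struct. The sketch below is one possible way to do it, not a definitive one: the function names append_text_metadata and read_text_metadata and the MAX_TEXT_BYTES limit are hypothetical, the footer is again written as raw bytes, and the text is treated as an opaque string (no XML parsing is attempted).

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAGIC "MYFILEFMT"
#define MAX_TEXT_BYTES (1 << 20)  /* hypothetical sanity limit */

typedef struct {
    int64_t metadataFooterSize;  /* here: length of the text in bytes */
    char    magicString[10];
} FileFooter;

/* Append the text, then the file footer, after the raw data.
   Returns 0 on success, -1 on failure. */
int append_text_metadata(const char *path, const char *text)
{
    FileFooter footer;
    size_t len = strlen(text);
    memset(&footer, 0, sizeof footer);
    footer.metadataFooterSize = (int64_t)len;
    memcpy(footer.magicString, MAGIC, sizeof MAGIC);

    FILE *f = fopen(path, "ab");
    if (!f) return -1;
    int ok = fwrite(text, 1, len, f) == len &&
             fwrite(&footer, sizeof footer, 1, f) == 1;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Return the stored text as a NUL-terminated, malloc'd string that the
   caller must free, or NULL if the footer is missing or invalid. */
char *read_text_metadata(const char *path)
{
    FileFooter footer;
    char *text = NULL;
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    if (fseek(f, -(long)sizeof footer, SEEK_END) == 0 &&
        fread(&footer, sizeof footer, 1, f) == 1 &&
        memcmp(footer.magicString, MAGIC, sizeof MAGIC) == 0 &&
        footer.metadataFooterSize >= 0 &&
        footer.metadataFooterSize <= MAX_TEXT_BYTES) {
        size_t len = (size_t)footer.metadataFooterSize;
        text = malloc(len + 1);
        if (text &&
            (fseek(f, -(long)(sizeof footer + len), SEEK_END) != 0 ||
             fread(text, 1, len, f) != len)) {
            free(text);
            text = NULL;
        }
        if (text) text[len] = '\0';
    }
    fclose(f);
    return text;
}
```

    Because the length is stored in the footer, the text can grow or change shape between versions without the reading code for the raw data being affected.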
