I have a simulation that reads large binary data files that we create (10s to 100s of GB). We use binary for speed reasons. These files are system dependent, converted from te
In my experience, trying to guess in advance which metadata you'll need is invariably wasted time. What's important is to structure your metadata in a way that is extensible. For XML files, that's straightforward, but binary files require a bit more thought.
I tend to store metadata in a structure at the END of the file, not the beginning. This has two advantages:
The simplest metadata footer I use looks something like this:
#include <stdint.h>

struct MetadataFooter
{
    char creatorVersion[40];
    char creatorApplication[40];
    /* ... or whatever else you need */
};

struct FileFooter
{
    int64_t metadataFooterSize; // = sizeof(struct MetadataFooter)
    char magicString[10];       // a unique identifier for the format: maybe "MYFILEFMT"
};
After the raw data, the metadata footer and THEN the file footer are written.
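A minimal sketch of that write order in C, assuming the two structs above and a stream already positioned just past the raw data. The function name `write_footers` and the `MAGIC` macro are hypothetical, and the structs are written as raw bytes, so the result is as system dependent as the rest of the file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAGIC "MYFILEFMT" /* 9 characters + NUL = 10 bytes (hypothetical value) */

struct MetadataFooter {
    char creatorVersion[40];
    char creatorApplication[40];
};

struct FileFooter {
    int64_t metadataFooterSize; /* = sizeof(struct MetadataFooter) */
    char magicString[10];
};

/* Appends the metadata footer and THEN the file footer at the current
   position of f. Returns 0 on success, -1 on a write error. */
int write_footers(FILE *f, const char *version, const char *application)
{
    struct MetadataFooter meta;
    struct FileFooter footer;

    assert(f != NULL && version != NULL && application != NULL);

    /* Zero first so padding and unused tail bytes are deterministic. */
    memset(&meta, 0, sizeof meta);
    memset(&footer, 0, sizeof footer);

    /* Copy at most 39 characters so each field stays NUL-terminated. */
    strncpy(meta.creatorVersion, version, sizeof meta.creatorVersion - 1);
    strncpy(meta.creatorApplication, application,
            sizeof meta.creatorApplication - 1);

    footer.metadataFooterSize = (int64_t)sizeof meta;
    memcpy(footer.magicString, MAGIC, sizeof footer.magicString);

    if (fwrite(&meta, sizeof meta, 1, f) != 1) return -1;
    if (fwrite(&footer, sizeof footer, 1, f) != 1) return -1;
    return 0;
}
```

Because the file footer is the very last thing written, a file that was cut short will not end with the magic string.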
When reading the file, seek to the end minus sizeof(FileFooter). Read the footer and verify the magicString. Then seek back by a further metadataFooterSize bytes and read the metadata. If the size stored in the file is smaller than the current struct, the file was written by an older version, and you can use default values for the fields it lacks.
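The read path could look roughly like the sketch below. It repeats the struct definitions so that it stands alone. The function name `read_metadata`, the `"unknown"` defaults, and the `MAX_METADATA_BYTES` sanity limit are my own assumptions, and it assumes new fields are only ever appended to the end of the metadata struct.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAGIC "MYFILEFMT"               /* hypothetical format identifier */
#define MAX_METADATA_BYTES (1L << 20)   /* hypothetical sanity limit */

struct MetadataFooter {
    char creatorVersion[40];
    char creatorApplication[40];
};

struct FileFooter {
    int64_t metadataFooterSize;
    char magicString[10];
};

/* Fills *out from the end of f. Fields that an older, shorter footer
   does not contain keep their defaults. Returns 0 on success, -1 if the
   file is too short, the magic string is wrong, or the size is implausible. */
int read_metadata(FILE *f, struct MetadataFooter *out)
{
    struct FileFooter footer;
    int64_t n;
    long back;

    assert(f != NULL && out != NULL);

    /* 1. The file footer occupies the last sizeof footer bytes. */
    if (fseek(f, -(long)sizeof footer, SEEK_END) != 0) return -1;
    if (fread(&footer, sizeof footer, 1, f) != 1) return -1;
    if (memcmp(footer.magicString, MAGIC, sizeof footer.magicString) != 0)
        return -1;

    n = footer.metadataFooterSize;
    if (n < 0 || n > MAX_METADATA_BYTES) return -1;

    /* 2. Defaults for fields the stored footer may be too short to hold. */
    memset(out, 0, sizeof *out);
    strcpy(out->creatorVersion, "unknown");
    strcpy(out->creatorApplication, "unknown");

    /* 3. Seek back over the file footer plus the stored metadata. */
    back = (long)sizeof footer + (long)n;
    if (fseek(f, -back, SEEK_END) != 0) return -1;

    /* 4. Read only as many bytes as both sides know about. */
    if (n > (int64_t)sizeof *out) n = (int64_t)sizeof *out;
    if (n > 0 && fread(out, (size_t)n, 1, f) != 1) return -1;

    /* Never trust the file to have NUL-terminated the strings. */
    out->creatorVersion[sizeof out->creatorVersion - 1] = '\0';
    out->creatorApplication[sizeof out->creatorApplication - 1] = '\0';
    return 0;
}
```

One caveat for files of this size: `fseek` takes a `long` offset, so on platforms where `long` is 32 bits a 64-bit seek such as POSIX `fseeko` would be needed instead.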
As KeithB points out, you could even use this technique to store the metadata as an XML string, which gives you both totally extensible metadata and the compactness and speed of binary data.
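As a rough sketch of that variant, the metadata footer becomes the XML text itself and metadataFooterSize holds its length in bytes. The function names and the `MAX_XML_BYTES` limit are hypothetical, and parsing the XML is left to whichever library you already use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAGIC "MYFILEFMT"          /* hypothetical format identifier */
#define MAX_XML_BYTES (1L << 20)   /* hypothetical sanity limit */

struct FileFooter {
    int64_t metadataFooterSize;    /* here: byte length of the XML text */
    char magicString[10];
};

/* Appends the XML text and then the file footer. Returns 0 or -1. */
int write_xml_footer(FILE *f, const char *xml)
{
    struct FileFooter footer;
    size_t len;

    assert(f != NULL && xml != NULL);
    len = strlen(xml);

    memset(&footer, 0, sizeof footer);
    footer.metadataFooterSize = (int64_t)len;
    memcpy(footer.magicString, MAGIC, sizeof footer.magicString);

    if (len > 0 && fwrite(xml, len, 1, f) != 1) return -1;
    if (fwrite(&footer, sizeof footer, 1, f) != 1) return -1;
    return 0;
}

/* Returns a malloc'd, NUL-terminated copy of the XML text, or NULL on
   any error. The caller frees the result. */
char *read_xml_footer(FILE *f)
{
    struct FileFooter footer;
    char *xml;
    size_t len;

    assert(f != NULL);
    if (fseek(f, -(long)sizeof footer, SEEK_END) != 0) return NULL;
    if (fread(&footer, sizeof footer, 1, f) != 1) return NULL;
    if (memcmp(footer.magicString, MAGIC, sizeof footer.magicString) != 0)
        return NULL;
    if (footer.metadataFooterSize < 0 ||
        footer.metadataFooterSize > MAX_XML_BYTES)
        return NULL;

    len = (size_t)footer.metadataFooterSize;
    if (fseek(f, -((long)sizeof footer + (long)len), SEEK_END) != 0)
        return NULL;

    xml = malloc(len + 1);
    if (xml == NULL) return NULL;
    if (len > 0 && fread(xml, len, 1, f) != 1) {
        free(xml);
        return NULL;
    }
    xml[len] = '\0';
    return xml;
}
```

With this layout, adding a metadata field means adding an XML element, and the binary reading code does not change.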