I have a simulation that reads large binary data files that we create (10s to 100s of GB). We use binary for speed reasons. These files are system dependent, converted from te
My experience with telecom equipment configuration and firmware upgrades shows that you really only need a few predefined bytes at the beginning of the file (this is important), starting with a version number (the fixed part of the header). The rest of the header is optional: by setting the proper version, you can always tell the reader how to process it. The important thing here is that you'd better place the 'variable' part of the header at the end of the file if you plan to operate on the header without modifying the file content itself. This also simplifies 'append' operations, which have to recalculate the variable part of the header.
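A minimal sketch of that layout, assuming a hypothetical 16-byte fixed header (a made-up magic `b"SIMD"`, a 16-bit version, a reserved field, and the 64-bit offset of the variable part) followed by the bulk data, with the variable part stored as a trailer at the end of the file. All names here (`write_file`, `read_trailer`, `append`, `HEADER_FMT`) are illustrative, not from any existing format:

```python
import struct

# Hypothetical fixed header: magic, version, reserved, offset of the
# variable-length part (trailer). Little-endian, 16 bytes, no padding.
HEADER_FMT = "<4sHHQ"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 16
MAGIC = b"SIMD"  # hypothetical magic number


def write_file(path, version, payload, trailer):
    """Write the fixed header, the bulk payload, then the variable part last."""
    trailer_offset = HEADER_SIZE + len(payload)
    with open(path, "wb") as f:
        f.write(struct.pack(HEADER_FMT, MAGIC, version, 0, trailer_offset))
        f.write(payload)
        f.write(trailer)


def read_header(f):
    """Read and validate the fixed header; return (version, trailer_offset)."""
    f.seek(0)
    magic, version, _reserved, trailer_offset = struct.unpack(
        HEADER_FMT, f.read(HEADER_SIZE))
    if magic != MAGIC:
        raise ValueError("not a recognized data file")
    return version, trailer_offset


def read_trailer(path):
    """Return the variable part without touching the bulk data."""
    with open(path, "rb") as f:
        _version, trailer_offset = read_header(f)
        f.seek(trailer_offset)
        return f.read()


def append(path, more_payload, new_trailer):
    """Append data: drop the old trailer, write the new data and a
    recalculated trailer, then patch only the offset in the fixed header."""
    with open(path, "r+b") as f:
        version, trailer_offset = read_header(f)
        f.seek(trailer_offset)
        f.truncate()  # cut the file at the start of the old variable part
        f.write(more_payload)
        new_offset = f.tell()
        f.write(new_trailer)
        f.seek(0)
        f.write(struct.pack(HEADER_FMT, MAGIC, version, 0, new_offset))
```

Appending never rewrites the existing bulk data: only the trailer and the 16 fixed bytes at the start change, which is what keeps append cheap for very large files.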
Nice-to-have features for the fixed-size header (at the beginning):
OK, for the variable part, XML or some other easily extensible format in the header is a good idea, but is it really needed? I have had a lot of experience with ASN.1 encoding... in most cases using it was overkill.
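If XML or ASN.1 feels like too much, a plain tag-length-value (TLV) encoding is one lightweight option for the variable part. This is a sketch under assumptions of my own (2-byte tags, 4-byte little-endian lengths); the tag numbers are hypothetical and would be defined per file-format version:

```python
import struct

# Hypothetical TLV entry header: 2-byte tag, 4-byte length, then the value.
TLV_HDR = "<HI"
TLV_HDR_SIZE = struct.calcsize(TLV_HDR)  # 6


def encode_tlv(items):
    """Encode a list of (tag, value_bytes) pairs into one byte string."""
    out = bytearray()
    for tag, value in items:
        out += struct.pack(TLV_HDR, tag, len(value))
        out += value
    return bytes(out)


def decode_tlv(buf):
    """Decode into {tag: value}. Every entry carries its length, so a reader
    can step over tags it does not understand."""
    fields = {}
    pos = 0
    while pos < len(buf):
        if len(buf) - pos < TLV_HDR_SIZE:
            raise ValueError("truncated TLV header")
        tag, length = struct.unpack_from(TLV_HDR, buf, pos)
        pos += TLV_HDR_SIZE
        if pos + length > len(buf):
            raise ValueError("truncated TLV value")
        fields[tag] = buf[pos:pos + length]
        pos += length
    return fields
```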
Well, maybe you will gain additional insight by looking at things like the TPKT format, which is described in RFC 2126 (section 4.3).
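TPKT illustrates the same idea in a very small form: a 4-byte fixed header made of a version byte (3), a reserved byte (0), and a 16-bit big-endian length that counts the whole packet, header included. A minimal sketch of framing and splitting under those rules (the function names are my own):

```python
import struct

TPKT_VERSION = 3
TPKT_HDR = ">BBH"  # version, reserved, total packet length (big-endian)
TPKT_HDR_SIZE = 4


def tpkt_wrap(payload: bytes) -> bytes:
    """Prefix a payload with a TPKT header."""
    total = TPKT_HDR_SIZE + len(payload)
    if total > 0xFFFF:
        raise ValueError("payload too large for one TPKT packet")
    return struct.pack(TPKT_HDR, TPKT_VERSION, 0, total) + payload


def tpkt_split(stream: bytes):
    """Split a byte stream into payloads; return (payloads, leftover bytes)."""
    payloads = []
    pos = 0
    while len(stream) - pos >= TPKT_HDR_SIZE:
        version, _reserved, total = struct.unpack_from(TPKT_HDR, stream, pos)
        if version != TPKT_VERSION:
            raise ValueError(f"unsupported TPKT version {version}")
        if total < TPKT_HDR_SIZE:
            raise ValueError("invalid TPKT length")
        if len(stream) - pos < total:
            break  # incomplete packet: wait for more bytes
        payloads.append(stream[pos + TPKT_HDR_SIZE:pos + total])
        pos += total
    return payloads, stream[pos:]
```

Checking the version first means a reader can reject (or switch handling for) an unknown layout before it interprets anything else, which is exactly the role the version plays at the start of a file header.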