I think Stephen Denne might be onto something here. Imagine:
- zip-like compression of sequences to codes
- a dictionary mapping code -> sequence
- the file will be structured like a filesystem
- each write generates a new "file": a sequence of bytes compressed according to the dictionary
- the "filesystem" keeps track of which "file" holds which byte range (start, end)
- reads work file-wise: look up the range in the "filesystem", uncompress the relevant "files", and extract the bytes
- writes invalidate the "files" they overlap; new "files" are appended to replace the invalidated ones
- this system will need:
  - a defragmentation mechanism for the "filesystem" (reclaiming the space held by invalidated "files")
  - periodic compaction of the dictionary (removing unused codes)
  - done properly, housekeeping could run when nobody is looking (idle time), or by building a new file in the background and switching over to it eventually (a rough sketch of the whole scheme follows this list)
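Here is a minimal sketch in Python of what I mean, using zlib's preset-dictionary support (`zdict`) to stand in for the code→sequence dictionary; `CompressedStore`, the extent layout, and all the names are made up for illustration, not a finished design:

```python
import zlib

class CompressedStore:
    """Append-only log of compressed "files" (chunks) plus an extent
    table (the "filesystem") mapping logical byte ranges onto them."""

    def __init__(self, shared_dict=b""):
        # zlib's preset dictionary stands in for the code->sequence
        # dictionary; a real implementation would build and compact its own.
        self.zdict = shared_dict
        self.log = []      # chunk_id -> compressed bytes, append-only
        self.extents = []  # (start, end, chunk_id, offset) -- live data only

    def _compress(self, data):
        c = zlib.compressobj(zdict=self.zdict) if self.zdict else zlib.compressobj()
        return c.compress(data) + c.flush()

    def _decompress(self, chunk_id):
        d = zlib.decompressobj(zdict=self.zdict) if self.zdict else zlib.decompressobj()
        return d.decompress(self.log[chunk_id]) + d.flush()

    def write(self, start, data):
        """Append a new "file" and invalidate whatever it overwrites."""
        end, kept = start + len(data), []
        for s, e, cid, off in self.extents:
            if e <= start or s >= end:      # no overlap: extent stays live
                kept.append((s, e, cid, off))
                continue
            if s < start:                   # left remainder survives
                kept.append((s, start, cid, off))
            if e > end:                     # right remainder survives
                kept.append((end, e, cid, off + (end - s)))
        kept.append((start, end, len(self.log), 0))
        self.log.append(self._compress(data))
        self.extents = sorted(kept)

    def read(self, start, length):
        """Decompress just the "files" covering [start, start+length)."""
        end, out = start + length, bytearray(length)   # holes read as zeros
        for s, e, cid, off in self.extents:
            if e <= start or s >= end:
                continue
            data = self._decompress(cid)
            lo, hi = max(s, start), min(e, end)
            out[lo - start:hi - start] = data[off + lo - s:off + hi - s]
        return bytes(out)

store = CompressedStore(shared_dict=b"hello world")
store.write(0, b"hello world, hello world")
store.write(6, b"WORLD")
assert store.read(0, 24) == b"hello WORLD, hello world"
```

Note that overwrites never touch old chunks: they only trim the extent table, which is what makes the log append-only and leaves the invalidated chunks for defragmentation to reclaim.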
One positive effect is that the dictionary would apply across the whole file. If you can spare the CPU cycles, you could periodically check for sequences that overlap "file" boundaries and regroup them, as in the sketch below.
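The regrouping pass could look something like this, building on the `CompressedStore` sketch above; the `regroup` name and the 4 KiB merge limit are arbitrary choices, not requirements:

```python
def regroup(store, max_merged=4096):
    """Coalesce runs of contiguous live extents into single "files" so
    sequences that straddled old boundaries compress together; invalidated
    chunks are dropped along the way, so this doubles as defragmentation."""
    fresh = CompressedStore(shared_dict=store.zdict)
    run_start, buf = None, bytearray()
    for s, e, cid, off in store.extents:
        piece = store._decompress(cid)[off:off + (e - s)]
        contiguous = run_start is not None and s == run_start + len(buf)
        if (not contiguous or len(buf) + len(piece) > max_merged) and buf:
            fresh.write(run_start, bytes(buf))   # flush the finished run
            run_start, buf = None, bytearray()
        if run_start is None:
            run_start = s
        buf += piece
    if buf:
        fresh.write(run_start, bytes(buf))
    return fresh  # the "switching" step: swap this in for the old store
```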
This idea is aimed at truly random reads. If you are only ever going to read fixed-size records, some parts of it become simpler; a sketch of that case follows.
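With fixed-size records there is no extent splitting at all: the "filesystem" collapses to a flat slot table from record index to the latest chunk, something like this (again illustrative only):

```python
import zlib

class FixedRecordStore:
    """One compressed "file" per record version; the "filesystem" is just
    a slot table mapping each record index to its latest chunk id."""

    def __init__(self, record_size, shared_dict=b""):
        self.record_size, self.zdict = record_size, shared_dict
        self.log = []   # append-only compressed record versions
        self.slot = {}  # record index -> chunk id; old ids become garbage

    def write(self, index, record):
        assert len(record) == self.record_size
        c = zlib.compressobj(zdict=self.zdict) if self.zdict else zlib.compressobj()
        self.slot[index] = len(self.log)
        self.log.append(c.compress(record) + c.flush())

    def read(self, index):
        d = zlib.decompressobj(zdict=self.zdict) if self.zdict else zlib.decompressobj()
        return d.decompress(self.log[self.slot[index]]) + d.flush()
```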