Optimisation: Shrinking file size in C or C++

Submitted by 落爺英雄遲暮 on 2020-03-03 04:50:07

Question


When performing computer simulations of systems with n (e.g. 10000) particles, the usual workflow involves saving the state of the system at regular intervals. This entails writing the position coordinates of all the particles to a file (3 floats/doubles per line, one line per particle), along with some header information. The floating-point precision is set to a fixed value.

The way I usually write my configuration files is as follows (part of a function that creates the file whenever it is called):

#include <iostream>
#include <fstream>

// Text output: one header line, then "x y z" per particle.
std::ofstream outfile(filelabel, std::ios::out);
outfile.precision(10);

outfile << "#Number of particles " << npart << '\n';

for (int i = 0; i < npart; i++) {
    // '\n' avoids the flush that std::endl forces on every line.
    outfile << particle[i].pos[0] << " " << particle[i].pos[1] << " " << particle[i].pos[2] << '\n';
}

outfile.close();

Typically, each such file for a large enough system will have a size of 0.5-4 MB, so when saved frequently they add up to a large total. So I'm trying to learn how I could reduce the size of my configuration files to the bare minimum, e.g. by (two thoughts that come to mind):

  • Using a different method of writing, and not necessarily writing '.txt' files.
  • Possibly compressing (e.g. zipping) the data before writing to file.

Any suggestions and recommendations for how I could shrink the size of the configuration files within C/C++ possibilities would be highly appreciated.


Small addendum

As per the suggestions so far, a binary format seems to be a very good alternative approach. As a follow-up question: would one be able to read data saved in binary this way from Python, for example?

This is relevant because I tend to use Python for post-analysis of the saved config files.
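
For illustration, a minimal sketch of what a binary writer and reader could look like is given below. The `Particle` struct, the function names, and the file layout (one 32-bit particle count followed by three doubles per particle, in native byte order) are assumptions made for this example and are not taken from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the particle type used in the question.
struct Particle {
    double pos[3];
};

// Assumed layout: one int32 particle count, then 3 doubles (x, y, z)
// per particle, all in the machine's native byte order.
bool write_binary(const std::string& filelabel,
                  const std::vector<Particle>& particle) {
    std::ofstream outfile(filelabel, std::ios::out | std::ios::binary);
    if (!outfile) return false;

    const std::int32_t npart = static_cast<std::int32_t>(particle.size());
    outfile.write(reinterpret_cast<const char*>(&npart), sizeof npart);
    for (const Particle& p : particle) {
        // sizeof p.pos is 3 * sizeof(double), i.e. 24 bytes per particle.
        outfile.write(reinterpret_cast<const char*>(p.pos), sizeof p.pos);
    }
    return static_cast<bool>(outfile);
}

// Reads the same layout back; returns false on any I/O error.
bool read_binary(const std::string& filelabel,
                 std::vector<Particle>& particle) {
    std::ifstream infile(filelabel, std::ios::in | std::ios::binary);
    if (!infile) return false;

    std::int32_t npart = 0;
    infile.read(reinterpret_cast<char*>(&npart), sizeof npart);
    if (!infile || npart < 0) return false;

    particle.resize(static_cast<std::size_t>(npart));
    for (Particle& p : particle) {
        infile.read(reinterpret_cast<char*>(p.pos), sizeof p.pos);
    }
    return static_cast<bool>(infile);
}
```

With this layout a file takes 4 + 24 * npart bytes. On the Python side, a file laid out like this should be readable with `numpy.fromfile` using matching dtypes (an int32 for the count, then float64 values reshaped to three columns), provided it is read on a machine with the same byte order.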


Answer 1:


Four suggestions:

  1. Saving vector information (direction and offset) should take less space than saving X-Y-Z coordinates. But that means keeping a reference to the initial state file - which is more computationally intensive.

  2. Assuming the above method is not practical, then I would still consider using vectors if storage space is more critical than computational time. A 3D vector encodes the location in 2 values instead of three, so even if you reference all locations from the origin instead of the particle's previous location, the files should be nearly 30% smaller (assuming a requirement for greater precision in storing the vectors).

  3. How "random" are the location coordinates? If there's some correlation, then I would keep the data in text and use a lossless file compression method (such as the suggestion to save the files on a disk that supports filesystem compression - which means no work for you!) Any repeating strings of characters will get compressed and could be more efficient than a binary file - if the data has repeating strings. If the coordinates appear pseudo-random, then compression (like ZIP format) won't buy you anything and you should use the binary value method.

  4. If storing in binary (perhaps even in text), consider converting the floating point values into integers that fit your volume/precision before writing them to the file. This can take far less space than storing floating point (or, worse, double) values. That of course assumes that the precision you need can be represented within the range of an int (or a long).
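
As a rough sketch of suggestion 4, the functions below map a coordinate onto a 16-bit integer grid. The function names, the choice of `std::uint16_t`, and the assumption that every coordinate lies in `[0, box_length)` are hypothetical and would need adjusting to the actual simulation volume and required precision.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumption: coordinates lie in [0, box_length). The box is split into
// 65536 equal cells and the cell index is stored instead of the coordinate.
std::uint16_t quantize(double x, double box_length) {
    assert(box_length > 0.0);
    const double cell = std::floor(x / box_length * 65536.0);
    // Clamp so that values at or slightly beyond the box edges stay in range.
    const double clamped = std::min(std::max(cell, 0.0), 65535.0);
    return static_cast<std::uint16_t>(clamped);
}

// Recovers the centre of the cell, so the round-trip error for an
// in-range coordinate is at most box_length / 131072.
double dequantize(std::uint16_t q, double box_length) {
    assert(box_length > 0.0);
    return (static_cast<double>(q) + 0.5) * box_length / 65536.0;
}
```

Written in binary, three such values take 6 bytes per particle instead of 24 for three doubles, at the cost of a resolution of box_length / 65536. If that is too coarse, a 32-bit integer gives a much finer grid while still halving the size relative to doubles.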



Source: https://stackoverflow.com/questions/58395488/optimisation-shrinking-file-size-in-c-or-c
