Issues saving double as binary in c++

Question

In my simulation code for a particle system, I have a class defined for particles, and each particle has a property pos containing its position, declared as double pos[3]; since there are 3 coordinate components per particle. The particle objects are created with particles = new Particle[npart]; (as we have npart particles), so e.g. the y-component of the 2nd particle would be accessed with double dummycomp = particles[1].pos[1];

Before switching to binary, I saved the particles to a text file (with a precision of 10 digits and one particle per line) using:

#include <iostream>
#include <fstream>

std::ofstream outfile("testConfig.txt", std::ios::out);
outfile.precision(10);

for (int i = 0; i < npart; i++) {
    outfile << particles[i].pos[0] << " " << particles[i].pos[1] << " " << particles[i].pos[2] << std::endl;
}
outfile.close();

But now, to save space, I am trying to save the configuration as a binary file, and my attempt, inspired from here, has been as follows:

std::ofstream outfile("test.bin", std::ios::binary | std::ios::out);

for (int i = 0; i < npart; i++) {
    outfile.write(reinterpret_cast<const char*>(particles[i].pos), std::streamsize(3 * sizeof(double)));
}
outfile.close();

but I am facing a segmentation fault when trying to run it. My questions are:

Am I doing something wrong with reinterpret_cast, or rather in the argument of streamsize()?
Ideally, the saved binary file could also be read from Python. Would my approach (once fixed) allow for that?

Working example for the old (non-binary) saving approach:

#include <iostream>
#include <fstream>

using namespace std;

class Particle {
 public:
  double pos[3];
};

int main() {
  // npart must be a compile-time constant to size a plain array in standard C++
  const int npart = 2;
  Particle particles[npart];

  // initializing the positions:
  particles[0].pos[0] = -74.04119568;
  particles[0].pos[1] = -44.33692582;
  particles[0].pos[2] = 17.36278231;

  particles[1].pos[0] = 48.16310086;
  particles[1].pos[1] = -65.02325252;
  particles[1].pos[2] = -37.2053818;

  ofstream outfile("testConfig.txt", ios::out);
  outfile.precision(10);

  for (int i = 0; i < npart; i++) {
    outfile << particles[i].pos[0] << " " << particles[i].pos[1] << " " << particles[i].pos[2] << endl;
  }
  outfile.close();

  return 0;
}

And in order to save the particle positions as binary, substitute the saving portion of the above sample with:

  ofstream outfile("test.bin", ios::binary | ios::out);

  for (int i = 0; i < npart; i++) {
    outfile.write(reinterpret_cast<const char*>(particles[i].pos), streamsize(3 * sizeof(double)));
  }
  outfile.close();

2nd addendum: reading the binary in Python

I managed to read the saved binary file in Python using NumPy as follows:

>>> import numpy as np
>>> data = np.fromfile('test.bin', dtype=np.float64)
>>> data
array([-74.04119568, -44.33692582,  17.36278231,  48.16310086,
       -65.02325252, -37.2053818 ])

But given the doubts cast in the comments regarding the non-portability of the binary format, I am not confident this kind of reading in Python will always work! It would be really neat if someone could elucidate the reliability of such an approach.

Answer 1:

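The trouble is that the base-10 (ASCII) representation of a double is lossy unless you print enough digits: with only 10 significant digits, as in your code, distinct doubles can map to the same text and you do not get the original value back. Printing std::numeric_limits<double>::max_digits10 (17 for IEEE 754 doubles) significant digits is enough to recover the exact double when reading it back, but the printed decimal is still generally a rounded approximation of the stored binary value rather than the value itself.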

The other issue you have is that the binary representation of a double is not mandated by the C++ standard, so relying on it is fragile and can break code easily. In practice almost every platform uses IEEE 754 (which std::numeric_limits<double>::is_iec559 reports), but byte order (endianness) differs between architectures, and once you change compiler, compiler settings, or architecture, the language itself gives you no guarantees.

You can serialize it to text in a non-lossy representation by using the hexadecimal format for doubles:

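 // Since C++11, setting both fixed and scientific selects hexadecimal output:
 stream.setf(std::ios_base::fixed | std::ios_base::scientific, std::ios_base::floatfield);
 stream << particles[i].pos[0];

 // The std::hexfloat manipulator (also C++11) does the same thing more concisely:
 stream << std::hexfloat << particles[i].pos[0];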

This has the effect of printing the value the same way as "%a" does in C's printf(), i.e. "hexadecimal floating point, lowercase": the significand is written in hexadecimal and the binary exponent as a decimal power of two (for example, 1.5 prints as 0x1.8p+0). Since the underlying representation is binary, every double can be represented exactly this way, which provides a non-lossy way of transferring data between systems. Trailing zeros of the significand are dropped, so many numbers come out relatively compact.

On the Python side, this format is also supported: read each value as a string, then convert it to a float using float.fromhex().

see: https://docs.python.org/3/library/stdtypes.html#float.fromhex

But your goal is to save space:

But now, to save space, I am trying to save the configuration as a binary file.

I would ask: do you really need to save space? Are you running in a low-powered, low-resource environment? Then saving space can definitely matter, but that is rare nowadays (though such environments do exist).

But it seems like you are running some form of particle simulation, which does not scream low-resource use case. Even if you have terabytes of data, I would still go with a portable, easy-to-read format over binary, preferably one that is not lossy. Storage space is cheap.

Answer 2:

I suggest using a library instead of writing a serialization/deserialization routine from scratch. I find cereal really easy to use, maybe even easier than boost::serialization. It reduces the opportunity for bugs in your own code.

In your case I'd go about serializing doubles like this using cereal:

#include <cereal/archives/binary.hpp>
#include <fstream>

int main() {
    std::ofstream outfile("test.bin", std::ios::binary);
    // cereal archives are only guaranteed to flush their contents on destruction
    cereal::BinaryOutputArchive out(outfile);
    double x, y, z;
    x = y = z = 42.0;
    out(x, y, z);
}

To deserialize them you'd use:

#include <cereal/archives/binary.hpp>
#include <fstream>

int main() {
    std::ifstream infile("test.bin", std::ios::binary);
    cereal::BinaryInputArchive in(infile);
    double x, y, z;
    in(x, y, z);
}

You can also serialize/deserialize whole std::vector<double>s in the same fashion. Just add #include <cereal/types/vector.hpp> and use in / out like in the given example on a single std::vector<double> instead of multiple doubles.

Ain't that swell.

Edit

In a comment you asked whether it would be possible to read a binary file created like that with Python.

Answer:

Serialized binary files aren't really meant to be very portable (things like endianness could play a role here). You could easily adapt the example code I gave you to write a JSON file (another advantage of using a library) and read that format in Python.

Oh and cereal::JSONOutputArchive has an option for setting precision.

Answer 3:

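Just curious: have you ever investigated the idea of converting your data to vectored coordinates instead of Cartesian X, Y, Z? It would seem that this could reduce the size of your data by about 30% (two coordinates instead of three), though perhaps needing slightly higher precision in order to convert back to your X, Y, Z. This only works if the positions have two degrees of freedom (e.g. particles constrained to a surface); a general point in 3D space needs three numbers in any coordinate system.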

The vectored coordinates could still be further optimized by using the various compression techniques above (text compression or binary conversion).

Source: https://stackoverflow.com/questions/58397530/issues-saving-double-as-binary-in-c

Tags

c++

fstream

binaryfiles