I have a binary file with some layout I know. For example let format be like this:
Currently I do it so:
- load file to ifstream
- read this stream to char buffer[2]
- cast it to unsigned short: unsigned short len{ *((unsigned short*)buffer) };. Now I have length of a string.
That last step risks a SIGBUS (if your character array happens to start at an odd address and your CPU can only read 16-bit values that are aligned at an even address), poor performance (some CPUs will read misaligned values, but more slowly; others, like modern x86s, are fine and fast), and/or endianness issues. I'd suggest reading the two characters as unsigned char values, then you can say (x[0] << 8) | x[1] or vice versa, depending on the file's byte order. Alternatively, read the bytes straight into a 16-bit integer and apply ntohs (or htons, which performs the same swap) if you need to correct for endianness.
- read a stream to vector and create a std::string from this vector. Now I have string id.
No need... just read directly into the string:
std::string s(the_size, ' ');
if (input_stream.read(&s[0], s.size()) &&
    input_stream.gcount() == static_cast<std::streamsize>(s.size()))
    ...use s...
- the same way read next 4 bytes and cast them to unsigned int. Now I have a stride.
- while not end of file read floats the same way - create a char bufferFloat[4] and cast *((float*)bufferFloat) for every float.
Better to read the data directly over the unsigned ints and floats, as that way the compiler will ensure correct alignment.
This works, but for me it looks ugly. Can I read directly to unsigned short or float or string etc. without char [x] creating? If no, what is the way to cast correctly (I read that style I'm using - is an old style)?
struct Data
{
    uint32_t x;
    float y[6];
};
Data data;
if (input_stream.read((char*)&data, sizeof data) &&
    input_stream.gcount() == sizeof data)
    ...use x and y...
Note the code above avoids reading data into potentially unaligned character arrays: it's unsafe to reinterpret_cast data in a potentially unaligned char array (including inside a std::string) due to alignment issues. Again, you may need some post-read conversion with ntohl if there's a chance the file content differs in endianness. If there's an unknown number of floats, you'll need to calculate and allocate sufficient storage with alignment of at least 4 bytes, then aim a Data* at it. In practice, indexing past the declared array size of y works as long as the memory at the accessed addresses was part of the allocation and holds a valid float representation read in from the stream, although the C++ standard does not formally guarantee it. A simpler option, with an additional read and so possibly slower, is to read the uint32_t first, then new float[n] and do a further read into there.
Practically, this type of approach can work and a lot of low level and C code does exactly this. "Cleaner" high-level libraries that might help you read the file must ultimately be doing something similar internally....