Write raw struct contents (bytes) to a file in C. Confused about actual size written

Submitted by 半城伤御伤魂 on 2019-12-21 12:42:34

Question


Basic question, but I expected this struct to occupy 13 bytes of space (1 for the char, 12 for the 3 unsigned ints). Instead, sizeof(ESPR_REL_HEADER) gives me 16 bytes.

typedef struct {
  unsigned char version;
  unsigned int  root_node_num;
  unsigned int  node_size;
  unsigned int  node_count;
} ESPR_REL_HEADER;

What I'm trying to do is initialize this struct with some values and write the data it contains (the raw bytes) to the start of a file. Later, when I open this file, I want to reconstruct this struct and get some metadata about what the rest of the file contains.

I'm initializing the struct and writing it to the file like this:

int esprime_write_btree_header(FILE * fp, unsigned int node_size) {
  ESPR_REL_HEADER header = {
    .version       = 1,
    .root_node_num = 0,
    .node_size     = node_size,
    .node_count    = 1
  };

  return fwrite(&header, sizeof(ESPR_REL_HEADER), 1, fp);
}

Where node_size is currently 4 while I experiment.

The file contains the following data after I write the struct to it:

-bash$  hexdump test.dat
0000000 01 bf f9 8b 00 00 00 00 04 00 00 00 01 00 00 00
0000010

I expect it to actually contain:

-bash$  hexdump test.dat
0000000 01 00 00 00 00 04 00 00 00 01 00 00 00
0000010

Excuse the newbiness. I am trying to learn :) How do I efficiently write just the data components of my struct to a file?


Answer 1:


Many processors cannot fetch multi-byte data from arbitrary addresses, or can do so only slowly. Objects such as 4-byte ints should be stored at addresses divisible by four. This requirement is called alignment.

C gives the compiler freedom to insert padding bytes between struct members to align them. The amount of padding is just one variable between different platforms, another major variable being endianness. This is why you should not simply "dump" structures to disk if you want the program to run on more than one machine.

The best practice is to write each member explicitly, and to use htonl to fix endianness to big-endian before binary output. When reading back, use memcpy to move raw bytes, do not use

char *buffer_ptr;
...
++ buffer_ptr;
hdr.member = * (int *) buffer_ptr; /* potential alignment error; hdr is the struct being filled in */

but instead do

memcpy( & hdr.member, buffer_ptr, sizeof hdr.member ); /* destination first, then source */
hdr.member = ntohl( hdr.member ); /* if member is 4 bytes */



Answer 2:


That is because of structure padding, see http://en.wikipedia.org/wiki/Sizeof#Implementation




Answer 3:


When you write structures as-is with fwrite, they are written exactly as they are in memory. That includes the "dead bytes" the compiler inserts inside the struct as padding. Additionally, your multi-byte data is written in your system's byte order (endianness).

If you do not want that to happen, write a function that serializes the data from your structure. You can write only the non-padded areas, and also write multibyte data in a predictable order (e.g. in the network byte order).




Answer 4:


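The struct is subject to alignment rules, which means padding is inserted between some of its members. Here, it looks like 3 padding bytes follow the first unsigned char field, so the first unsigned int starts at offset 4. The bf f9 8b in your dump are the leftover contents of those padding bytes.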

One of the gotchas here is that the rules can be different from system to system, so if you write the struct as a whole using fwrite in a program compiled with one compiler on one platform, and then try to read it using fread on another, you could get garbage because the second program will assume the data is aligned to fit its conception of the struct layout.

Generally, you have to either:

  1. Decide that saved data files are only valid for builds of your program that share certain characteristics (depending on the documented behaviour of the compiler you used), or

  2. Not write a whole structure as one, but implement a more formal data format where each element is written individually with its size explicitly controlled.

(A related issue is that byte order could be different; the same choice generally applies there too, except that in option 2 you want to explicitly specify the byte order of the data format.)




Answer 5:


Try hard not to do this! The size discrepancy is caused by the padding and alignment that compilers use to speed up access to variables. The padding and alignment rules vary with language, compiler and OS. Furthermore, writing ints on one machine and reading them on different hardware can be problematic due to endianness.

Write your metadata byte-by-byte in a structure that cannot be misunderstood. Null-terminated ASCII strings are OK.




Answer 6:


I use an awesome open source piece of code written by Troy D. Hanson called TPL: http://tpl.sourceforge.net/. TPL has no external dependencies: you simply add tpl.c and tpl.h to your own program and use the TPL API.

Here is the guide: http://tpl.sourceforge.net/userguide.html




Answer 7:


This is because of something called memory alignment. The compiler inserts 3 padding bytes after the first char, so it effectively takes up 4 bytes. Bigger types like int can only "start" at an address that is a multiple of 4 on this platform, so the compiler pads up to that point.

I had the same problem with the bitmap header, which starts with 2 chars. I used a char bm[2] inside the struct and wondered for 2 days where the #$%^ the 3rd and 4th bytes of the header were going...

If you want to prevent this, you can use __attribute__((packed)) (a GCC/Clang extension). But beware: memory alignment matters for your program to run efficiently, and on some processors to run correctly at all.




Answer 8:


If you want to write the data in a specific format, use array(s) of unsigned char ...

unsigned char outputdata[13];
outputdata[0] = 1;
outputdata[1] = 0;
/* ... of course, use data from struct ... */
outputdata[12] = 0;
fwrite(outputdata, sizeof outputdata, 1, fp);


Source: https://stackoverflow.com/questions/10153155/write-raw-struct-contents-bytes-to-a-file-in-c-confused-about-actual-size-wri
