Struct of arrays, arrays of structs and memory usage pattern

Submitted by 房东的猫 on 2020-01-05 08:47:10

Question


I've been reading about SoA (struct of arrays) and I wanted to try to implement it in a system that I am building.

I am writing some simple C structs to run a few tests, but I am a bit confused. Right now I have three different structs for a vec3. I show them below and then go into the details of my question.

#include <stddef.h>  /* size_t */
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc */

struct vec3
{
    size_t x, y, z;
};

struct vec3_a
{
    size_t pos[3];
};

struct vec3_b
{
    size_t* x;
    size_t* y;
    size_t* z;
};

struct vec3 vec3(size_t x, size_t y, size_t z)
{
    struct vec3 v;
    v.x = x;
    v.y = y;
    v.z = z;
    return v;
}

struct vec3_a vec3_a(size_t x, size_t y, size_t z)
{
    struct vec3_a v;
    v.pos[0] = x;
    v.pos[1] = y;
    v.pos[2] = z;
    return v;
}

struct vec3_b vec3_b(size_t x, size_t y, size_t z)
{
    struct vec3_b v;
    v.x = (size_t*)malloc(sizeof(size_t));
    v.y = (size_t*)malloc(sizeof(size_t));
    v.z = (size_t*)malloc(sizeof(size_t));
    *(v.x) = x;
    *(v.y) = y;
    *(v.z) = z;
    return v;
}

Those are the declarations of the three vec3 types. I create one of each like this:

struct vec3 v = vec3(10, 20, 30);
struct vec3_a va = vec3_a(10, 20, 30);
struct vec3_b vb = vec3_b(10, 20, 30);

Printing the sizes, values and addresses with printf, I get output like this:

size of vec3      : 24 bytes
size of vec3a     : 24 bytes
size of vec3b     : 24 bytes
size of size_t    : 8 bytes
size of int       : 4 bytes
size of 16 int    : 64 bytes
vec3 x:10, y:20, z:30
vec3 x:0x7fff57f8e788, y:0x7fff57f8e790, z:0x7fff57f8e798
vec3a x:10, y:20, z:30
vec3a x:0x7fff57f8e768, y:0x7fff57f8e770, z:0x7fff57f8e778
vec3b x:10, y:20, z:30
vec3b x:0x7fbe514026a0, y:0x7fbe51402678, z:0x7fbe51402690

One final thing I did was create an array of 10 struct vec3_b and print the addresses, which gave these values:

    struct vec3_b vb3[10];
    for(int i = 0; i < 10; i++)
    {
        vb3[i] = vec3_b(i, i*2, i*4);
    }

index:0 vec3b x:0x7fbe514031f0, y:0x7fbe51403208, z:0x7fbe51403420
index:1 vec3b x:0x7fbe51403420, y:0x7fbe51403438, z:0x7fbe51403590
index:2 vec3b x:0x7fbe51403590, y:0x7fbe514035a8, z:0x7fbe514035c0
index:3 vec3b x:0x7fbe514035c0, y:0x7fbe514035d8, z:0x7fbe514035f0
index:4 vec3b x:0x7fbe514035f0, y:0x7fbe51403608, z:0x7fbe51403680
index:5 vec3b x:0x7fbe51403680, y:0x7fbe51403698, z:0x7fbe514036b0
index:6 vec3b x:0x7fbe514036b0, y:0x7fbe514036c8, z:0x7fbe514036e0
index:7 vec3b x:0x7fbe514036e0, y:0x7fbe514036f8, z:0x7fbe51403710
index:8 vec3b x:0x7fbe51403710, y:0x7fbe51403728, z:0x7fbe51403740
index:9 vec3b x:0x7fbe51403740, y:0x7fbe51403758, z:0x7fbe51403770

Questions:

  1. Is my implementation of struct vec3_b the proper way to set up a struct of arrays?

  2. Since the vec3_b structure is 24 bytes, could I fit two of them, with 16 bytes to spare, in one cache line of a modern CPU (assuming 64-byte lines)?

  3. If my vec3_b is the proper way to do an SoA setup, I am having some trouble with the addressing when I put 10 vec3_b together.

Looking at the hex values and their decimal representations, I cannot see any pattern, which leads me to believe that my setup is incorrect.

      ---------------x-----------------|----------------y-----------------|----------------z-----------------|

0|    0x7fbe514031f0 : 140455383675376 | 0x7fbe51403208 : 140455383675400 | 0x7fbe51403420 : 140455383675936
1|    0x7fbe51403420 : 140455383675936 | 0x7fbe51403438 : 140455383675960 | 0x7fbe51403590 : 140455383676304
2|    0x7fbe51403590 : 140455383676304 | 0x7fbe514035a8 : 140455383676328 | 0x7fbe514035c0 : 140455383676352

Answer 1:


  1. I can't think of an occasion when vec3_b would be a good idea.

  2. Note that you also have to find space for the 24 bytes of data that the pointers point at, and that data probably won't be contiguous with the structure itself, so you have probably just reduced your effective cache capacity by a factor of 2 compared to vec3 or vec3_a. Each malloc() has a minimum allocation size; on a 64-bit machine, that is usually at least 16 bytes. So the three separate allocations for the pointed-at values in a vec3_b structure need at least 48 more bytes (plus the 24 for the structure itself). That total of 72 bytes doesn't fit in a single cache line of the typical 64-byte size, and nothing guarantees that it will be placed so that it fits into two.

  3. N/A — the question is predicated on a false assumption.
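
The arithmetic in point 2 can be written out as a small sketch. The 64-byte cache line and the 16-byte minimum allocation are assumptions taken from the typical figures mentioned above, not values measured on the asker's machine, and the helper names vec3_b_footprint and min_cache_lines are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed figures, not measurements; check them for your own platform. */
#define CACHE_LINE_BYTES 64u  /* assumption: typical x86-64 cache line */
#define MIN_ALLOC_BYTES  16u  /* assumption: smallest block malloc() returns */

/* vec3_b as declared in the question: three pointers, no data of its own. */
struct vec3_b {
    size_t *x;
    size_t *y;
    size_t *z;
};

/* Bytes needed for one vec3_b plus its three separately allocated values. */
size_t vec3_b_footprint(void)
{
    return sizeof(struct vec3_b) + 3u * MIN_ALLOC_BYTES;
}

/* Fewest cache lines that can hold `bytes`, assuming ideal placement. */
size_t min_cache_lines(size_t bytes, size_t line_bytes)
{
    assert(line_bytes > 0);
    return (bytes + line_bytes - 1) / line_bytes;
}
```

With 8-byte pointers, vec3_b_footprint() comes to 24 + 48 = 72 bytes, so min_cache_lines(72, CACHE_LINE_BYTES) is 2, while the 24 bytes of a plain vec3 need only 1. That is a lower bound: the allocator is free to scatter the three values further apart.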




Answer 2:


1 & 3: No, your vec3_b is not a struct-of-arrays setup.

What you're doing is making multiple structs, each holding 64-bit pointers that each point to a single 64-bit value.

With struct-of-arrays, you make ONE struct, and it has a few arrays of variable size.

So the 10th x value would be mystruct.x[9], not mystruct[9].x[0].
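
A minimal sketch of that layout, assuming heap-allocated arrays whose length is chosen at run time. The names vec3_soa, vec3_soa_init and vec3_soa_free are hypothetical and only serve this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ONE struct for the whole collection; each member is an array of `count` values. */
struct vec3_soa {
    size_t  count;
    size_t *x;  /* x[0] .. x[count-1], stored contiguously */
    size_t *y;  /* y[0] .. y[count-1] */
    size_t *z;  /* z[0] .. z[count-1] */
};

/* Allocates the three arrays. Returns 0 on success, -1 if an allocation fails. */
int vec3_soa_init(struct vec3_soa *s, size_t count)
{
    assert(s != NULL && count > 0);
    s->count = count;
    s->x = malloc(count * sizeof *s->x);
    s->y = malloc(count * sizeof *s->y);
    s->z = malloc(count * sizeof *s->z);
    if (!s->x || !s->y || !s->z) {
        /* free(NULL) is a no-op, so a partial failure is safe to clean up. */
        free(s->x);
        free(s->y);
        free(s->z);
        s->x = s->y = s->z = NULL;
        s->count = 0;
        return -1;
    }
    return 0;
}

/* Releases the arrays and resets the struct to an empty state. */
void vec3_soa_free(struct vec3_soa *s)
{
    free(s->x);
    free(s->y);
    free(s->z);
    s->x = s->y = s->z = NULL;
    s->count = 0;
}
```

After vec3_soa_init(&s, 10), the loop from the question becomes s.x[i] = i; s.y[i] = i*2; s.z[i] = i*4; and consecutive x addresses differ by exactly sizeof(size_t), which is the regular pattern the printed addresses in the question were missing.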

The key point is to have all the x values stored contiguously, so you can load multiple x values with a single movdqu / _mm_loadu_si128. If you're working with SIMD, choose the smallest element width that will support the range of values you need. Using 64-bit elements will cut your throughput in half compared to 32-bit elements: your code processes 128 bits at a time, and a 128-bit vector holds twice as many elements when they are half as wide.
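
To illustrate the contiguous load, here is a sketch that adds one array of 32-bit values to another, four elements per 128-bit vector. The function name add_u32 is made up for this example, and the choice of uint32_t assumes the values fit in 32 bits. The SSE2 path is only compiled when the compiler defines __SSE2__; otherwise the plain loop handles everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_loadu_si128, _mm_add_epi32, _mm_storeu_si128 */
#endif

/* Computes a[i] += b[i] for i in [0, n). */
void add_u32(uint32_t *a, const uint32_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    size_t i = 0;
#if defined(__SSE2__)
    /* Four 32-bit elements fit in one 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(a + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or the whole array without SSE2. */
    for (; i < n; i++)
        a[i] += b[i];
}
```

With 64-bit elements the same loop would have to use _mm_add_epi64 and step by 2 instead of 4, which is the halved throughput described above. None of this is possible with the original vec3_b, because its x values are not adjacent in memory.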



Source: https://stackoverflow.com/questions/31421171/struct-of-arrays-arrays-of-structs-and-memory-usage-pattern
