In Metal what is the difference between a packed_float4 and a float4?
This information is from here
float4 has an alignment of 16 bytes. This means that the memory address of a value of this type (e.g. 0x12345670) is divisible by 16, i.e. its last hexadecimal digit is 0.
packed_float4, on the other hand, has an alignment of 4 bytes, so the last hexadecimal digit of its address will be 0, 4, 8 or c.
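The alignment rule can be checked on the host with a rough C++ sketch. The names `Float4Like`, `PackedFloat4Like`, `is_aligned` and `check_alignment_example` are hypothetical stand-ins invented for illustration, not the real Metal types, and the sketch assumes a platform where `float` is 4 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the Metal types (assumption: float is 4 bytes).
struct alignas(16) Float4Like { float x, y, z, w; };  // models float4: 16-byte alignment
struct PackedFloat4Like { float x, y, z, w; };        // models packed_float4: 4-byte alignment

static_assert(alignof(Float4Like) == 16, "float4 stand-in is 16-byte aligned");
static_assert(alignof(PackedFloat4Like) == 4, "packed stand-in is 4-byte aligned");
static_assert(sizeof(Float4Like) == 16 && sizeof(PackedFloat4Like) == 16,
              "both hold four floats; only the alignment differs");

// Returns true if the address of `object` is a multiple of `alignment`.
template <typename T>
bool is_aligned(const T& object, std::uintptr_t alignment) {
    return reinterpret_cast<std::uintptr_t>(&object) % alignment == 0;
}

// Example check: each variable lands on an address that satisfies its alignment.
inline void check_alignment_example() {
    Float4Like v{};
    PackedFloat4Like p{};
    assert(is_aligned(v, 16));  // last hex digit of the address is 0
    assert(is_aligned(p, 4));   // last hex digit is 0, 4, 8 or c
}
```

Both stand-ins have the same size; the only difference the sketch models is where they are allowed to start in memory.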
This does matter when you create custom structs. Say you want a struct with 2 normal floats and 1 float4/packed_float4:
struct A {
    float x, y;
    float4 z;
};
struct B {
    float x, y;
    packed_float4 z;
};
For A: The alignment of float4 has to be 16 and since float4 has to be after the normal floats, there is going to be 8 bytes of empty space between y and z. Here is what A looks like in memory:
Address | 0x200 | 0x204 | 0x208 | 0x20c | 0x210 | 0x214 | 0x218 | 0x21c |
Content |   x   |   y   |   -   |   -   |  z1   |  z2   |  z3   |  z4   |
                                          ^ z1 has to be 16-byte aligned
For B: Alignment of packed_float4 is 4, the same as float, so it can follow right after the floats in any case:
Address | 0x200 | 0x204 | 0x208 | 0x20c | 0x210 | 0x214 |
Content |   x   |   y   |  z1   |  z2   |  z3   |  z4   |
As you can see, A takes up 32 bytes whereas B only uses 24 bytes. When you have an array of those structs, A will take up 8 more bytes for every element. So for passing around a lot of data, the packed layout of B is preferred.
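The offsets and sizes above can be reproduced on the host with a small C++ sketch. `Float4Like` and `PackedFloat4Like` are hypothetical stand-ins for the Metal types, `bytes_saved` and `stride_bytes` are helpers invented for illustration, and the numbers assume a 4-byte `float`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the Metal types (assumption: float is 4 bytes).
struct alignas(16) Float4Like { float x, y, z, w; };  // models float4
struct PackedFloat4Like { float x, y, z, w; };        // models packed_float4

struct A { float x, y; Float4Like z; };        // z must start on a 16-byte boundary
struct B { float x, y; PackedFloat4Like z; };  // z can follow y directly

static_assert(offsetof(A, z) == 16, "8 bytes of padding between y and z");
static_assert(offsetof(B, z) == 8, "no padding between y and z");
static_assert(sizeof(A) == 32, "A occupies 32 bytes");
static_assert(sizeof(B) == 24, "B occupies 24 bytes");

// Bytes saved by storing `count` elements as B instead of A.
constexpr std::size_t bytes_saved(std::size_t count) {
    return count * (sizeof(A) - sizeof(B));
}
static_assert(bytes_saved(1000) == 8000, "8 bytes saved per element");

// Distance in bytes between two consecutive array elements.
template <typename T>
std::ptrdiff_t stride_bytes(const T (&elements)[2]) {
    return reinterpret_cast<const char*>(&elements[1]) -
           reinterpret_cast<const char*>(&elements[0]);
}

// Example check: the per-element cost in an array equals sizeof of the struct.
inline void check_array_stride() {
    A as[2] = {};
    B bs[2] = {};
    assert(stride_bytes(as) == 32);
    assert(stride_bytes(bs) == 24);
}
```

The `static_assert`s fail at compile time if the platform lays the structs out differently, which makes the assumptions explicit.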
The reason float4 exists at all is that the GPU can't work directly with 4-byte-aligned packed_float4 values; for example, you won't be able to return a packed_float4 from a shader. I assume this is for performance reasons.
One last thing: When you declare the Swift version of a struct:
struct S {
    let x, y: Float
    let z: (Float, Float, Float, Float)
}
This struct will have the same memory layout as B in Metal, not A. A tuple of Floats behaves like a packed_floatN.
All of this also applies to other vector types such as packed_float3, packed_short2, etc.
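As a rough illustration for two of those other types, the sketch below models them in host C++. The sizes and alignments follow what the Metal Shading Language specification lists, as far as I know: float3 is 16 bytes with 16-byte alignment, packed_float3 is 12 bytes with 4-byte alignment, and short2 and packed_short2 are both 4 bytes with 4-byte and 2-byte alignment respectively. All type names and the `array_bytes` helper are hypothetical stand-ins, and a 4-byte `float` is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the Metal types (assumption: float is 4 bytes).
struct alignas(16) Float3Like { float x, y, z; };     // models float3
struct PackedFloat3Like { float x, y, z; };           // models packed_float3
struct alignas(4) Short2Like { std::int16_t x, y; };  // models short2
struct PackedShort2Like { std::int16_t x, y; };       // models packed_short2

static_assert(sizeof(Float3Like) == 16 && alignof(Float3Like) == 16,
              "float3 stand-in is padded to 16 bytes");
static_assert(sizeof(PackedFloat3Like) == 12 && alignof(PackedFloat3Like) == 4,
              "packed_float3 stand-in is exactly three floats");
static_assert(sizeof(Short2Like) == 4 && alignof(Short2Like) == 4,
              "short2 stand-in is 4-byte aligned");
static_assert(sizeof(PackedShort2Like) == 4 && alignof(PackedShort2Like) == 2,
              "packed_short2 stand-in is 2-byte aligned");

// Total bytes needed for a tightly packed array of `count` elements of type T.
template <typename T>
constexpr std::size_t array_bytes(std::size_t count) {
    return count * sizeof(T);
}

// Example check: 100 three-component vectors cost 1600 bytes unpacked, 1200 packed.
inline void check_float3_arrays() {
    assert(array_bytes<Float3Like>(100) == 1600);
    assert(array_bytes<PackedFloat3Like>(100) == 1200);
}
```

For three-component vectors the packed form saves space even in a plain array, because the unpacked stand-in carries 4 bytes of padding per element.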