I\'ve been using Matlab on and off for decades. I thought I had a good grip on arrays, structs, cell arrays, tables, an array of structs, and a struct in which each field i
I think your question eventually narrows down to these three "types" of data structures:
comparative limitations between cell arrays, arrays of structs, and structs whose fiels are themselves arrays
[Note that "structs whose fields are themselves arrays" I translate as "scalar structs" here. An array of structs can also contain arbitrary arrays. My thinking becomes clear below, I hope.]
To me, these are not very different. All three are containers for heterogeneous data. (Heterogeneous data is non-uniform data, each data element is potentially of a different type and size.) Each of these statements can return an array of any type, unrelated to the type of any other array in the container:
cell array: array{i,j}
struct array: array(i,j).value
scalar struct: array.value
So it all depends on how you want to index:
array(i,j).value
^ ^
A B
If you want to index using A
only, use a cell array (though you then need curly braces, of course). If you want to index using B
only, use a scalar struct. If you want both A
and B
, use a struct array.
There is no difference in cost that I'm aware of. Each of the arrays contained in these containers takes up some space. The spatial overhead of the various containers is similar, and I have never noted a time overhead difference.
However, there is a huge difference between these two:
array(i).value % s1
array.value(i) % s2
I think that the question deals with this difference also. s1
has a lot more spatial overhead than s2
:
>> s1=struct('value',num2cell(1:100))
s1 =
1×100 struct array with fields:
value
>> s2=struct('value',1:100)
s2 =
struct with fields:
value: [1×100 double]
>> whos
Name Size Bytes Class Attributes
s1 1x100 12064 struct
s2 1x1 976 struct
The data needs 800 bytes, so s2
has 176 bytes of overhead, whereas s1
has 11264 (1408%)!
The reason is not the container, but the fact that we're storing one array with 100 elements in one, and 100 arrays with one element in the other. Each array has a header of a certain size that MATLAB uses to know what type of array it is, what sizes it has, to manage its storage and the delayed copy mechanism. The fewer arrays one has, the less memory one uses.
So, don't use a heterogeneous container to store scalars! These things only make sense when you need to store larger arrays, or arrays of different type or size.
The heterogeneous container that is not explicitly asked about (and after the edit explicitly not asked about) is the table. A table is similar to a scalar struct in that each column of the table is a single array, and different columns can have different types. Note that it is possible to use a cell array as a column, allowing for heterogenous elements to be stored in a column, but they make most sense if this is not the case.
One difference with a scalar struct is that each column must have the same number of rows. Another difference is that indexing can look like that of a cell array, a scalar struct, or a struct array.
Thus, the table forces some constrains upon the contained data, which is very beneficial in some circumstances.
However, and as the OP noted, working with tables is slower than working with structs. This is because table
is a custom class, not a native type like structs and cell arrays. If you type edit table
in MATLAB, you'll see the source code, how it's implemented. It's a classdef
file, just like something any of us could write. Consequently, it has the same speed limitations: the JIT is not optimized for it, indexing into a table implies running a function written as an M-file, etc.
One more thing: Don't create cell arrays of structs, or scalar structs with cell arrays. This increases the levels of containers, which increases overhead (both in space and time), and makes the contents more difficult to use. I have seen questions here on SO related to difficulty accessing data, caused by this type of construct:
data{i,j}.value % A cell array with structs. Don't do this!
data.value{i,j} % A struct with cell arrays. Don't do this!
The first example is equal to a struct array (with a lot more overhead), except there is no control over the struct fields within each cell. That is, it is possible for one of the cells to not have a .value
field.
The second example makes sense only if value
is a different size than a second struct field. If all struct fields are (supposed to be) cell arrays of the same size like this, then use a struct array. Again, less overhead and more uniformity.