Implementing flexible array members with templates and base class

Question

In C99, you commonly see the following pattern:

struct Foo {
    int var1;
    int var2[];
};

struct Foo *f = malloc(sizeof(struct Foo) + sizeof(int) * n);
for (int i=0; i<n; ++i) {
    f->var2[i] = p;
}

But not only is this bad C++, it's also illegal.

You can achieve a similar effect in C++ like this:

struct FooBase {
    void dostuff();

    int var1;
    int var2[1];
};

template<size_t N>
struct Foo : public FooBase {
    int var2[N-1];
};

Although this will work (in the methods of FooBase you can access var2[2], var2[3], etc.), it relies on Foo being standard layout, which isn't very pretty.

The benefit of this is that a non-templated function can receive any Foo* without conversion by taking a FooBase* and call methods that operate on var2, and the memory is all contiguous (which can be useful).

Is there a better way of achieving this (which is legal C++/C++11/C++14)?

I'm not interested in the two trivial solutions (including an extra pointer in the base class to the start of the array, and allocating the array on the heap).

Answer 1:

What you want to do is possible, but not easy, in C++, and the interface to your struct will not be a struct-style interface.

Just like how a std::vector takes a block of memory and reformats it into something very much like an array, then overloads operators to make itself look array-like, you can do the same.

Access to your data will be via accessors. You'll manually construct your members in the buffer.

You might start with a list of pairs of "tags" and data types.

struct tag1_t {} tag1;
struct tag2_t {} tag2;
typedef std::tuple< std::pair< tag1_t, int >, std::pair<tag2_t, double> > header_t;

Then add some more types that we'll interpret as saying "after the header part, we have an array". I'd want to massively improve this syntax, but the important part for now is to build up compile-time lists:

struct arr_t {} arr;
typedef std::tuple< header_t, std::pair< arr_t, std::string > > full_t;

You'd then have to write up some template mojo that figures out, given N at run time, how big a buffer you'd need to store the int and double followed by N copies of the std::string, everything properly aligned. This isn't easy.
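The size calculation can be sketched for one fixed layout. This is only a minimal sketch, assuming a header of one int and one double followed by n std::string elements. The names align_up, Layout, and compute_layout are hypothetical, and a general version would walk the tuple of tags instead of hard-coding the types.

```cpp
#include <cstddef>
#include <string>

// Round `offset` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two, which holds for alignof results.
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Byte offsets of each part inside one contiguous buffer (hypothetical helper type).
struct Layout {
    std::size_t int_offset;
    std::size_t double_offset;
    std::size_t array_offset;
    std::size_t total_size;
};

// Work out where an int, a double, and n std::string objects would live.
inline Layout compute_layout(std::size_t n) {
    Layout l;
    l.int_offset = 0;
    l.double_offset = align_up(l.int_offset + sizeof(int), alignof(double));
    l.array_offset = align_up(l.double_offset + sizeof(double), alignof(std::string));
    l.total_size = l.array_offset + n * sizeof(std::string);
    return l;
}
```

Calling compute_layout(7) would give the number of bytes to allocate for seven strings plus the header, with every offset a multiple of the corresponding type's alignment.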

Once you've done that, you'd also need to write code that constructs everything described above. If you wanted to get fancy, you'd even expose a perfect forwarding constructor and constructor wrappers allowing the objects to be constructed in a non-default state.

Finally, write up an interface that finds the memory offset of the constructed objects based on the tags I injected into the above tuples, reinterpret_casts the raw memory into a reference to the data type, and returns that reference (in both const and non-const versions).

For the array at the end, you'd return some temporary data structure that has overloaded operator[] which produces the references.
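The construction and accessor steps might look roughly like the following. This is a simplified sketch rather than the tag-driven design described here: it assumes a single int header followed by n strings, and the names Packed, ArrayRef, Create, Destroy, header, and array are all hypothetical. It also glosses over strict-conformance details around reinterpret_cast on raw storage (for example std::launder in C++17).

```cpp
#include <cstddef>
#include <new>
#include <string>

// Lightweight temporary returned by the accessor; operator[] yields references.
struct ArrayRef {
    std::string* first;
    std::size_t size;
    std::string& operator[](std::size_t i) const { return first[i]; }
};

// Hypothetical fixed layout: an int header followed by n strings in one allocation.
class Packed {
public:
    // Allocate one buffer, then construct the header and each element in place.
    static Packed* Create(std::size_t n) {
        void* raw = ::operator new(array_offset() + n * sizeof(std::string));
        Packed* p = new (raw) Packed(n);
        for (std::size_t i = 0; i < n; ++i)
            new (p->elements() + i) std::string();
        return p;
    }

    // Destroy elements in reverse order, then the header, then free the buffer.
    static void Destroy(Packed* p) {
        typedef std::string String;
        for (std::size_t i = p->count_; i > 0; --i)
            p->elements()[i - 1].~String();
        p->~Packed();
        ::operator delete(p);
    }

    int& header() { return header_; }
    ArrayRef array() { return ArrayRef{elements(), count_}; }

private:
    explicit Packed(std::size_t n) : header_(0), count_(n) {}
    ~Packed() {}

    // Offset of the first element: sizeof(Packed) rounded up to the element alignment.
    static std::size_t array_offset() {
        const std::size_t a = alignof(std::string);
        return (sizeof(Packed) + a - 1) / a * a;
    }

    // Address of the trailing array, computed from `this` instead of a stored pointer.
    std::string* elements() {
        return reinterpret_cast<std::string*>(
            reinterpret_cast<char*>(this) + array_offset());
    }

    int header_;
    std::size_t count_;
};
```

With this sketch, `Packed* p = Packed::Create(7); p->array()[3] = "x"; Packed::Destroy(p);` keeps the header and all seven strings in one contiguous allocation, at the cost of manual construction and destruction.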

If you look at how std::vector turns blocks of memory into arrays, mix that with how boost::mpl arranges tag-to-data maps, and then manually take care of keeping things properly aligned, every step is challenging but not impossible. The messy syntax I've used here can also be improved (to some extent).

The end interface might be

Foo* my_data = Foo::Create(7);
my_data->get<tag1_t>(); // returns an int
my_data->get<tag2_t>(); // returns a double
my_data->get<arr_t>()[3]; // access to the element at index 3

which could be improved with some overloading to:

Foo* my_data = Foo::Create(7);
int x = my_data^tag1; // returns an int
double y = my_data^tag2; // returns a double
std::string z = my_data^arr[3]; // access to the std::string at index 3

but the effort involved would be reasonably large to get this far, and many of the things required would be pretty horrible.

Basically, in order to solve your problem as described, I would have to rebuild the entire C++/C structure-layout system manually within C++, and once you have done that it isn't hard to inject "arbitrary length array at the end". It would even be possible to inject arbitrary length arrays in the middle (but that would mean that finding the address of structure members past that array is a runtime problem: however, as our operator^ is allowed to run arbitrary code, and your structure can store the length of arrays, we are able to do this).

I cannot, however, think of a simpler, portable way to do what you ask within C++, where the data types stored do not have to be standard-layout.

Answer 2:

With a little typecasting, you can use the C pattern in C++ as well.

Just give the array an initial size of one, and allocate the structure using new char[...]:

struct Foo {
    int var1;
    int var2[1];
};

Foo* foo_ptr = reinterpret_cast<Foo*>(new char[sizeof(Foo) + sizeof(int) * (n - 1)]);

Then, of course, you should cast it back when freeing the structure as well:

delete[] reinterpret_cast<char*>(foo_ptr);

I don't really recommend this for general use though. The only acceptable (to me) place to use a scheme such as this is when transferring a structure somehow (network or files). And then I recommend marshaling it to/from a "proper" C++ object with a std::vector for the variable-length data.
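The marshaling step could be sketched as follows. This is one possible shape, not a definitive wire format: FooObject, to_bytes, and from_bytes are hypothetical names, and the explicit element count written after var1 is an assumption added so the receiver knows how many elements follow. Values are copied in host byte order, so both ends are assumed to share endianness and int size.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// "Proper" C++ object: the variable-length part lives in a std::vector.
struct FooObject {
    int var1;
    std::vector<int> var2;
};

// Serialize as: var1, element count, then the elements (host byte order).
inline std::vector<char> to_bytes(const FooObject& foo) {
    const std::size_t n = foo.var2.size();
    const int count = static_cast<int>(n);
    std::vector<char> out(2 * sizeof(int) + n * sizeof(int));
    std::memcpy(out.data(), &foo.var1, sizeof(int));
    std::memcpy(out.data() + sizeof(int), &count, sizeof(int));
    if (n > 0)
        std::memcpy(out.data() + 2 * sizeof(int), foo.var2.data(), n * sizeof(int));
    return out;
}

// Rebuild the object; returns false if the buffer is shorter than it claims.
// On failure, `foo` may be left partially overwritten.
inline bool from_bytes(const std::vector<char>& in, FooObject& foo) {
    if (in.size() < 2 * sizeof(int)) return false;
    int count = 0;
    std::memcpy(&foo.var1, in.data(), sizeof(int));
    std::memcpy(&count, in.data() + sizeof(int), sizeof(int));
    if (count < 0) return false;
    const std::size_t n = static_cast<std::size_t>(count);
    if (in.size() < 2 * sizeof(int) + n * sizeof(int)) return false;
    foo.var2.resize(n);
    if (n > 0)
        std::memcpy(foo.var2.data(), in.data() + 2 * sizeof(int), n * sizeof(int));
    return true;
}
```

The flat byte layout then exists only at the transfer boundary, and the rest of the program works with an ordinary object that owns its data.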

Answer 3:

What you want to do is not possible at all in C++. The reason is that sizeof(T) is a compile-time constant, so placing an array inside a type gives it a compile-time size. The proper C++ way of doing this therefore keeps the array outside of the type. Note that placing an array on the stack is only possible if it is inside some type, so everything stack-based is limited to a compile-time array size (alloca might get around that). Your original C version has a similar problem: types cannot deal with run-time-sized arrays.

This is also the issue with variable-length arrays in C++: they are not supported because they break sizeof, and C++ classes rely on sizeof for data member access. Any solution that cannot be used together with C++ classes is no good. std::vector has no such problems.

Note that constexpr in C++11 makes offset calculation in your custom data types considerably simpler, but the compile-time restriction is still there.
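As an illustration of the constexpr point, here is a minimal sketch assuming a header of an int followed by a double and a trailing array of long. The names round_up, int_offset, double_offset, header_size, and buffer_size are hypothetical. The header offsets are fixed at compile time, and only the element count remains a run-time value.

```cpp
#include <cstddef>

// C++11-style constexpr function: a single return expression.
constexpr std::size_t round_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Offsets for a hypothetical header of an int followed by a double.
constexpr std::size_t int_offset = 0;
constexpr std::size_t double_offset =
    round_up(int_offset + sizeof(int), alignof(double));
constexpr std::size_t header_size = double_offset + sizeof(double);

// The compiler can check the layout before the program ever runs.
static_assert(double_offset % alignof(double) == 0, "double must be aligned");
static_assert(round_up(5, 4) == 8, "rounds up to the next multiple");

// Total bytes needed for the header plus n trailing long elements.
inline std::size_t buffer_size(std::size_t n) {
    return round_up(header_size, alignof(long)) + n * sizeof(long);
}
```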

Answer 4:

I know I'm kinda late here, but my suggestion would be:

template<size_t N>
struct Foo {
    int var1;
    std::array<int,N> var2;
};

std::array stores its data as int v[N]; (not on the heap), so there would be no problem converting it to a stream of bytes.
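A short sketch of the byte-stream conversion, assuming the struct stays trivially copyable (the helper name to_bytes is hypothetical). Note that N is still a compile-time constant here, so this does not give a run-time length.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <type_traits>

template <std::size_t N>
struct Foo {
    int var1;
    std::array<int, N> var2;
};

// Copy the object representation into a byte buffer of the same size.
template <std::size_t N>
std::array<unsigned char, sizeof(Foo<N>)> to_bytes(const Foo<N>& foo) {
    static_assert(std::is_trivially_copyable<Foo<N>>::value,
                  "memcpy requires a trivially copyable type");
    std::array<unsigned char, sizeof(Foo<N>)> out;
    std::memcpy(out.data(), &foo, sizeof(foo));
    return out;
}
```

Copying the bytes back into another Foo<N> with std::memcpy restores the same values, as long as both sides agree on N and on the platform's int representation.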

Answer 5:

I'm also kinda late, but this solution is compatible with C's flexible arrays (if you play with the preprocessor, of course):

#include <cstdlib>
#include <iostream>

using namespace std;

template <typename T>
class Flexible 
{
public:
   Flexible(){}
   ~Flexible(){}
   inline T & operator[](size_t ind){
      return reinterpret_cast<T*>(this)[ind];
   }
   inline Flexible<T> * getThis() { return this; }
   inline operator T * () { return reinterpret_cast<T*>(this); }
};

struct test{
   int a;
   Flexible<char> b;
};

int main(int argc, char * argv[]){
   cout << sizeof(test) << endl;
   test t;
   cout << &t << endl;
   cout << &t.a << endl;
   cout << &t.b << endl;
   cout << t.b.getThis() << endl;
   cout << (void*)t.b << endl;
   test * t2 = static_cast<test*>(malloc(sizeof(test) + 5));
   t2->b[0] = 'a';
   t2->b[1] = 'b';
   t2->b[2] = 0;
   cout << t2->b << endl;
   free(t2); // release the malloc'ed block
   return 0;
}

(tested on GCC, and clang with clang++ -fsanitize=undefined, I see no reason it wouldn't be standard, except the reinterpret_cast part...)

NOTE: you won't get an error if it's not the last field of the struct. Be particularly cautious about using this in objects containing this struct as a sub-sub-...-sub-member, because you could unintentionally add another field after it and get some weird bugs. For example, I would not advise defining a struct/class with a member which itself contains a Flexible, such as this one:

class A{
  Flexible<char> a;
};

class B{
  A a;
};

Because it's easy to make this mistake later:

class B{
  A a;
  int i;
};

Source: https://stackoverflow.com/questions/17424731/implementing-flexible-array-members-with-templates-and-base-class

Tags

c++

flexible-array-member