Object layout in case of virtual functions and multiple inheritance

前端 未结 4 699
醉酒成梦
醉酒成梦 2020-11-30 00:07

I was recently asked in an interview about object layout with virtual functions and multiple inheritance involved.
I explained it in context of how it is implemented wit

4条回答
  •  情深已故
    2020-11-30 00:47

    First, a polymorphic class has at least one virtual function, so it has a vptr:

    struct A {
        virtual void foo();
    };
    

    is compiled to:

    struct A__vtable { // vtable for objects of declared type A
        void (*foo__ptr) (A *__this); // pointer to foo() virtual function
    };
    
    void A__foo (A *__this); // A::foo ()
    
    // vtable for objects of real (dynamic) type A
    const A__vtable A__real = { // vtable is never modified
        /*foo__ptr =*/ A__foo
    };
    
    struct A {
        A__vtable const *__vptr; // ptr to const not const ptr
                                 // vptr is modified at runtime
    };
    
    // default constructor for class A (implicitly declared)
    void A__ctor (A *__that) { 
        __that->__vptr = &A__real;
    }
    

    Remark: C++ can be compiled to another high level language like C (as cfront did) or even to a C++ subset (here C++ without virtual). I put __ in compiler generated names.

    Note that this is a simplistic model where RTTI is not supported; real compilers will add data in the vtable to support typeid.

    Now, a simple derived class:

    struct Der : A {
        override void foo();
        virtual void bar();
    };
    

    Non-virtual (*) base class subobjects are subobjects like member subobjects, but while member subobjects are complete objects, ie. their real (dynamic) type is their declared type, base class subobjects are not complete, and their real type change during construction.

    (*) virtual bases are very different, like virtual member functions are different from non virtual members

    struct Der__vtable { // vtable for objects of declared type Der
        A__vtable __primary_base; // first position
        void (*bar__ptr) (Der *__this); 
    };
    
    // overriding of a virtual function in A:
    void Der__foo (A *__this); // Der::foo ()
    
    // new virtual function in Der:
    void Der__bar (Der *__this); // Der::bar ()
    
    // vtable for objects of real (dynamic) type Der
    const Der__vtable Der__real = { 
        { /*foo__ptr =*/ Der__foo },
        /*foo__ptr =*/ Der__bar
    };
    
    struct Der { // no additional vptr
        A __primary_base; // first position
    };
    

    Here "first position" means that the member must be first (other members could be re-ordered): they are located at offset zero so we can reinterpret_cast pointers, the types are compatibles; at non-zero offset, we would have to do pointer adjustments with arithmetic on char*.

    The lack of adjustment may not seem like a big deal in term of generated code (just some add immediate asm instructions), but it means much more than that, it means that such pointers can be viewed as having different types: an object of type A__vtable* can contain a pointer to Der__vtable and be treated as either a Der__vtable* or a A__vtable*. The same pointer object serves as a pointer to a A__vtable in functions dealing with objects of type A and as a pointer to a Der__vtable in functions dealing with objects of type Der.

    // default constructor for class Der (implicitly declared)
    void Der__ctor (Der *__this) { 
        A__ctor (reinterpret_cast (__this));
        __this->__vptr = reinterpret_cast (&Der__real);
    }
    

    You see that the dynamic type, as defined by the vptr, changes during construction as we assign a new value to the vptr (in this particular case the call to the base class constructor does nothing useful and can be optimised away, but it isn't the case with non trivial constructors).

    With multiple inheritance:

    struct C : A, B {};
    

    A C instance will contain a A and a B, like that:

    struct C {
        A base__A; // primary base
        B base__B;
    };
    

    Note that only one of these base class subobjects can have the privilege of sitting at offset zero; this is important in many ways:

    • conversion of pointers to other base classes (upcasts) will need an adjustment; conversely, upcasts need the opposite adjustments;

    • this implies that when doing a virtual call with a base class pointer, the this has the correct value for entry in the derived class overrider.

    So the following code:

    void B::printaddr() {
        printf ("%p", this);
    }
    
    void C::printaddr () { // overrides B::printaddr()
        printf ("%p", this);
    }
    

    can be compiled to

    void B__printaddr (B *__this) {
        printf ("%p", __this);
    }
    
    // proper C::printaddr taking a this of type C* (new vtable entry in C)
    void C__printaddr (C *__this) {
        printf ("%p", __this);
    }
    
    // C::printaddr overrider for B::printaddr
    // needed for compatibility in vtable
    void C__B__printaddr (B *__this) {
        C__printaddr (reinterpret_cast(reinterpret_cast (__this) - offset__C__B));
    }
    

    We see the C__B__printaddr declared type and semantics are compatibles with B__printaddr, so we can use &C__B__printaddr in the vtable of B; C__printaddr is not compatible but can be used for calls involving a C objects, or classes derived from C.

    A non virtual member function is like a free function that has access to internal stuff. A virtual member function is "flexibility point" which can be customised by overriding. virtual member function declaration play a special role in the definition of a class: like other members they are part of the contract with the external world, but at the same time they are part of a contract with derived class.

    A non virtual base class is like a member object where we can refine behaviour via overriding (also we can access protected members). For the external world, the inheritance for A in Der implies that implicit derived-to-base conversions will exist for pointers, that a A& can be bound to a Der lvalue, etc. For further derived classes (derived from Der), it also means that virtual functions of A are inherited in the Der: virtual functions in A can be overridden in further derived classes.

    When a class is further derived, say Der2 is derived from Der, implicit conversions a pointers of type Der2* to A* is semantically performed in step: first, a conversion to Der* is validated (the access control to the inheritance relation of Der2 from Der is checked with the usual public/protected/private/friend rules), then the access control of Der to A. A non virtual inheritance relation cannot be refined or overridden in derived classes.

    Non virtual members functions can be directly called and virtual members must be called indirectly via the vtable (unless the real object type happens to be known by the compiler), so the virtual keyword adds an indirection to members functions access. Just like for function members, the virtual keyword adds an indirection to base object access; just like for functions, virtual base classes add a flexibility point in inheritance.

    When doing non-virtual, repeated, multiple inheritance :

    struct Top { int i; };
    struct Left : Top { };
    struct Right : Top { };
    struct Bottom : Left, Right { };
    

    There are just two Top::i subobjects in Bottom (Left::i and Right::i), as with members objects:

    struct Top { int i; };
    struct mLeft { Top t; };
    struct mRight { mTop t; };
    struct mBottom { mLeft l; mRight r; }
    

    Nobody is surprised that there are two int sub-members (l.t.i and r.t.i).

    With virtual functions:

    struct Top { virtual void foo(); };
    struct Left : Top { }; // could override foo
    struct Right : Top { }; // could override foo
    struct Bottom : Left, Right { }; // could override foo (both)
    

    it means there are two different (unrelated) virtual functions called foo, with distinct vtable entries (both as they have the same signature, they can have a common overrider).

    The semantic of non virtual base classes follows from the fact that basic, non virtual, inheritance is an exclusive relation: the inheritance relation established between Left and Top cannot be modified by a further derivation, so the fact that a similar relation exists between Right and Top cannot affect this relation. In particular, it means that Left::Top::foo() can be overridden in Left and in Bottom, but Right, which has no inheritance relation with Left::Top, cannot set this customisation point.

    Virtual base classes are different: a virtual inheritance is a shared relation that can be customised in derived classes:

    struct Top { int i; virtual void foo(); };
    struct vLeft : virtual Top { }; 
    struct vRight : virtual Top { };
    struct vBottom : vLeft, vRight { }; 
    

    Here, this is only one base class subobject Top, only one int member.

    Implementation:

    Room for non virtual base classes are allocated based on a static layout with fixed offsets in the derived class. Note that the layout of a derived class is the included in the layout of more derived class, so the exact position of subobjects does not depend on the real (dynamic) type of object (just like the address of a non virtual function is a constant). OTOH, the position of subobjects in a class with virtual inheritance is determined by the dynamic type (just like the address of the implementation of a virtual function is known only when the dynamic type is known).

    The location of subobject will be determined at runtime with the vptr and the vtable (reuse of the existing vptr implies less space overhead), or a direct internal pointer to the subobject (more overhead, less indirections needed).

    Because the offset of a virtual base class is determined only for a complete object, and cannot be known for a given declared type, a virtual base cannot be allocated at offset zero and is never a primary base. A derived class will never reuse the vptr of a virtual base as its own vptr.

    In term of possible translation:

    struct vLeft__vtable { 
        int Top__offset; // relative vLeft-Top offset
        void (*foo__ptr) (vLeft *__this); 
        // additional virtual member function go here
    };
    
    // this is what a subobject of type vLeft looks like
    struct vLeft__subobject { 
        vLeft__vtable const *__vptr;
        // data members go here
    };
    
    void vLeft__subobject__ctor (vLeft__subobject *__this) { 
        // initialise data members
    }
    
    // this is a complete object of type vLeft 
    struct vLeft__complete {
        vLeft__subobject __sub;
        Top Top__base;
    }; 
    
    // non virtual calls to vLeft::foo
    void vLeft__real__foo (vLeft__complete *__this);
    
    // virtual function implementation: call via base class
    // layout is vLeft__complete 
    void Top__in__vLeft__foo (Top *__this) {
        // inverse .Top__base member access 
        char *cp = reinterpret_cast (__this);
        cp -= offsetof (vLeft__complete,Top__base);
        vLeft__complete *__real = reinterpret_cast (cp);
        vLeft__real__foo (__real);
    }
    
    void vLeft__foo (vLeft *__this) {
        vLeft__real__foo (reinterpret_cast (__this));
    }
    
    // Top vtable for objects of real type vLeft
    const Top__vtable Top__in__vLeft__real = { 
        /*foo__ptr =*/ Top__in__vLeft__foo 
    };
    
    // vLeft vtable for objects of real type vLeft
    const vLeft__vtable vLeft__real = { 
        /*Top__offset=*/ offsetof(vLeft__complete, Top__base),
        /*foo__ptr =*/ vLeft__foo 
    };
    
    void vLeft__complete__ctor (vLeft__complete *__this) { 
        // construct virtual bases first
        Top__ctor (&__this->Top__base); 
    
        // construct non virtual bases: 
        // change dynamic type to vLeft
        // adjust both virtual base class vptr and current vptr
        __this->Top__base.__vptr = &Top__in__vLeft__real;
        __this->__vptr = &vLeft__real;
    
        vLeft__subobject__ctor (&__this->__sub);
    }
    

    For an object of known type, access to the base class is through vLeft__complete:

    struct a_vLeft {
        vLeft m;
    };
    
    void f(a_vLeft &r) {
        Top &t = r.m; // upcast
        printf ("%p", &t);
    }
    

    is translated to:

    struct a_vLeft {
        vLeft__complete m;
    };
    
    void f(a_vLeft &r) {
        Top &t = r.m.Top__base;
        printf ("%p", &t);
    }
    

    Here the real (dynamic) type of r.m is known and so is the relative position of the subobject is known at compile time. But here:

    void f(vLeft &r) {
        Top &t = r; // upcast
        printf ("%p", &t);
    }
    

    the real (dynamic) type of r is not known, so the access is through the vptr:

    void f(vLeft &r) {
        int off = r.__vptr->Top__offset;
        char *p = reinterpret_cast (&r) + off;
        printf ("%p", p);
    }
    

    This function can accept any derived class with a different layout:

    // this is what a subobject of type vBottom looks like
    struct vBottom__subobject { 
        vLeft__subobject vLeft__base; // primary base
        vRight__subobject vRight__base; 
        // data members go here
    };
    
    // this is a complete object of type vBottom 
    struct vBottom__complete {
        vBottom__subobject __sub; 
        // virtual base classes follow:
        Top Top__base;
    }; 
    

    Note that the vLeft base class is at a fixed location in a vBottom__subobject, so vBottom__subobject.__ptr is used as a vptr for the whole vBottom.

    Semantics:

    The inheritance relation is shared by all derived classes; this means that the right to override is shared, so vRight can override vLeft::foo. This creates a sharing of responsibilities: vLeft and vRight must agree on how they customise Top:

    struct Top { virtual void foo(); };
    struct vLeft : virtual Top { 
        override void foo(); // I want to customise Top
    }; 
    struct vRight : virtual Top { 
        override void foo(); // I want to customise Top
    }; 
    struct vBottom : vLeft, vRight { };  // error
    

    Here we see a conflict: vLeft and vRight seek to define the behaviour of the only foo virtual function, and vBottom definition is in error for lack of a common overrider.

    struct vBottom : vLeft, vRight  { 
        override void foo(); // reconcile vLeft and vRight 
                             // with a common overrider
    };
    

    Implementation:

    The construction of class with non virtual base classes with non virtual base classes involves calling base class constructors in the same order as done for member variables, changing the dynamic type each time we enter a ctor. During construction, the base class subobjects really act as if they were complete objects (this is even true with impossible complete abstract base class subobjects: they are objects with undefined (pure) virtual functions). Virtual functions and RTTI can be called during construction (except of course pure virtual functions).

    The construction of a class with non virtual base classes with virtual bases is more complicated: during construction, the dynamic type is the base class type, but the layout of virtual base is still the layout of the most derived type that is not yet constructed, so we need more vtables to describe this state:

    // vtable for construction of vLeft subobject of future type vBottom
    const vLeft__vtable vLeft__ctor__vBottom = { 
        /*Top__offset=*/ offsetof(vBottom__complete, Top__base),
        /*foo__ptr =*/ vLeft__foo 
    };
    

    The virtual functions are those of vLeft (during construction, the vBottom object lifetime has not begun), while the virtual base locations are those of a vBottom (as defined in the vBottom__complete translated objected).

    Semantics:

    During initialisation, it is obvious that we must be careful not to use an object before it is initialised. Because C++ gives us a name before an object is fully initialised, it is easy to do that:

    int foo (int *p) { return *pi; }
    int i = foo(&i); 
    

    or with the this pointer in the constructor:

    struct silly { 
        int i;
        std::string s;
        static int foo (bad *p) { 
            p->s.empty(); // s is not even constructed!
            return p->i; // i is not set!
        }
        silly () : i(foo(this)) { }
    };
    

    It is pretty obvious that any use of this in the ctor-init-list must be carefully checked. After initialisation of all members, this can be passed to other functions and registered in some set (until destruction begins).

    What is less obvious is that when construction of a class involving shared virtual bases, subobjects stop being constructed: during the construction of a vBottom:

    • first the virtual bases are constructed: when Top is constructed, it is constructed like a normal subject (Top doesn't even know it is virtual base)

    • then the base classes are constructed in left to right order: the vLeft subobject is constructed and becomes functional as a normal vLeft (but with a vBottom layout), so the Top base class subobject now has a vLeft dynamic type;

    • the vRight subobject construction begins, and the dynamic type of the base class changes to vRight; but vRight isn't derived from vLeft, doesn't know anything about vLeft, so the vLeft base is now broken;

    • when the body of the Bottom constructor begins, the types of all subobjects have stabilised and vLeft is functional again.

提交回复
热议问题