How does boost::serialization allocate memory when deserializing through a pointer?

Posted by 假如想象 on 2021-01-28 11:51:08

Question


In short, I'd like to know how boost::serialization allocates memory for an object when deserializing through a pointer. Below is an example that illustrates my question, with companion code. The code is fully functional and compiles without errors; my question is only about how it actually works.

#include <cstddef> // NULL
#include <iomanip>
#include <iostream>
#include <fstream>
#include <string>

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>

class non_default_constructor; // Forward declaration for boost serialization namespacing below


// In order to "teach" boost how to save and load your class with a non-default-constructor, you must override these functions
// in the boost::serialization namespace. Prototype them here.
namespace boost { namespace serialization {
    template<class Archive>
    inline void save_construct_data(Archive& ar, const non_default_constructor* ndc, const unsigned int version);
    template<class Archive>
    inline void load_construct_data(Archive& ar, non_default_constructor* ndc, const unsigned int version);
}}

// Here is the actual class definition with no default constructor
class non_default_constructor
{
public:
    explicit non_default_constructor(std::string initial)
    : some_initial_value{initial}, state{0}
    {

    }

    std::string get_initial_value() const { return some_initial_value; } // For save_construct_data

private:
    std::string some_initial_value;
    int state;

    // Notice that we only serialize state here, not the
    // some_initial_value passed into the ctor
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        std::cout << "serialize called" << std::endl;
        ar & state;
    }
};

// Define the save and load overrides here.
namespace boost { namespace serialization {
    template<class Archive>
    inline void save_construct_data(Archive& ar, const non_default_constructor* ndc, const unsigned int version)
    {
        std::cout << "save_construct_data called." << std::endl;
        ar << ndc->get_initial_value();
    }
    template<class Archive>
    inline void load_construct_data(Archive& ar, non_default_constructor* ndc, const unsigned int version)
    {
        std::cout << "load_construct_data called." << std::endl;
        std::string some_initial_value;
        ar >> some_initial_value;

        // Use placement new to construct a non_default_constructor class at the address of ndc
        ::new(ndc)non_default_constructor(some_initial_value);
    }
}}


int main(int argc, char *argv[])
{

    // Now let's say that we want to save and load a non_default_constructor class through a pointer.

    non_default_constructor* my_non_default_constructor = new non_default_constructor{"initial value"};

    std::ofstream outputStream("non_default_constructor.dat");
    boost::archive::text_oarchive outputArchive(outputStream);
    outputArchive << my_non_default_constructor;

    outputStream.close();

    // The above is all fine and dandy. We've serialized an object through a pointer.
    // non_default_constructor will call save_construct_data then will call serialize()

    // The output archive file will look exactly like this:

    /*
        22 serialization::archive 17 0 1 0
        0 13 initial value 0
    */


    /*If I want to load that class back into an object at a later time
    I'd declare a pointer to a non_default_constructor */
    non_default_constructor* load_from_archive;

    // Notice load_from_archive was not initialized with any value. It doesn't make
    // sense to initialize it with a value, because we're trying to load from
    // a file, not create a whole new object with "new".

    std::ifstream inputStream("non_default_constructor.dat");
    boost::archive::text_iarchive inputArchive(inputStream);

    // <><><> HERE IS WHERE I'M CONFUSED <><><>
    inputArchive >> load_from_archive;

    // The above should call load_construct_data which will attempt to
    // construct a non_default_constructor object at the address of
    // load_from_archive, but HOW DOES IT KNOW HOW MUCH MEMORY A NON_DEFAULT_CONSTRUCTOR
    // class uses?? Placement new just constructs at the address, assuming
    // memory at the passed address has been allocated for construction.

    // So my question is this:
    // I want to verify that *something* is (or isn't) allocating memory for a non_default_constructor
    // class to be constructed at the address of load_from_archive.

    std::cout << load_from_archive->get_initial_value() << std::endl; // This works.

    return 0;

}

Per the boost::serialization documentation when a class with a non-default constructor is to be (de)serialized, the load/save_construct_data is used, but I'm not actually seeing a place where memory is being allocated for the object to be loaded into, just where placement new is constructing an object at a memory address. But what allocated the memory at that address?

It's probably a misunderstanding with how this line works:

::new(ndc)non_default_constructor(some_initial_value);

but I'd like to know where my misunderstanding lies. This is my first question, so I apologize if I've made some sort of mistake on how I've asked my question. Thanks for your time.


Answer 1:


That's one excellent example program, with very apt comments. Let's dig in.

// In order to "teach" boost how to save and load your class with a
// non-default-constructor, you must override these functions in the
// boost::serialization namespace. Prototype them here.

You don't have to. Any overload (not override) accessible via ADL suffices, apart from the in-class option.
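
To illustrate what "accessible via ADL" means here, the following is a toy model that does not use Boost at all. The namespaces `lib` and `mylib` and the names `describe`, `call_hook`, and `thing` are hypothetical, chosen only for this sketch: a library template makes an unqualified call, and argument-dependent lookup finds an overload declared next to the user's type.

```cpp
#include <string>

namespace lib {
    // Generic fallback, playing the role of a library-provided default.
    template <class T>
    std::string describe(const T*) { return "generic"; }

    // The call to describe() is unqualified, so at instantiation time
    // ADL also searches the namespace in which T is declared.
    template <class T>
    std::string call_hook(const T* p) { return describe(p); }
}

namespace mylib {
    struct thing {};

    // Plain overload next to the type; nothing is added to namespace lib.
    inline std::string describe(const thing*) { return "mylib overload"; }
}
```

With these definitions, `lib::call_hook` on a `mylib::thing*` picks the `mylib` overload (a non-template function wins over an equally good template), while `lib::call_hook` on an `int*` falls back to the generic version.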

Skipping to the meat of it:

// So my question is this: I want to verify that *something* is (or isn't)
// allocating memory for a non_default_constructor
// class to be constructed at the address of load_from_archive.

Yes, something does allocate that memory, and the documentation states this. But it's a little bit trickier, because the allocation is conditional. The reason is object tracking: if we serialize multiple pointers to the same object, the object itself gets serialized only once.

On deserialization, the objects will be represented in the archive stream with the object tracking-id. Only the first instance will lead to allocation.

See documentation.
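
The effect of tracking on allocation can be modelled without Boost. The sketch below is a simplified illustration, not the library's implementation: `ToyLoader`, `Widget`, and `load_pointer` are hypothetical names, and a plain `new` stands in for whatever allocation the real archive performs.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a tracked class without a default constructor.
struct Widget {
    explicit Widget(std::string s) : name(std::move(s)) {}
    std::string name;
};

// Toy loader that remembers which tracking ids it has already seen.
class ToyLoader {
public:
    // Allocates and constructs only the first time an id is seen;
    // later requests for the same id return the same address.
    // As with a deserialized pointer, the caller owns the object.
    Widget* load_pointer(int tracking_id, const std::string& ctor_arg) {
        auto it = seen_.find(tracking_id);
        if (it != seen_.end())
            return it->second;
        Widget* fresh = new Widget(ctor_arg);
        ++allocations_;
        seen_.emplace(tracking_id, fresh);
        return fresh;
    }

    int allocations() const { return allocations_; }

private:
    std::map<int, Widget*> seen_;
    int allocations_ = 0;
};
```

Loading the same id ten times through such a loader yields ten equal pointers and a single allocation, which is why the caller must delete the object exactly once.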


Here's a simplified counter-example:

  • demonstrating ADL
  • demonstrating Object Tracking
  • removing all forward declarations (they're unnecessary, because the overloads are looked up at the template's point of instantiation)

It serializes a vector with 10 copies of the pointer. I used unique_ptr to avoid leaking the instances (both the one manually created in main, as well as the one created by the deserialization).

Live On Coliru

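#include <cassert>  // assert
#include <iomanip>
#include <iostream>
#include <fstream>
#include <memory>   // std::unique_ptr, std::make_unique
#include <string>
#include <vector>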

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/vector.hpp>

namespace mylib {
    // Here is the actual class definition with no default constructor
    class non_default_constructor {
      public:
        explicit non_default_constructor(std::string initial)
                : some_initial_value{ initial }, state{ 0 } {}

        std::string get_initial_value() const {
            return some_initial_value;
        } // For save_construct_data

      private:
        std::string some_initial_value;
        int state;

        // Notice that we only serialize state here, not the some_initial_value
        // passed into the ctor
        friend class boost::serialization::access;
        template <class Archive> void serialize(Archive& ar, unsigned) {
            std::cout << "serialize called" << std::endl;
            ar& state;
        }
    };

    // Define the save and load overloads here.
    template<class Archive>
    inline void save_construct_data(Archive& ar, const non_default_constructor* ndc, unsigned)
    {
        std::cout << "save_construct_data called." << std::endl;
        ar << ndc->get_initial_value();
    }
    template<class Archive>
    inline void load_construct_data(Archive& ar, non_default_constructor* ndc, unsigned)
    {
        std::cout << "load_construct_data called." << std::endl;
        std::string some_initial_value;
        ar >> some_initial_value;

        // Use placement new to construct a non_default_constructor class at the address of ndc
        ::new(ndc)non_default_constructor(some_initial_value);
    }
}

int main() {
    using NDC = mylib::non_default_constructor;
    auto owned = std::make_unique<NDC>("initial value");

    {
        std::ofstream outputStream("vector.dat");
        boost::archive::text_oarchive outputArchive(outputStream);

        // serialize 10 copies, for fun
        std::vector v(10, owned.get());
        outputArchive << v;
    }

    /*
        22 serialization::archive 17 0 0 10 0 1 1 0
        0 13 initial value 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
    */

    std::vector<NDC*> restore;

    {
        std::ifstream inputStream("vector.dat");
        boost::archive::text_iarchive inputArchive(inputStream);

        inputArchive >> restore;
    }

    std::unique_ptr<NDC> take_ownership(restore.front());
    for (auto& el : restore) {
        assert(el == take_ownership.get());
    }

    std::cout << "restored: " << restore.size() << " copies with " << 
        std::quoted(take_ownership->get_initial_value()) << "\n";
}

Prints

save_construct_data called.
serialize called
load_construct_data called.
serialize called
restored: 10 copies with "initial value"

The vector.dat file contains:

22 serialization::archive 17 0 0 10 0 1 1 0
0 13 initial value 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0

The Library Internals

You shouldn't really care, but you can of course read the source code. Predictably, it's way more involved than you'd naively expect, after all: this is C++.

The library deals with types that have overloaded operator new. In that case it calls T::operator new instead of the global operator new. It always passes sizeof(T), as you correctly surmised.
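
The split between allocating storage and constructing into it can be shown in isolation. This is a minimal sketch under stated assumptions, not Boost code: `Pooled` is a hypothetical type whose class-specific `operator new` counts calls, and `allocate_then_construct` is a made-up helper that performs the two steps by hand.

```cpp
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Hypothetical type with class-specific allocation functions.
struct Pooled {
    explicit Pooled(std::string s) : value(std::move(s)) {}
    std::string value;

    static int allocations;
    static void* operator new(std::size_t n) {
        ++allocations;
        return ::operator new(n);
    }
    static void operator delete(void* p) { ::operator delete(p); }
};
int Pooled::allocations = 0;

// Step 1 obtains raw storage sized by sizeof(Pooled); no constructor runs.
// Step 2 constructs the object in that storage with global placement new,
// which is the only part a load_construct_data overload has to do.
Pooled* allocate_then_construct(const std::string& arg) {
    void* raw = (Pooled::operator new)(sizeof(Pooled));
    try {
        return ::new (raw) Pooled(arg);
    } catch (...) {
        Pooled::operator delete(raw);  // constructor threw: release storage
        throw;
    }
}
```

The leading `::` on the placement new matters for a type like this: an unqualified `new (raw) Pooled(arg)` would look up `operator new` in the class first, and the class declares no placement form.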

The code lives in the exception-safe wrapper: detail/iserializer.hpp

struct heap_allocation {
    explicit heap_allocation() { m_p = invoke_new(); }
    ~heap_allocation() {
        if (0 != m_p)
            invoke_delete(m_p);
    }
    T* get() const { return m_p; }

    T* release() {
        T* p = m_p;
        m_p = 0;
        return p;
    }

  private:
    T* m_p;
};

Yes, this code could be simplified a lot with C++11 or later. Also, the NULL guard in the destructor is redundant for compliant implementations of operator delete.
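
As a rough idea of what a C++11 version of such a guard might look like, here is a sketch built on std::unique_ptr with a custom deleter. It is an assumption-laden illustration rather than a proposed patch: the names `raw_storage_deleter`, `raw_storage`, `construct_with_guard`, and `Point` are hypothetical, and class-specific operator new is ignored for brevity.

```cpp
#include <memory>
#include <new>
#include <utility>

// Frees raw storage WITHOUT running a destructor, because while the
// guard owns the memory no object has been constructed in it yet.
struct raw_storage_deleter {
    void operator()(void* p) const noexcept { ::operator delete(p); }
};

template <class T>
using raw_storage = std::unique_ptr<T, raw_storage_deleter>;

// Allocate, construct in place, then hand the finished object to the caller.
template <class T, class... Args>
T* construct_with_guard(Args&&... args) {
    raw_storage<T> mem(static_cast<T*>(::operator new(sizeof(T))));
    // If the constructor throws, mem's deleter releases the storage.
    ::new (static_cast<void*>(mem.get())) T(std::forward<Args>(args)...);
    return mem.release();  // success: the guard gives up ownership
}

// Hypothetical example type without a default constructor.
struct Point {
    Point(int a, int b) : x(a), y(b) {}
    int x;
    int y;
};
```

A call such as `construct_with_guard<Point>(3, 4)` returns a fully constructed object that the caller owns; the guard's release() plays the same role as the release() member of the wrapper quoted above.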

Now of course, invoke_new and invoke_delete are where it's at. Presenting condensed:

    static T* invoke_new() {
        typedef typename mpl::eval_if<boost::has_new_operator<T>,
                mpl::identity<has_new_operator>,
                mpl::identity<doesnt_have_new_operator>>::type typex;
        return typex::invoke_new();
    }
    static void invoke_delete(T* t) {
        typedef typename mpl::eval_if<boost::has_new_operator<T>,
                mpl::identity<has_new_operator>,
                mpl::identity<doesnt_have_new_operator>>::type typex;
        typex::invoke_delete(t);
    }
    struct has_new_operator {
        static T* invoke_new() { return static_cast<T*>((T::operator new)(sizeof(T))); }
        static void invoke_delete(T* t) { (operator delete)(t); }
    };
    struct doesnt_have_new_operator {
        static T* invoke_new() { return static_cast<T*>(operator new(sizeof(T))); }
        static void invoke_delete(T* t) { (operator delete)(t); }
    };

There's some conditional compilation and verbose comments, so peruse the source code if you want the full picture.



Source: https://stackoverflow.com/questions/62105624/how-does-boostserialization-allocate-memory-when-deserializing-through-a-point
