Boost Spirit QI slow

终归单人心 2020-12-19 06:50

I am trying to parse TPCH files with Boost Spirit QI. My implementation is inspired by the employee example of Spirit QI ( http://www.boost.org/doc/libs/1_52_0/libs/spirit/example/qi

3 Answers
  •  醉话见心
    2020-12-19 07:42

    The problem mainly comes from appending individual char elements to a std::string container. With your grammar, each std::string attribute is built one character at a time: appending starts at the first matching char and stops when the | separator is found. The string starts out with little or no heap capacity, and whenever an append exceeds the current capacity, std::string asks its allocator for a larger buffer, copies the existing characters into it and frees the old one. Capacity grows geometrically (the growth factor is implementation-defined, commonly 1.5x or 2x), so with a factor of 2 and no inline buffer the reallocations happen at 1, 2, 4, 8, 16, 32... characters. Many standard libraries also keep short strings in an inline small-string buffer (roughly 15 to 22 characters on the common 64-bit implementations), which spares the heap for very short fields, but longer fields still go through several reallocations.

    No wonder it was slow: the frequent malloc/free calls are expensive in themselves, and they can also fragment the heap, leaving many small free blocks of varying sizes that the allocator has to search through for the rest of the program's lifetime. tl;dr: the data ends up fragmented and scattered across memory.
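    Because the growth policy and the inline buffer size differ between standard libraries, it is worth checking what your own toolchain does. A minimal sketch, assuming only the standard library: record capacity() while appending one char at a time, with and without an up-front reserve. The helper name capacity_history is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Appends n characters to an empty std::string one at a time, the way a
// +char_ rule fills its attribute, and records each distinct capacity() seen.
// Every change of capacity() means a new buffer was allocated and the
// existing characters were copied into it.
// (capacity_history is an illustrative helper, not a library function.)
std::vector<std::size_t> capacity_history(std::size_t n, std::size_t reserve_first = 0)
{
    std::string s;
    if (reserve_first > 0)
        s.reserve(reserve_first);             // optional single up-front allocation
    std::vector<std::size_t> history{s.capacity()};
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');                     // one char at a time
        if (s.capacity() != history.back())
            history.push_back(s.capacity());  // buffer was reallocated
    }
    return history;
}
```

    Printing capacity_history(100) shows the real growth sequence; its first entry is the capacity of an empty string, which is the inline buffer size on implementations that have one. capacity_history(100, 100) should contain a single entry, because nothing has to grow after the reserve.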

    Proof? The following code, from Boost 1.54, is called by char_parser each time a valid character is matched at the current iterator position:

    /boost/spirit/home/qi/char/char_parser.hpp

    if (first != last && this->derived().test(*first, context))
    {
        spirit::traits::assign_to(*first, attr_);
        ++first;
        return true;
    }
    return false;
    

    /boost/spirit/home/qi/detail/assign_to.hpp

    // T is not a container and not a string
    template <typename T_>
    static void call(T_ const& val, Attribute& attr, mpl::false_, mpl::false_)
    {
        traits::push_back(attr, val);
    }
    

    /boost/spirit/home/support/container.hpp

    template <typename Container, typename T, typename Enable/* = void*/>
    struct push_back_container
    {
        static bool call(Container& c, T const& val)
        {
            c.insert(c.end(), val);
            return true;
        }
    };
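    Put together, the three excerpts mean that every matched character ends up as one c.insert(c.end(), ch) call on the string attribute. The following Boost-free sketch models that loop for a rule like +(char_ - '|') so the per-character appends and the resulting reallocations can be observed directly; parse_field_like_spirit is a hypothetical name and this is a simplification, not Spirit's actual code path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified model of what Spirit does for +(char_ - '|') with a
// std::string attribute: each matched char goes through
// assign_to -> push_back_container -> c.insert(c.end(), ch).
// parse_field_like_spirit is hypothetical, not a Spirit function.
template <typename Iterator>
bool parse_field_like_spirit(Iterator& first, Iterator last, std::string& attr,
                             std::size_t& reallocations)
{
    Iterator const start = first;
    while (first != last && *first != '|') {   // char_ - '|'
        std::size_t const before = attr.capacity();
        attr.insert(attr.end(), *first);       // push_back_container::call
        if (attr.capacity() != before)
            ++reallocations;                   // the buffer had to grow
        ++first;
    }
    return first != start;                     // '+' needs at least one char
}
```

    After a successful call, first points at the | separator, just as it would after the Spirit rule.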
    

    The follow-up fix you posted (changing the struct members to char Name[Size]) has essentially the same effect as calling Name.reserve(Size) on each string before parsing: one buffer of the right size and no regrowth. However, Spirit has no directive for this at the moment.
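    To make that equivalence concrete, here is a minimal sketch outside of Spirit, assuming '|'-separated input: reserve the known maximum field width once, then append char by char as before. read_field and kNameSize are hypothetical names; 25 is used because TPC-H declares c_name as a 25-character field, but treat the widths as example values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Example width (TPC-H c_name is declared as 25 characters).
constexpr std::size_t kNameSize = 25;

// Reserving the maximum width once gives the std::string the same
// "one buffer, no regrowth" behaviour as a char Name[Size] member.
// read_field is a hypothetical helper, not part of Spirit.
std::string read_field(const char*& p, const char* end, std::size_t max_width)
{
    std::string field;
    field.reserve(max_width);                 // single allocation up front
    while (p != end && *p != '|')
        field.push_back(*p++);                // stays within capacity if the field fits
    if (p != end)
        ++p;                                  // skip the '|' separator
    return field;
}
```

    Unlike a fixed char array, the string still grows correctly if a field turns out longer than max_width; it just pays for a reallocation in that case.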

    A possible solution, patching the Boost sources:

    /boost/spirit/home/support/container.hpp

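    template <typename Container, typename T, typename Enable/* = void*/>
    struct push_back_container
    {
        // Proposed: reserve a small initial buffer before the first insert.
        // Note: this requires Container to provide capacity() and reserve()
        // (std::string, std::vector), so as written it would not compile
        // for containers such as std::list.
        static bool call(Container& c, T const& val, size_t initial_size = 8)
        {
            if (c.capacity() < initial_size)
                c.reserve(initial_size);
            c.insert(c.end(), val);
            return true;
        }
    };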
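    The patch above assumes every attribute container has capacity() and reserve(). A stand-alone C++17 sketch of the same idea that only reserves when both members exist, so node-based containers keep working; push_back_reserving and has_reserve_and_capacity are hypothetical names, not Spirit customization points.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// True when Container has both reserve(size) and capacity() members.
template <typename Container, typename = void>
struct has_reserve_and_capacity : std::false_type {};

template <typename Container>
struct has_reserve_and_capacity<Container, std::void_t<
    decltype(std::declval<Container&>().reserve(std::size_t{})),
    decltype(std::declval<const Container&>().capacity())>>
    : std::true_type {};

// Sketch of the proposed push_back_container behaviour (hypothetical name).
// The reserve branch is discarded at compile time for containers that
// cannot reserve, e.g. std::list.
template <typename Container, typename T>
bool push_back_reserving(Container& c, T const& val, std::size_t initial_size = 8)
{
    if constexpr (has_reserve_and_capacity<Container>::value) {
        if (c.capacity() < initial_size)
            c.reserve(initial_size);
    }
    c.insert(c.end(), val);
    return true;
}
```

    One design note: on the common implementations an empty std::string already reports an inline capacity of 15 or more characters, so a default of 8 would never trigger a reserve for strings. A value matching the expected field width is more useful.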
    

    /boost/spirit/home/qi/char/char_parser.hpp

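    if (first != last && this->derived().test(*first, context))
    {
        spirit::traits::assign_to(*first, attr_);
        ++first;
        return true;
    }
    // Proposed: give back unused capacity once the field is complete.
    // As written this is a runtime check, so attr_.shrink_to_fit() must still
    // compile for non-container attributes such as char; a real patch would
    // need a compile-time check on shrink_to_fit() itself.
    // Also, with char_ - lit("|") the difference parser rejects '|' before
    // char_parser runs, so this branch is mostly reached at end of input.
    if (spirit::traits::is_container<Attribute>::value)
        attr_.shrink_to_fit();
    return false;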
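    A runtime if on a type trait does not stop the compiler from instantiating the guarded call. A minimal C++17 sketch of the compile-time guard the shrink_to_fit idea needs, testing for the member itself rather than for "is a container" (std::list is a container but has no shrink_to_fit); finish_attribute and has_shrink_to_fit are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// True when T has a shrink_to_fit() member (std::string, std::vector, ...).
template <typename T, typename = void>
struct has_shrink_to_fit : std::false_type {};

template <typename T>
struct has_shrink_to_fit<T,
    std::void_t<decltype(std::declval<T&>().shrink_to_fit())>> : std::true_type {};

// `if constexpr` discards the call for attribute types without the member,
// so this compiles for char, int, etc. (hypothetical helper name).
template <typename Attribute>
void finish_attribute(Attribute& attr)
{
    if constexpr (has_shrink_to_fit<Attribute>::value)
        attr.shrink_to_fit();   // non-binding request; may itself reallocate
}
```

    Keep in mind that shrink_to_fit, when honoured, copies the contents into a new, smaller buffer, which is one more allocation per field; whether that is a net win is worth measuring.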
    

    I haven't tested it, but I expect it could speed up char parsers with string attributes substantially, perhaps approaching the 10x difference you saw with fixed-size char arrays. It would make a good optimization for a future Boost Spirit release, together with a (currently hypothetical) reserve(initial_size)[ +( char_ - lit("|") ) ] directive that sets the initial buffer size.
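    Since the speedup is an untested expectation, a rough harness can check the allocation effect on your own data before touching Boost. This sketch, assuming '|'-terminated lines as in TPC-H .tbl files, splits lines into std::string fields one char at a time with and without a per-field reserve and times both runs; split_fields and time_split are hypothetical helpers, and real Spirit timings will also depend on the grammar, compiler and allocator.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Splits a '|'-terminated line into fields, appending one char at a time.
// If reserve_hint > 0, each field reserves that many chars first.
// Text after the last '|' is ignored (.tbl lines end with '|').
std::vector<std::string> split_fields(const std::string& line, std::size_t reserve_hint)
{
    std::vector<std::string> fields;
    std::string current;
    if (reserve_hint) current.reserve(reserve_hint);
    for (char ch : line) {
        if (ch == '|') {
            fields.push_back(std::move(current));
            current = std::string();               // reset after the move
            if (reserve_hint) current.reserve(reserve_hint);
        } else {
            current.push_back(ch);
        }
    }
    return fields;
}

// Parses every line `repeat` times; returns elapsed microseconds and
// reports the number of fields produced through total_fields.
long long time_split(const std::vector<std::string>& lines, std::size_t reserve_hint,
                     int repeat, std::size_t& total_fields)
{
    auto const t0 = std::chrono::steady_clock::now();
    total_fields = 0;
    for (int r = 0; r < repeat; ++r)
        for (const auto& line : lines)
            total_fields += split_fields(line, reserve_hint).size();
    auto const t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```

    Comparing time_split(lines, 0, ...) with time_split(lines, 64, ...) on a real .tbl file (built with optimizations on) gives a first estimate of how much of the cost is reallocation.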
