I am trying to parse TPC-H files with Boost Spirit Qi. My implementation is inspired by the employee example of Spirit Qi ( http://www.boost.org/doc/libs/1_52_0/libs/spirit/example/qi
The problem mainly comes from appending individual char elements to a std::string container. With your grammar, each std::string attribute starts growing when the first char is matched and stops when the | separator is found. The string therefore starts out with little or no capacity, and its allocator has to run (at runtime, not at compile time) every time that capacity is exceeded. The exact growth policy is implementation-defined, but with a typical doubling strategy the string is copied into a new allocation twice its size and the previous allocation is freed each time the length passes 1, 2, 4, 8, 16, 32... characters. For many small strings that means very frequent reallocations.
No wonder it was slow: the frequent malloc calls cause more heap fragmentation, a longer linked list of free memory blocks, and a spread of small, variably sized blocks, which in turn makes the allocator scan longer for every allocation during the application's entire lifetime. TL;DR: the data becomes fragmented and widely dispersed in memory.
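To make the reallocation cost visible without Spirit, here is a minimal sketch that appends characters one at a time, the way a char parser feeds a string attribute, and counts how often the capacity changes. The helper name count_reallocations and its reserve_hint parameter are made up for illustration, and the exact counts depend on your standard library's growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Boost): appends `count` characters one by
// one and counts capacity changes. Each change implies a new allocation plus
// a copy of the existing contents.
std::size_t count_reallocations(std::size_t count, std::size_t reserve_hint = 0)
{
    std::string s;
    if (reserve_hint > 0)
        s.reserve(reserve_hint);          // one allocation up front

    std::size_t reallocations = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t i = 0; i < count; ++i)
    {
        s.insert(s.end(), 'x');           // same kind of call as c.insert(c.end(), val)
        if (s.capacity() != last_capacity)
        {
            ++reallocations;
            last_capacity = s.capacity();
        }
    }
    assert(s.size() == count);
    return reallocations;
}
```

Calling count_reallocations(1000) and count_reallocations(1000, 1000) on your own toolchain shows how many growth steps a single reserve call removes.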
Proof? The following code is called by char_parser each time a valid character is matched at your iterator. From Boost 1.54:
/boost/spirit/home/qi/char/char_parser.hpp
    if (first != last && this->derived().test(*first, context))
    {
        spirit::traits::assign_to(*first, attr_);
        ++first;
        return true;
    }
    return false;
/boost/spirit/home/qi/detail/assign_to.hpp
    // T is not a container and not a string
    template <typename T_>
    static void call(T_ const& val, Attribute& attr, mpl::false_, mpl::false_)
    {
        traits::push_back(attr, val);
    }
/boost/spirit/home/support/container.hpp
    template <typename Container, typename T, typename Enable>
    struct push_back_container
    {
        static bool call(Container& c, T const& val)
        {
            c.insert(c.end(), val);
            return true;
        }
    };
The follow-up fix you posted (changing your struct members to char Name[Size]) has basically the same effect as calling Name.reserve(Size) on a std::string before parsing into it. However, Spirit has no directive for this at the moment.
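As a rough illustration of that equivalence outside of Spirit, the sketch below reads one |-delimited field into a std::string after reserving its expected size. The function read_field and the expected_size parameter are hypothetical stand-ins for +(char_ - '|') and for Size; this is not Spirit code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for +(char_ - '|'): copies characters from `line`,
// starting at `pos`, into `field` until the next '|' or the end of input.
// Returns the position just past the separator (or the end of the line).
std::size_t read_field(const std::string& line, std::size_t pos,
                       std::string& field, std::size_t expected_size)
{
    assert(pos <= line.size());
    field.clear();
    field.reserve(expected_size);         // one allocation, like char Name[Size]
    while (pos < line.size() && line[pos] != '|')
        field.push_back(line[pos++]);     // no growth while size <= expected_size
    return pos < line.size() ? pos + 1 : pos;
}
```

Unlike a fixed char array, the string can still grow if a field turns out longer than expected_size, so the reserve is only a hint and not a hard limit.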
The Solution:
/boost/spirit/home/support/container.hpp
    template <typename Container, typename T, typename Enable>
    struct push_back_container
    {
        static bool call(Container& c, T const& val, size_t initial_size = 8)
        {
            if (c.capacity() < initial_size)
                c.reserve(initial_size);
            c.insert(c.end(), val);
            return true;
        }
    };
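If you want to try the idea without patching Boost, the same logic can be written as a free function. This is only a sketch: push_back_reserving is a made-up name, and it assumes the container provides capacity() and reserve(), which holds for std::string and std::vector but not for std::list or std::deque.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical free-function version of the patched call() above.
// Requires Container to have capacity(), reserve(), insert() and end().
template <typename Container, typename T>
bool push_back_reserving(Container& c, T const& val, std::size_t initial_size = 8)
{
    assert(initial_size > 0);
    if (c.capacity() < initial_size)
        c.reserve(initial_size);      // skip the earliest, smallest growth steps
    c.insert(c.end(), val);
    return true;
}
```

The capacity() requirement is also why the patch as written would probably need a specialization restricted to such containers before it could go into Spirit itself.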
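/boost/spirit/home/qi/char/char_parser.hpp
    if (first != last && this->derived().test(*first, context))
    {
        spirit::traits::assign_to(*first, attr_);
        ++first;
        return true;
    }
    // Untested: a runtime check does not stop shrink_to_fit() from being
    // compiled for non-container attributes, so real code would need
    // compile-time dispatch here.
    if (traits::is_container<Attribute>::value)
        attr_.shrink_to_fit();
    return false;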
I haven't tested it, but I assume it could speed up char parsers over string attributes by more than 10x, as you saw. It would be a great optimization for a future Boost Spirit update, ideally together with a reserve(initial_size)[ +( char_ - lit("|") ) ] directive that sets the initial buffer size.