How to improve performance of boost interval_map lookups

旧时难觅i 2021-01-03 11:02

I'm using a boost::icl::interval_map to map byte ranges to a set of strings. The map is loaded from a (sorted) disk file, and then I do lookups using the code

2 Answers
  •  难免孤独
    2021-01-03 11:51

    In this answer I present three optimizations:

    1. replacing the std::set used for objects with boost::container::flat_set for improved locality (and likely lower allocation costs, since most object sets hold fewer than 4 elements)

      UPDATE: In my final version below, simply replacing boost::container::flat_set back with std::set degraded the performance of find_range alone by a factor of ~2x and of find_range_ex by a factor of ~4x on my test system.

    2. replacing the object id std::string by string_atom (which is technically a char const* but logically unique). This technique is similar to interned strings in various VM implementations (like Java/.NET).

      NOTE: The current implementation of make_atom is exceedingly simplistic and never frees atoms. You would potentially want to back the strings in a deque, use Boost Flyweights, a pool allocator or some combination of those to make it smarter. However, whether this is required depends on your use cases

    3. replacing the map intersection with an equal_range call, which saves the bulk of the time by avoiding copying (large amounts of) data

      UPDATE: When applying just this optimization in isolation, the speedup is already 26~30x. Note that the memory usage is then roughly double, at ~20MiB, compared to when all three optimizations are included.
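    To make the locality argument in point 1 concrete, here is a minimal sketch of the idea behind a flat set: all elements sit in one sorted, contiguous vector instead of one heap node per element. It is a standard-library stand-in, not boost::container::flat_set itself, and tiny_flat_set is a hypothetical name used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a flat set: elements live in one contiguous,
// sorted vector rather than in separately allocated tree nodes.
template <typename T>
class tiny_flat_set {
public:
    // Inserts v if it is absent; returns true when the set grew.
    bool insert(const T& v) {
        auto pos = std::lower_bound(data_.begin(), data_.end(), v);
        if (pos != data_.end() && !(v < *pos))
            return false;               // equal element already stored
        data_.insert(pos, v);           // shifts the tail to keep the order
        assert(std::is_sorted(data_.begin(), data_.end()));
        return true;
    }

    // Binary search over the contiguous storage.
    bool contains(const T& v) const {
        return std::binary_search(data_.begin(), data_.end(), v);
    }

    std::size_t size() const { return data_.size(); }
    typename std::vector<T>::const_iterator begin() const { return data_.begin(); }
    typename std::vector<T>::const_iterator end() const { return data_.end(); }

private:
    std::vector<T> data_;               // always kept sorted and duplicate-free
};
```

    With sets of around three elements, as in this data, the tail shift in insert moves only a handful of values, while iteration walks a single contiguous buffer.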

    Volume and data efficiency

    Before I start, I like to know what the data looks like. So, writing some code to parse that bmap.txt input, we get:

    Source On Coliru

    Parsed ok
    Histogram of 66425 input lines
    d: 3367
    f: 20613
    p: 21222
    v: 21223
    ranges size:            6442450944
    ranges iterative size:  21223
    Min object set:         1.000000
    Max object set:         234.000000
    Average object set:     3.129859
    Min interval width:     1024.000000
    Max interval width:     2526265344.000000
    Average interval width: 296.445177k
    First:                  [0,1048576)
    Last:                   [3916185600,6442450944)
    String atoms:           23904 unique in 66425 total
    Atom efficiency:        35.986451%
    

    As you can see the sets are usually ~3 items, and many are duplicated.

    Using the make_atom object naming method with boost::container::flat_set reduced memory allocation from ~15GiB to ~10GiB.

    This optimization also reduces string comparison to pointer comparison for set insertion and the Combiner strategy of the interval_map, so for larger data sets this has the potential to have a lot of speedup.
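    As a rough illustration of why atoms turn string comparison into pointer comparison, here is a reduced sketch of the interning idea. The names atom, intern and intern_self_check are hypothetical; the sketch assumes a node-based std::set as the pool (so stored strings never move) and, like make_atom, never frees anything.

```cpp
#include <cassert>
#include <set>
#include <string>

using atom = const char*;   // hypothetical alias mirroring string_atom

// Returns one canonical pointer per distinct string value.
// std::set is node-based, so a stored string stays at the same address
// and its c_str() remains valid for the lifetime of the pool.
atom intern(const std::string& s) {
    static std::set<std::string> pool;      // never freed in this sketch
    return pool.insert(s).first->c_str();   // insert is a no-op for duplicates
}

// Small self-check: equal strings intern to the same pointer,
// different strings to different pointers.
void intern_self_check() {
    atom a = intern("/dev/sda1");
    atom b = intern(std::string("/dev/") + "sda1");
    assert(a == b);                          // pointer equality, no strcmp
    assert(a != intern("/dev/sda2"));
}
```

    Because each distinct name has exactly one address, a set of atoms can order and deduplicate its elements by comparing pointers only.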

    Query efficiency

    Query performance is indeed severely crippled by the partial copy of the input.

    Don't copy; instead, view the overlapping range by replacing:

      const ranges r = *map & window;
      ranges::const_iterator iter = r.begin ();
      while (iter != r.end ()) {
    

    with

      auto r = map->equal_range(window);
      ranges::const_iterator iter = r.first;
      while (iter != r.second) {
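    The difference between copying and viewing can be sketched without Boost. The following is a simplified model, assuming a sorted vector of non-overlapping right-open segments; segment, segments and overlapping are hypothetical names, and this is not how ICL is implemented internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct segment {                 // hypothetical: one right-open interval [lo, hi)
    std::uint64_t lo, hi;
    int payload;
};

using segments = std::vector<segment>;   // sorted by lo, non-overlapping

// Returns iterators delimiting the segments that overlap [lo, hi).
// Nothing is copied: the pair is only a view into the existing vector.
std::pair<segments::const_iterator, segments::const_iterator>
overlapping(const segments& s, std::uint64_t lo, std::uint64_t hi) {
    assert(lo < hi);             // an empty window is not supported here
    // First segment that ends after the window starts.
    auto first = std::partition_point(s.begin(), s.end(),
        [lo](const segment& x) { return x.hi <= lo; });
    // First segment that starts at or after the window end.
    auto last = std::partition_point(first, s.end(),
        [hi](const segment& x) { return x.lo < hi; });
    return {first, last};
}
```

    Note that in this sketch the boundary segments are returned whole rather than clipped to the window; if the callback needs clipped bounds, clamp them with std::max and std::min while iterating.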
    

    On my system, running 10000 identical randomized queries with both versions shows a speedup of about 32x:

    10000 'random' OLD lookups resulted in 156729884 callbacks in 29148ms
    10000 'random' NEW lookups resulted in 156729884 callbacks in 897ms
    
    real    0m31.715s
    user    0m31.664s
    sys 0m0.012s
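    For reference, a comparison like this can be timed with a small helper along the following lines. This is only a sketch: time_ms and speedup are hypothetical names, and it uses std::chrono::steady_clock, whereas the benchmark code in this answer uses high_resolution_clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `work` `repeat` times and returns the
// elapsed wall-clock time in whole milliseconds.
template <typename F>
long long time_ms(std::size_t repeat, F&& work) {
    assert(repeat > 0);
    using clock = std::chrono::steady_clock;   // monotonic clock
    auto const start = clock::now();
    for (std::size_t i = 0; i < repeat; ++i)
        work();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               clock::now() - start).count();
}

// Ratio of the old runtime to the new one, e.g. 29148 ms versus 897 ms.
inline double speedup(double before_ms, double after_ms) {
    return before_ms / after_ms;
}
```

    Applied to the figures above, speedup(29148, 897) comes out at roughly 32.5, which is where the "32x" comes from.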
    

    The runtime is now dominated by the parsing/statistics. Full benchmark code is here: On Coliru

    #define BOOST_RESULT_OF_USE_DECLTYPE
    #define BOOST_SPIRIT_USE_PHOENIX_V3
    
    /* virt-bmap examiner plugin
     * Copyright (C) 2014 Red Hat Inc.
     *
     * This program is free software; you can redistribute it and/or modify
     * it under the terms of the GNU General Public License as published by
     * the Free Software Foundation; either version 2 of the License, or
     * (at your option) any later version.
     *
     * This program is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     * GNU General Public License for more details.
     *
     * You should have received a copy of the GNU General Public License
     * along with this program; if not, write to the Free Software
     * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
     */
    
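    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <inttypes.h>
    #include <assert.h>
    
    #include <set>
    #include <string>
    #include <boost/container/flat_set.hpp>
    #include <boost/icl/interval_map.hpp>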
    
    using namespace std;
    
    /* Maps intervals (uint64_t, uint64_t) to a set of strings, where each
     * string represents an object that covers that range.
     */
    
    static size_t atoms_requested = 0;
    static size_t atoms_unique_created = 0;
    
    using string_atom = const char*;
    string_atom make_atom(std::string&& s)
    {
        static std::set<std::string> s_atoms;
        atoms_requested += 1;
    
        auto it = s_atoms.find(s);
        if (it != s_atoms.end())
            return it->c_str();
    
        atoms_unique_created += 1;
        return s_atoms.insert(std::move(s)).first->c_str();
    }
    
    typedef boost::container::flat_set<string_atom> objects;
    typedef boost::icl::interval_map<uint64_t, objects> ranges;
    
    ranges*
    new_ranges (void)
    {
      return new ranges ();
    }
    
    void
    free_ranges (ranges *mapv)
    {
      ranges *map = (ranges *) mapv;
      delete map;
    }
    
    extern "C" void
    insert_range (void *mapv, uint64_t start, uint64_t end, const char *object)
    {
      ranges *map = (ranges *) mapv;
      objects obj_set;
      obj_set.insert (obj_set.end(), object);
      map->add (std::make_pair (boost::icl::interval<uint64_t>::right_open (start, end), // SEHE added std::
                           obj_set));
    }
    
    extern "C" void
    iter_range (void *mapv, void (*f) (uint64_t start, uint64_t end, const char *object, void *opaque), void *opaque)
    {
      ranges *map = (ranges *) mapv;
      ranges::iterator iter = map->begin ();
      while (iter != map->end ()) {
        boost::icl::interval<uint64_t>::type range = iter->first;
        uint64_t start = range.lower ();
        uint64_t end = range.upper ();
    
        objects obj_set = iter->second;
        objects::iterator iter2 = obj_set.begin ();
        while (iter2 != obj_set.end ()) {
          f (start, end, *iter2/*->c_str ()*/, opaque); // SEHE
          iter2++;
        }
        iter++;
      }
    }
    
    extern "C" void
    find_range (void const *mapv, uint64_t start, uint64_t end, void (*f) (uint64_t start, uint64_t end, const char *object, void *opaque), void *opaque)
    {
      const ranges *map = (const ranges *) mapv;
    
      boost::icl::interval<uint64_t>::type window;
      window = boost::icl::interval<uint64_t>::right_open (start, end);
    
      const ranges r = *map & window;
    
      ranges::const_iterator iter = r.begin ();
      while (iter != r.end ()) {
        boost::icl::interval<uint64_t>::type range = iter->first;
        uint64_t start = range.lower ();
        uint64_t end = range.upper ();
    
        objects obj_set = iter->second;
        objects::iterator iter2 = obj_set.begin ();
        while (iter2 != obj_set.end ()) {
          f (start, end, *iter2/*->c_str ()*/, opaque); // SEHE
          iter2++;
        }
        iter++;
      }
    }
    
    extern "C" void
    find_range_ex (void const *mapv, uint64_t start, uint64_t end, void (*f) (uint64_t start, uint64_t end, const char *object, void *opaque), void *opaque)
    {
      const ranges *map = (const ranges *) mapv;
    
      boost::icl::interval<uint64_t>::type window;
      window = boost::icl::interval<uint64_t>::right_open (start, end);
    
    #if 0
      const ranges r = *map & window;
      ranges::const_iterator iter = r.begin ();
      while (iter != r.end ()) {
    #else
      auto r = map->equal_range(window);
      ranges::const_iterator iter = r.first;
      while (iter != r.second) {
    #endif
    
        boost::icl::interval<uint64_t>::type range = iter->first;
        uint64_t start = range.lower ();
        uint64_t end = range.upper ();
    
        objects obj_set = iter->second;
        objects::iterator iter2 = obj_set.begin ();
        while (iter2 != obj_set.end ()) {
          f (start, end, *iter2/*->c_str ()*/, opaque); // SEHE
          iter2++;
        }
        iter++;
      }
    }
    
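    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/accumulators/accumulators.hpp>
    #include <boost/accumulators/statistics.hpp>
    #include <chrono>
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <map>
    #include <vector>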
    
    std::map<char, size_t> histo;
    
    bool insert_line_of_input(ranges& bmap_data, uint64_t b, uint64_t e, char type, std::string& object) {
        if (!object.empty())
            histo[type]++;
        //std::cout << std::hex << b << " " << e << " " << type << " " << object << "\n";
    
    #if 0
        object.insert(object.begin(), ':');
        object.insert(object.begin(), type);
    #endif
        insert_range(&bmap_data, b, e, make_atom(std::move(object)));
        return true;
    }
    
    std::vector<std::pair<uint64_t, uint64_t> > generate_test_queries(ranges const& bmap_data, size_t n) {
        std::vector<std::pair<uint64_t, uint64_t> > queries;
        queries.reserve(n);
    
        for (size_t i = 0; i < n; ++i)
        {
            auto start = (static_cast<uint64_t>(rand()) * rand()) % bmap_data.size();
            auto end   = start + rand();
    
            queries.emplace_back(start,end);
        }
    
        return queries;
    }
    
    ranges read_mapfile(const char* fname) {
        std::ifstream ifs(fname);
        boost::spirit::istream_iterator f(ifs >> std::noskipws), l;
    
        ranges bmap_data;
    
        namespace phx = boost::phoenix;
        using namespace boost::spirit::qi;
        uint_parser<uint64_t, 16> offset; // radix 16 is an assumption: offsets in bmap.txt are taken to be hexadecimal
        if (!phrase_parse(f,l,
                    ("1 " >> offset >> offset >> char_("pvdf") >> as_string[lexeme[+graph] >> attr('/') >> lexeme[*~char_("\r\n")]]) 
                    [ _pass = phx::bind(insert_line_of_input, phx::ref(bmap_data), _1, _2, _3, _4) ] % eol >> *eol, 
                    blank))
        {
            exit(255);
        } else
        {
            std::cout << "Parsed ok\n";
        }
    
        if (f!=l)
            std::cout << "Warning: remaining input '" << std::string(f,l) << "'\n";
    
        return bmap_data;
    }
    
    void report_statistics(ranges const& bmap_data) {
        size_t total = 0;
        for (auto e : histo) total += e.second;
    
        std::cout << "Histogram of " << total << " input lines\n";
    
        for (auto e : histo)
            std::cout << e.first << ": " << e.second << "\n";
    
        namespace ba = boost::accumulators;
        ba::accumulator_set<double, ba::stats<ba::tag::mean, ba::tag::min, ba::tag::max> >
            object_sets, interval_widths;
    
        for (auto const& r : bmap_data)
        {
            auto width = r.first.upper() - r.first.lower();
            assert(width % 1024 == 0);
    
            interval_widths(width);
            object_sets(r.second.size());
        }
    
        std::cout << std::fixed;
        std::cout << "ranges size:            " << bmap_data.size()                 << "\n";
        std::cout << "ranges iterative size:  " << bmap_data.iterative_size()       << "\n";
    
        std::cout << "Min object set:         " << ba::min(object_sets)             << "\n" ;
        std::cout << "Max object set:         " << ba::max(object_sets)             << "\n" ;
        std::cout << "Average object set:     " << ba::mean(object_sets)            << "\n" ;
        std::cout << "Min interval width:     " << ba::min(interval_widths)         << "\n" ;
        std::cout << "Max interval width:     " << ba::max(interval_widths)         << "\n" ;
        std::cout << "Average interval width: " << ba::mean(interval_widths)/1024.0 << "k\n" ;
        std::cout << "First:                  " << bmap_data.begin()->first         << "\n" ;
        std::cout << "Last:                   " << bmap_data.rbegin()->first        << "\n" ;
    
        std::cout << "String atoms:           " << atoms_unique_created << " unique in " << atoms_requested << " total\n";
        std::cout << "Atom efficiency:        " << (atoms_unique_created*100.0/atoms_requested) << "%\n";
    }
    
    void perform_comparative_benchmarks(ranges const& bmap_data, size_t number_of_queries) {
        srand(42);
        auto const queries = generate_test_queries(bmap_data, number_of_queries);
    
        using hrc = std::chrono::high_resolution_clock;
        {
            auto start = hrc::now();
            size_t callbacks = 0;
    
            for (auto const& q: queries)
            {
                find_range(&bmap_data, q.first, q.second, 
                        [](uint64_t start, uint64_t end, const char *object, void *opaque) {
                        ++(*static_cast<size_t*>(opaque));
                        }, &callbacks);
            }
            std::cout << number_of_queries << " 'random' OLD lookups resulted in " << callbacks 
                      << " callbacks in " << std::chrono::duration_cast<std::chrono::milliseconds>((hrc::now()-start)).count() << "ms\n";
        }
    
        {
            auto start = hrc::now();
            size_t callbacks = 0;
    
            for (auto const& q: queries)
            {
                find_range_ex(&bmap_data, q.first, q.second, 
                        [](uint64_t start, uint64_t end, const char *object, void *opaque) {
                        ++(*static_cast<size_t*>(opaque));
                        }, &callbacks);
            }
            std::cout << number_of_queries << " 'random' NEW lookups resulted in " << callbacks 
                      << " callbacks in " << std::chrono::duration_cast<std::chrono::milliseconds>((hrc::now()-start)).count() << "ms\n";
        }
    }
    
    int main() {
        auto bmap = read_mapfile("bmap.txt");
    
        report_statistics(bmap);
    
        perform_comparative_benchmarks(bmap, 1000);
    
    #if 0 // to dump ranges to console
        for (auto const& r : bmap)
        {
            std::cout << r.first << "\t" << r.second.size() << "\t";
            std::copy(r.second.begin(), r.second.end(), std::ostream_iterator<string_atom>(std::cout, "\t"));
            std::cout << "\n";
        }
    #endif
    }
    
