Remove 100,000+ nodes from a Boost graph

前端 未结 3 1314
囚心锁ツ
囚心锁ツ 2020-12-03 16:18

I have a graph ( adjacency_list (listS, vecS, bidirectionalS, VertexVal) ) in which I need to delete 100,000+ nodes. Each node also contains a structure of 2 64-bit integers

相关标签:
3条回答
  • 2020-12-03 16:43

    It would be interesting to see more of the Vtune data.

    My experience has been that the default Microsoft allocator can be a big bottleneck when deleting tens of thousands of small objects. Does your Vtune graph show a lot of time in delete or free?

    If so, consider switching to a third-party allocator. Nedmalloc is said to be good: http://www.nedprod.com/programs/portable/nedmalloc/

    Google has one, tcmalloc, which is very well regarded and much faster than the built-in allocators on almost every platform. https://code.google.com/p/gperftools/ tcmalloc is not a drop-in for Windows.

    0 讨论(0)
  • 2020-12-03 16:49

    I was able to successfully serialize the graph using Boost serialization routines into a string, parse the string and remove the nodes I didn't need and de-serialize the modified string. For 200,000 total nodes in graph and 100,000 that needs to be deleted I was able to successfully finish the operation in less than 2 seconds.

    For my particular use-case each vertex has 3 64bit integers. When it needs to be deleted, I mark 2 of those integers as 0s. A valid vertex would never have a 0. When the point comes to clean up the graph - to delete the "deleted" vertices, I follow the above logic.

    In the code below removeDeletedNodes() does the string parsing and removing the vertices and mapping the edge numbers.

    enter image description here

    0 讨论(0)
  • 2020-12-03 16:59

    In your removal branch you re-tie() the iterators:

    boost::tie(vi, vi_end) = boost::vertices(m_graph);
    

    This will cause the loop to restart every time you restart the loop. This is exactly Schlemiel The Painter.

    I'll find out whether you can trust remove_vertex not triggering a reallocation. If so, it's easily fixed. Otherwise, you'd want an indexer-based loop instead of iterator-based. Or you might be able to work on the raw container (it's a private member, though, as I remember).

    Update Using vecS as the container for vertices is going to cause bad performance here:

    If the VertexList template parameter of the adjacency_list was vecS, then all vertex descriptors, edge descriptors, and iterators for the graph are invalidated by this operation. <...> If you need to make frequent use of the remove_vertex() function the listS selector is a much better choice for the VertexList template parameter.


    This small benchmark test.cpp compares:

    • with -DSTABLE_IT (listS)

      $ ./stable 
      Generated 100000 vertices and 5000 edges in 14954ms
      The graph has a cycle? false
      starting selective removal...
      Done in 0ms
      After: 99032 vertices and 4916 edges
      
    • without -DSTABLE_IT (vecS)

      $ ./unstable 
      Generated 100000 vertices and 5000 edges in 76ms
      The graph has a cycle? false
      starting selective removal...
      Done in 396ms
      After: 99032 vertices and 4916 edges
      
    • using filtered_graph (thanks @cv_and_he in the comments)

      Generated 100000 vertices and 5000 edges in 15ms
      The graph has a cycle? false
      starting selective removal...
      Done in 0ms
      After: 99032 vertices and 4916 edges
      Done in 13ms
      

    You can clearly see that removal is much faster for listS but generating is much slower.

    0 讨论(0)
提交回复
热议问题