Tree iterator, can you optimize this any further?

本秂侑毒 提交于 2019-12-04 20:18:52

Why not a recursive solution?

void processTree (const BaseNodePtr &current, unsigned int & cnt )
{
  cnt++;

  if (current->hasChild())
    processTree(current->child());
  if (current->hasNext())
    processTree(current->next());
}

Since shared_ptr seems to be your bottleneck, why not improve it? Are you using threads? If not, then undefine the symbol BOOST_HAS_THREADS. The shared_ptr reference count is guarded by a mutex which is probably the cause of the slow performance.

Why not change your data structure to not use shared_ptr altogether? Manage the raw pointers yourself? Maybe use scoped_ptr instead?

For the ultimate speed up what you need to do is order the nodes in memory so that they are stored in a contiguous block in the order that you visit them.

e.g If you have a tree defined as follows.

        1
       / \
      2   3
     / \  /\
    4   5 6 7
   /\    /  /\
  8  9  10 11 12
 / \           \
13 14          15

Then the visit function as described will visit the nodes in the following order

1
 2
  4
   8
    13
    14
   9
  5
 3
  6
   10
  7
   11
   12
     15

Now if you order the nodes in memory as a contiguous block of 15 allocations and store the nodes in the order demonstrated above then you will generally visiting a node that has "spatial locality". This can improve your cache hits, depending upon the size of your node structure and thus make things run faster.

To create a quick iterative method of visiting all the nodes in a tree only once and with no recursion.

unsigned int g_StackDepth = 0;
BaseNodePtr* g_Stack[MAX_STACK_DEPTH];

void processTree (BaseNodePtr root, unsigned int & cnt )
{
    g_Stack[g_StackDepth++] = root;
    while( g_StackDepth > 0 )
    {
        BaseNodePtr curr = g_Stack[--g_StackDepth];
        cnt++;

        if ( curr->HasNext() )
        {
            g_Stack[g_StackDepth++] = curr->Next();
        }


        if ( curr->HasChild() )
        {
            g_Stack[g_StackDepth++] = curr->Child();
        }

    }
}

Combined with the above ordering you should get just about the best speed you CAN get, to my knowledge.

Obviously this has limitations as you have to know how big your stack can possibly grow in advance. Though you could get around this by using a std::vector instead. Using a std::vector however would eliminate all the advantages the iterative method above provides, however.

Hope that is some help :)

I HATE when answers dismiss the question with a "don't do that" but here I go...

Say there was a way to remove the down bool... will that really make any REAL difference in execution time? We're talking about a small handful of CPU operations and a few byte extra on the stack.

Focus on making the child() and parent() calls faster if you need speed. Otherwise you're wasting your time (IMOHO).

EDIT: Maybe walk the tree (w/ this "slow" code) ONCE and build an array of pointers into the tree in the desired order. Use this "index" later.

What I'm saying is I think you're approaching optimization from the wrong angle.

Here is how to have only one recursion call instead of two:

void processTree (const BaseNodePtr &current, unsigned int & cnt )
{
  for(bool gotNext = true; gotNext; current = current->next()) { 
    cnt++;
    if (current->hasChild())
      processTree(current->child());
    gotNext = current->hasNext();
  }
}

Create a "nextvisit" function, and keep calling that, to simplify the code; next to that, use const references iso value-semantics for the shared-pointers... this may save you valuable shared-ptr copies:

// define the order of visitation in here
BaseNodePtr& next( const BaseNodePtr& p ) {
    if( p->hasChild() ) return p->child();
    if( p->hasNext() ) return p->next();
    BaseNodePtr ancestor = p->parent();
    while( ancestor != 0 && !ancestor->hasNext() ) ancestor = ancestor->parent();
    return ancestor;
}

void processTree( const BaseNodePtr& p, unsigned int& cnt ) {
   while( p != NULL ) {
     ++cnt;
     p = next(p);
   }        
}

But for readability, clarity, maintainability, ... for god's sake, use recursion. Unless your stack isn't big enough.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!