I'm working on a massive number-crunching project. I've been optimising everything from the start, as I knew performance would be important. Performance analysis shows my code spends almost 40% of its life in one function: the binary tree iterator.
public ScTreeNode GetNodeForState(int rootIndex, float[] inputs)
{
0.2%        ScTreeNode node = RootNodes[rootIndex].TreeNode;
24.6%       while (node.BranchData != null)
            {
0.2%            BranchNodeData b = node.BranchData;
0.5%            node = b.Child2;
12.8%           if (inputs[b.SplitInputIndex] <= b.SplitValue)
0.8%                node = b.Child1;
            }
0.4%        return node;
}
Do any C# optimisation experts have tips for optimising this further? All comparisons are floats. I know that in theory it shouldn't matter, but I'm using fields rather than properties to ensure optimisation. A small saving here could shave days off the process.
Please no replies saying "These optimisations don't matter in the real world" - because in this instance they do. :-)
Edit: I've updated the code to what I've got now following the comments below, and added in the performance analysis output for each line of code. As you can see, the main killer is the null check - why? I tried using a boolean flag IsLeaf on the node instead of the null check, but it was an equal performance hit on that line.
The code for branch node object is as follows:
public sealed class BranchNodeData
{
    /// <summary>
    /// The index of the data item in the input array on which we need to split
    /// </summary>
    internal int SplitInputIndex = 0;

    /// <summary>
    /// The value that we should split on
    /// </summary>
    internal float SplitValue = 0;

    /// <summary>
    /// The node's children
    /// </summary>
    internal ScTreeNode Child1;
    internal ScTreeNode Child2;
}
Another Edit: Yet more thinking here... I was wondering why the line
BranchNodeData b = node.BranchData;
was registering 0.2% of execution while the null comparison line was registering 17.7%. I'm guessing this is a branch-prediction failure? With that comparison being hit many times, and almost always returning true, it's very hard for the CPU to predict when it will return false. I'm not very clued up on the low-level workings of a CPU, but is this likely to be the case?
Just a code rewrite. It might help because it avoids at least two jumps.
public ScTreeNode GetNodeForState(int rootIndex, float[] inputs)
{
    ScTreeNode node = RootNodes[rootIndex].TreeNode;
    while (node.BranchData != null)
    {
        BranchNodeData b = node.BranchData;
        node = b.Child2;
        if (inputs[b.SplitInputIndex] <= b.SplitValue)
            node = b.Child1;
    }
    return node;
}
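Taking the jump-avoidance idea a step further: if the node classes can be changed, storing the two children in a two-element array lets the comparison result be used directly as an index, so the loop body contains no conditional assignment at all and the JIT has a chance to emit a conditional move rather than a branch. This is a sketch, not the question's original code; the `LeafId` field and `TreeWalk` class are hypothetical names added to make it self-contained:

```csharp
using System;

public sealed class ScTreeNode
{
    internal BranchNodeData BranchData;   // null for leaves
    internal int LeafId;                  // hypothetical leaf payload
}

public sealed class BranchNodeData
{
    internal int SplitInputIndex;
    internal float SplitValue;
    // Children stored as a pair so the comparison result indexes directly.
    // Index 1 plays the role of Child1 (taken when input <= split value).
    internal readonly ScTreeNode[] Children = new ScTreeNode[2];
}

public static class TreeWalk
{
    public static ScTreeNode GetNodeForState(ScTreeNode root, float[] inputs)
    {
        ScTreeNode node = root;
        while (node.BranchData != null)
        {
            BranchNodeData b = node.BranchData;
            // true -> index 1, false -> index 0; no conditional re-assignment.
            node = b.Children[inputs[b.SplitInputIndex] <= b.SplitValue ? 1 : 0];
        }
        return node;
    }
}
```

Whether this actually beats the original depends on what the JIT emits and on the data's predictability, so it needs measuring like everything else here.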
BranchNodeData looks like a reference type. It's only 0.2% of your runtime because it's just taking a reference to data that already exists, not actually copying or assigning anything.
You're probably getting such a hit on the null check because the CLR is having to do a cast in order to check the sealed class you've pasted in. Checking for nullity there isn't necessarily what you're after. There are a ton of ways to modify that class to give you a boolean to check against that wouldn't require as much computing power. I'd honestly go the route of having that be something that your ScTreeNode class can provide.
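A minimal sketch of that suggestion, with `ScTreeNode` owning an `IsLeaf` flag that is set once when the branch data is attached (note the question's edit reports a plain `IsLeaf` flag performed the same, so this needs benchmarking before committing; the `LeafId` field and `SetBranch` helper are hypothetical additions):

```csharp
using System;

public sealed class BranchNodeData
{
    internal int SplitInputIndex;
    internal float SplitValue;
    internal ScTreeNode Child1;
    internal ScTreeNode Child2;
}

public sealed class ScTreeNode
{
    internal BranchNodeData BranchData;
    internal bool IsLeaf = true;          // flipped when branch data is attached
    internal int LeafId;                  // hypothetical leaf payload

    internal void SetBranch(BranchNodeData b)
    {
        BranchData = b;
        IsLeaf = false;
    }

    public static ScTreeNode GetNodeForState(ScTreeNode node, float[] inputs)
    {
        // Boolean test provided by ScTreeNode instead of the null comparison.
        while (!node.IsLeaf)
        {
            BranchNodeData b = node.BranchData;
            node = inputs[b.SplitInputIndex] <= b.SplitValue ? b.Child1 : b.Child2;
        }
        return node;
    }
}
```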
Given the points made in the other answer about caching (though not in relation to the null check), try ordering the references to the BranchNodeData fields so that the first reference brings all of the following fields into the cache line.
That is, I assume the JITter, or the CPU, is not smart enough to load "backwards" to cache SplitInputIndex, SplitValue and Child1 when Child2 is referenced first, as in the current code.
So either change the order of the fields in the BranchNodeData class, or change the assign-then-conditionally-overwrite pattern to an if ... else.
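The cache-locality idea can be pushed further by flattening the whole tree into parallel arrays (a structure-of-arrays layout), so a traversal indexes into contiguous memory instead of chasing object references. This is a hypothetical restructuring, not the question's code; here a negative child index marks a leaf, and the bitwise complement recovers the leaf id:

```csharp
using System;

// Hypothetical flattened layout: node i is row i of each parallel array.
// A negative child index encodes a leaf; ~index recovers the leaf id.
public sealed class FlatTree
{
    public int[] SplitInputIndex;
    public float[] SplitValue;
    public int[] Child1;   // taken when input <= split value
    public int[] Child2;   // taken otherwise

    public int GetLeafForState(int root, float[] inputs)
    {
        int n = root;
        while (n >= 0)
        {
            // All four fields of node n sit at the same index, and siblings
            // built together tend to be adjacent, which helps the prefetcher.
            n = inputs[SplitInputIndex[n]] <= SplitValue[n] ? Child1[n] : Child2[n];
        }
        return ~n;   // decode the leaf id
    }
}
```

Rebuilding the tree into this form costs a one-off pass, but for a traversal that dominates 40% of the runtime the removal of per-node object headers and pointer indirection is the kind of change worth benchmarking.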
Source: https://stackoverflow.com/questions/16416084/micro-optimisations-iterating-through-a-tree-in-c-sharp