问题
A few years back, I got an assignment at school, where I had to parallelize a Raytracer.
It was an easy assignment, and I really enjoyed working on it.
Today, I felt like profiling the raytracer, to see if I could get it to run any faster (without completely overhauling the code). During the profiling, I noticed something interesting:
// Sphere.Intersect
public bool Intersect(Ray ray, Intersection hit)
{
double a = ray.Dir.x * ray.Dir.x +
ray.Dir.y * ray.Dir.y +
ray.Dir.z * ray.Dir.z;
double b = 2 * (ray.Dir.x * (ray.Pos.x - Center.x) +
ray.Dir.y * (ray.Pos.y - Center.y) +
ray.Dir.z * (ray.Pos.z - Center.z));
double c = (ray.Pos.x - Center.x) * (ray.Pos.x - Center.x) +
(ray.Pos.y - Center.y) * (ray.Pos.y - Center.y) +
(ray.Pos.z - Center.z) * (ray.Pos.z - Center.z) - Radius * Radius;
// more stuff here
}
According to the profiler, 25% of the CPU time was spent on get_Dir
and get_Pos
, which is why, I decided to optimize the code in the following way:
// Sphere.Intersect
public bool Intersect(Ray ray, Intersection hit)
{
Vector3d dir = ray.Dir, pos = ray.Pos;
double xDir = dir.x, yDir = dir.y, zDir = dir.z,
xPos = pos.x, yPos = pos.y, zPos = pos.z,
xCen = Center.x, yCen = Center.y, zCen = Center.z;
double a = xDir * xDir +
yDir * yDir +
zDir * zDir;
double b = 2 * (xDir * (xPos - xCen) +
yDir * (yPos - yCen) +
zDir * (zPos - zCen));
double c = (xPos - xCen) * (xPos - xCen) +
(yPos - yCen) * (yPos - yCen) +
(zPos - zCen) * (zPos - zCen) - Radius * Radius;
// more stuff here
}
With astonishing results.
In the original code, running the raytracer with its default arguments (create a 1024x1024 image with only direct lightning and without AA) would take ~88 seconds.
In the modified code, the same would take a little less than 60 seconds.
I achieved a speedup of ~1.5 with only this little modification to the code.
At first, I thought the getter for Ray.Dir
and Ray.Pos
were doing some stuff behind the scene, that would slow the program down.
Here are the getters for both:
public Vector3d Pos
{
get { return _pos; }
}
public Vector3d Dir
{
get { return _dir; }
}
So, both return a Vector3D, and that's it.
I really wonder, how calling the getter would take that much longer, than accessing the variable directly.
Is it because of the CPU caching variables? Or maybe the overhead from calling these methods repeatedly added up? Or maybe the JIT handling the latter case better than the former? Or maybe there's something else I'm not seeing?
Any insights would be greatly appreciated.
Edit:
As @MatthewWatson suggested, I used a StopWatch
to time release builds outside of the debugger. In order to get rid of noise, I ran the tests multiple times. As a result, the former code takes ~21 seconds (between 20.7 and 20.9) to finish, whereas the latter only ~19 seconds (between 19 and 19.2).
The difference has become negligible, but it is still there.
回答1:
Introduction
I'd be willing to bet that the original code is so much slower because of a quirk in C# involving properties of type structs. It's not exactly intuitive, but this type of property is inherently slow. Why? Because structs are not passed by reference. So in order to access ray.Dir.x
, you have to
- Load local variable
ray
. - Call
get_Dir
and store the result in a temporary variable. This involves copying the entire struct, even though only the field 'x' is ever used. - Access field
x
from the temporary copy.
Looking at the original code, the get accessors are called 18 times. This is a huge waste, because it means that the entire struct is copied 18 times overall. In your optimized code, there are only two copies - Dir
and Pos
are both called only once; further access to the values only consist of the third step from above:
- Access field
x
from the temporary copy.
To sum it up, structs and properties do not go together.
Why does C# behave this way with struct properties?
It has something to do with the fact that in C#, structs are value types. You are passing around the value itself, rather than a pointer to the value.
Why doesn't the compiler recognize that the get accessor is simply returning a field, and bypass the property alltogether?
In debug mode, optimizations like this are skipped to provide for a better debegging experience. Even in release mode, you'll find that most jitters don't often do this. I don't know exactly why, but I believe it is because the field is not always word-aligned. Modern CPUs have odd performance requirements. :-)
来源:https://stackoverflow.com/questions/16779909/why-is-a-simple-get-statement-so-slow