What can I do in Java code to optimize for CPU caching?

栀梦 2020-12-12 18:53

When writing a Java program, do I have any influence over how the CPU uses its cache to store my data? For example, if I have an array that is accessed a lot, does it help

5 answers
  • 2020-12-12 19:29

    The key to good performance with Java is to write idiomatic code, rather than trying to outwit the JIT compiler. If you write your code to try to influence it to do things in a certain way at the native instruction level, you are more likely to shoot yourself in the foot.

    That isn't to say that common principles like locality of reference don't matter. They do, but I would consider using arrays and the like to be performance-aware, idiomatic code, not "tricky" code.
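    For instance, locality of reference shows up in something as ordinary as loop order. The sketch below (class and method names are hypothetical, and it assumes a rectangular int[][]) sums a 2D array two ways. Both return the same result, but in Java an int[][] is an array of separate row arrays, so the row-by-row loop reads each row sequentially while the column-by-column loop jumps to a different row array on every access. On large arrays the row-major version is typically faster; the actual gap depends on the JVM and CPU, so measure with a harness such as JMH rather than assuming.

```java
// Sums a rectangular int[][] in two orders. Same result, different
// memory access pattern.
final class TraversalOrder {

    // Row by row: each row array is walked sequentially (cache-friendly).
    static long sumRowMajor(int[][] grid) {
        long sum = 0;
        for (int[] row : grid) {
            for (int value : row) {
                sum += value;
            }
        }
        return sum;
    }

    // Column by column: every access lands in a different row array,
    // which may sit anywhere on the heap (assumes all rows have equal length).
    static long sumColumnMajor(int[][] grid) {
        long sum = 0;
        int cols = grid.length == 0 ? 0 : grid[0].length;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < grid.length; r++) {
                sum += grid[r][c];
            }
        }
        return sum;
    }
}
```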

    HotSpot and other optimizing runtimes are extremely clever about how they optimize code for specific processors. (For an example, check out this discussion.) If I were an expert machine language programmer, I'd write machine language, not Java. And if I'm not, it would be unwise to think that I could do a better job of optimizing my code than the experts.

    Also, even if you do know the best way to implement something for a particular CPU, the beauty of Java is write-once-run-anywhere. Clever tricks to "optimize" Java code tend to make optimization opportunities harder for the JIT to recognize. Straightforward code that adheres to common idioms is easier for an optimizer to recognize. So even when you get the best Java code for your testbed, that code might perform horribly on a different architecture or, at best, fail to take advantage of enhancements in future JITs.

    If you want good performance, keep it simple. Teams of really smart people are working to make it fast.

  • 2020-12-12 19:40

    To the best of my knowledge: no. You pretty much have to be writing machine code to get that level of optimization. With assembly you're one step away, because you no longer control where things are stored. With a compiler you're two steps away, because you don't even control the details of the generated code. With Java you're three steps away, because a JVM interprets and JIT-compiles your code at run time.

    I don't know of any constructs in Java that let you control things on that level of detail. In theory you could indirectly influence it by how you organize your program and data, but you're so far away that I don't see how you could do it reliably, or even know whether or not it was happening.

  • 2020-12-12 19:42

    So far the advice is sound: in general, it's best not to try to outsmart the JIT. But as you say, some knowledge about the details is sometimes useful.

    Regarding memory layout for objects, Sun's JVM (now Oracle's) lays out an object's fields in memory by type: doubles and longs first, then ints and floats, then shorts and chars, then bytes and booleans, and finally object references. The exact order is an implementation detail and has changed between releases. You can get more details here.

    Local variables (both references and primitive values) are usually kept on the stack.

    As Nick mentions, the best way to control memory layout in Java is to use primitive arrays, which guarantees that the data is contiguous in memory. Be careful with array sizes, though: very large arrays can be hard on the garbage collector. This approach also has the downside that you have to do some memory management yourself.
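    To illustrate the contiguity point, here is a minimal sketch (hypothetical names) contrasting a double[] with a List<Double>. The primitive array stores its values inline in one block; a typical List<Double> (e.g. an ArrayList) stores references to separately allocated Double objects, so every iteration follows a pointer and unboxes a value. The two methods compute the same sum; only the memory access pattern differs.

```java
import java.util.List;

final class PrimitiveVsBoxed {

    // Values are stored inline, one after another, in a single block.
    static double sumPrimitive(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    // Each element is a reference to a separate Double object on the heap;
    // the loop dereferences and unboxes it on every iteration.
    static double sumBoxed(List<Double> values) {
        double sum = 0.0;
        for (Double v : values) {
            sum += v;
        }
        return sum;
    }
}
```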

    On the upside, you can use a Flyweight pattern to get Object-like usability while keeping fast performance.
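    A minimal sketch of that Flyweight idea, assuming 2D particles stored as interleaved x/y doubles; ParticleStore and Cursor are hypothetical names, not from the answer. A single Cursor object is repositioned over the primitive array, so callers get getter/setter-style access without allocating one object per particle.

```java
// Particle data stored as interleaved primitives: [x0, y0, x1, y1, ...].
final class ParticleStore {
    private final double[] data;

    ParticleStore(int count) {
        data = new double[count * 2];
    }

    int size() {
        return data.length / 2;
    }

    // Returns a reusable view; repositioning it with at() does not allocate.
    Cursor cursor() {
        return new Cursor();
    }

    // The flyweight: an object-like facade whose only state is an offset.
    final class Cursor {
        private int base;

        Cursor at(int index) {
            base = index * 2;
            return this;
        }

        double x() { return data[base]; }
        double y() { return data[base + 1]; }
        void setX(double v) { data[base] = v; }
        void setY(double v) { data[base + 1] = v; }
    }
}
```

    Because the cursor is mutable and shared, it is best treated as a short-lived view: don't store it or hand it to another thread.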

    If you need the extra oomph in performance, generating your own bytecode on the fly helps with some problems, as long as the generated code is executed enough times to get JIT-compiled and the JVM's code cache doesn't fill up (which, for all practical purposes, disables the JIT).

  • 2020-12-12 19:43

    If you are at the point where an improvement of a few percent makes a difference, use C, where you'll get an improvement of 50-100%!

    If you think that the ease of use of Java makes it a better language to use, then don't screw it up with questionable optimizations.

    The good news is that Java will do a lot of stuff beneath the covers to improve your code at runtime, but it almost certainly won't do the kind of optimizations you're talking about.

    If you decide to go with Java, just write your code as clearly as you can and don't take minor optimizations into account at all. (Major ones, like using the right collection for the job or not allocating/freeing objects inside a loop, are still worthwhile.)
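    As a small example of the "don't allocate inside a loop" point, the sketch below (hypothetical names) reuses one StringBuilder across iterations and presizes the result list. Treat it as illustrative only: HotSpot's escape analysis can already remove some short-lived allocations, so measure before restructuring real code.

```java
import java.util.ArrayList;
import java.util.List;

final class ReuseBuffer {

    // Builds one label per id, reusing a single StringBuilder instead of
    // allocating a new one on every iteration.
    static List<String> labels(int[] ids) {
        List<String> out = new ArrayList<>(ids.length); // presized: no regrowth
        StringBuilder sb = new StringBuilder(16);
        for (int id : ids) {
            sb.setLength(0);            // clear contents, keep the backing buffer
            sb.append("item-").append(id);
            out.add(sb.toString());     // toString() still creates the result String
        }
        return out;
    }
}
```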

  • 2020-12-12 19:46

    If the data you're crunching is made up primarily or wholly of primitives (e.g. in numeric problems), I would advise the following.

    Allocate a flat structure of fixed-size arrays of primitives at initialisation time, iterate over it with a plain for-loop, and periodically compact/defragment the data so that the live elements occupy indices 0 to n, with n as small as your element count allows. This is the only way to guarantee contiguous allocation in Java, and compaction further improves locality of reference. Compaction also means the loop no longer visits unused elements, so there are fewer conditionals to evaluate and the loop terminates earlier: less iteration means less movement through the heap and fewer chances for a cache miss. Compaction has an overhead of its own, but you can choose to run it only periodically, relative to your main processing.
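    A minimal sketch of that periodic compaction, assuming a fixed-capacity array of doubles with a parallel boolean[] liveness flag; CompactingPool and its methods are hypothetical. compact() keeps live values in their original order, moves them to indices 0..count-1, and shrinks count, so subsequent loops stop earlier and skip no dead slots.

```java
// Fixed-capacity pool of doubles, allocated once, with a parallel liveness flag.
final class CompactingPool {
    private final double[] values;
    private final boolean[] alive;
    private int count; // slots in use (live or dead) at the front of the arrays

    CompactingPool(int capacity) {
        values = new double[capacity];
        alive = new boolean[capacity];
    }

    int add(double v) {
        if (count == values.length) throw new IllegalStateException("pool full");
        values[count] = v;
        alive[count] = true;
        return count++;
    }

    void kill(int index) {
        alive[index] = false;
    }

    // O(count); intended to run periodically, not on every pass.
    void compact() {
        int write = 0;
        for (int read = 0; read < count; read++) {
            if (alive[read]) {
                values[write] = values[read];
                alive[write] = true;
                write++;
            }
        }
        for (int i = write; i < count; i++) {
            alive[i] = false; // clear stale flags past the new end
        }
        count = write;
    }

    int count() {
        return count;
    }

    // Hot loop over one contiguous run of doubles. The alive check is only
    // needed for slots killed since the last compaction.
    double sumLive() {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            if (alive[i]) sum += values[i];
        }
        return sum;
    }
}
```

    One consequence of this design: compaction moves entries, so any index previously returned by add() is invalid afterwards. Code that needs stable handles would need an extra indirection table.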

    Even better, you can interleave values in these pre-allocated arrays. For instance, if you are representing spatial transforms for many thousands of entities in 2D space and processing the equations of motion for each of them, you might have a tight loop like this:

    // Acceleration, velocity, and displacement for each
    // of x and y totals 6 elements per entity:
    // [ax, ay, vx, vy, x, y, ax, ay, vx, vy, x, y, ...]
    static void step(double[] array)
    {
        for (int axIdx = 0; axIdx < array.length; axIdx += 6)
        {
            int ayIdx = axIdx + 1;
            int vxIdx = axIdx + 2;
            int vyIdx = axIdx + 3;
            int xIdx  = axIdx + 4;
            int yIdx  = axIdx + 5;
    
            // velocity1 = velocity0 + acceleration
            array[vxIdx] += array[axIdx];
            array[vyIdx] += array[ayIdx];
    
            // displacement1 = displacement0 + velocity
            array[xIdx] += array[vxIdx];
            array[yIdx] += array[vyIdx];
        }
    }
    

    This example ignores issues such as rendering those entities at their associated (x, y) positions. Rendering always requires non-primitives (thus references/pointers), and once you need such object instances, you can no longer guarantee locality of reference and will likely be jumping around all over the heap. So if you can split your code into sections where you do primitive-intensive processing as shown above, this approach will help you a lot. For games at least, AI, dynamic terrain, and physics can be some of the most processor-intensive aspects, and they are all numeric, so this approach can be very beneficial.
