Question
I have a program I ported from C to Java. Both apps use quicksort to order some partitioned data (genomic coordinates).
The Java version runs fast, but I'd like to get it closer to the C version. I am using the Sun JDK v6u14.
Obviously I can't get parity with the C application, but I'd like to learn what I can do to eke out as much performance as reasonably possible (within the limits of the environment).
What sorts of things can I do to test performance of different parts of the application, memory usage, etc.? What would I do, specifically?
Also, what tricks can I implement (in general) to change the properties and organization of my classes and variables, reducing memory usage and improving speed?
EDIT: I am using Eclipse and would obviously prefer free options for any third-party tools. Thanks!
Answer 1:
Do not try to outsmart the JVM.
In particular:
Don't try to avoid object creation for the sake of performance.
Use immutable objects where applicable.
Use the scope of your objects correctly, so that the GC can do its job.
Use primitives where you mean primitives (e.g. a non-nullable int rather than a nullable Integer).
Use the built-in algorithms and data structures.
When handling concurrency, use the java.util.concurrent package.
Correctness over performance: first get it right, then measure, then measure with a profiler, then optimize.
Answer 2:
Obviously, profile profile profile. For Eclipse there's TPTP. Here's an article on the TPTP plugin for Eclipse. Netbeans has its own profiler. jvisualvm is nice as a standalone tool. (The entire dev.java.net server seems to be down at the moment, but it is very much an active project.)
The first thing to do is use the library sorting routine, Collections.sort; this requires your data objects to be Comparable (or you to supply a Comparator). This might be fast enough and will definitely provide a good baseline.
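A minimal sketch of such a baseline follows. The Interval class and its chromosome and start fields are hypothetical stand-ins, since the question does not show the real data classes; only Collections.sort and Comparable come from the answer itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortBaseline {
    // Hypothetical data class; the question does not show the real one.
    static final class Interval implements Comparable<Interval> {
        final int chromosome; // assumed to be encoded as an int
        final int start;

        Interval(int chromosome, int start) {
            this.chromosome = chromosome;
            this.start = start;
        }

        // Orders by chromosome first, then by start position.
        @Override
        public int compareTo(Interval other) {
            if (chromosome != other.chromosome) {
                return chromosome < other.chromosome ? -1 : 1;
            }
            if (start != other.start) {
                return start < other.start ? -1 : 1;
            }
            return 0;
        }

        @Override
        public String toString() {
            return chromosome + ":" + start;
        }
    }

    // Returns a sorted copy so the caller's list is left untouched.
    static List<Interval> sortedCopy(List<Interval> input) {
        List<Interval> copy = new ArrayList<Interval>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<Interval> data = new ArrayList<Interval>();
        data.add(new Interval(2, 500));
        data.add(new Interval(1, 900));
        data.add(new Interval(1, 100));
        System.out.println(sortedCopy(data)); // prints [1:100, 1:900, 2:500]
    }
}
```

Timing this against the hand-ported quicksort on the same input gives a reference point before any tuning.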
General tips:
- Avoid locks you don't need (your JVM may have already optimized these away)
- Use StringBuilder (not StringBuffer because of that lock thing I just mentioned) instead of concatenating String objects
- Make anything you can final; if possible, make your classes completely immutable
- If you aren't changing the value of a variable in a loop, try hoisting it out and see if it makes a difference (the JVM may have already done this for you)
- Try to work on an ArrayList (or even an array) so the memory you're accessing is contiguous instead of potentially fragmented the way it might be with a LinkedList
- Quicksort can be parallelized; consider doing that (see quicksort parallelization)
- Reduce the visibility and lifetime of your data as much as possible (but don't contort your algorithm to do it unless profiling shows it is a big win)
Answer 3:
Use a profiler:
- visualvm ( free, limited )
- jprofiler ( commercial )
- yourkit java profiler ( commercial )
- hprof ( free, limited, console only )
Use the latest version of the JVM from your provider. Incidentally, Sun's Java 6 update 14 does bring performance improvements.
Measure your GC throughput and pick the best garbage collector for your workload.
Answer 4:
Don't optimize prematurely.
Measure performance, then optimize.
Use final variables whenever possible. It will not only allow the JVM to optimize more, but also make your code easier to read and maintain.
If you make your objects immutable, you don't have to clone them.
Optimize by changing the algorithm first, then by changing the implementation.
Sometimes you need to resort to old-style techniques, like loop unrolling or caching precalculated values. Keep them in mind: even if they don't look nice, they can be useful.
Answer 5:
Also try tweaking the runtime arguments of the VM - the latest release of the VM for example includes the following flag which can improve performance in certain scenarios.
-XX:+DoEscapeAnalysis
Answer 6:
jvisualvm ships with JDK 6 now - that's the reason the link cited above doesn't work. Just type "jvisualvm <pid>", where <pid> is the ID of the process you want to track. You'll get to see how the heap is being used, but you won't see what's filling it up.
If it's a long-running process, you can turn on the -server option when you run. There are a lot of tuning options available to you; that's just one.
Answer 7:
First caveat - make sure you have done appropriate profiling or benchmarking before embarking on any optimisation work. The results will often enlighten you, and nearly always save you a lot of wasted effort in optimising something that doesn't matter.
Assuming that you do need it, you can get performance comparable to C in Java, but it takes some effort. You need to know where the JVM is doing "extra work" and avoid it.
In particular:
- Avoid unnecessary object creation. While the JVM heap and GC are extremely fast and efficient (probably the best in the world, and almost certainly better than anything you could roll yourself in C), it is still heap allocation, and that will be beaten by avoiding the heap in the first place (stack or register allocation)
- Avoid boxed primitives. You want to be using double and not Double.
- Use primitive arrays for any big chunks of data. Java primitive arrays are basically as fast as C/C++ arrays (they do have an additional bounds check but that is usually insignificant)
- Avoid anything synchronized - Java threading is pretty decent but it is still overhead that you may not need. Give each thread its own data to work on.
- Exploit concurrency - Java's concurrency support is very good. You might as well use all your cores! This is a big topic but there are plenty of good books / tutorials available.
- Use specialised collection classes for certain types of data if you have some very specific requirements, e.g. supporting some specialised sorting/search algorithms. You may need to roll your own, but there are also some good libraries with high performance collection classes available that may fit your needs - see e.g. Javolution
- Avoid big class hierarchies - this is a design smell in performance code. Every layer of abstraction is costing you overhead. Very fast Java code will often end up looking rather like C....
- Use static methods - the JIT can optimise these extremely well. It will usually inline them.
- Use final concrete classes - again, the JIT can optimise these very well by avoiding virtual function calls.
- Generate your own bytecode - if all else fails, this can be a viable option if you want the absolute maximum performance out of the JVM. Particularly useful if you need to compile your own DSL. Use something like ASM.
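To illustrate the points about boxed primitives and primitive arrays, here is a sketch under stated assumptions: it supposes each genomic coordinate can be reduced to a non-negative int chromosome code and a non-negative int position, which the question does not confirm. The PackedCoordinates class and its method names are hypothetical. Packing both values into one long lets Arrays.sort work on a long[] with no per-element objects and no Comparator calls.

```java
import java.util.Arrays;

class PackedCoordinates {
    // Packs (chromosome, position) into one long so that natural long order
    // equals (chromosome, position) order. Assumes both values are non-negative.
    static long pack(int chromosome, int position) {
        return ((long) chromosome << 32) | (position & 0xFFFFFFFFL);
    }

    // Upper 32 bits hold the chromosome code.
    static int chromosome(long key) {
        return (int) (key >>> 32);
    }

    // Lower 32 bits hold the position.
    static int position(long key) {
        return (int) key;
    }

    public static void main(String[] args) {
        long[] keys = { pack(2, 500), pack(1, 900), pack(1, 100) };
        Arrays.sort(keys); // primitive sort: no boxing, no Comparator
        for (long key : keys) {
            System.out.println(chromosome(key) + ":" + position(key));
        }
        // prints 1:100, then 1:900, then 2:500
    }
}
```

Whether this beats sorting objects in the real application is something only a measurement on the real data can show.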
Answer 8:
If your algorithm is CPU-heavy, you may want to consider taking advantage of parallelisation. You may be able to sort in multiple threads and merge the results back later.
This is however not a decision to be taken lightly, as writing concurrent code is hard.
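A rough sketch of the "sort in multiple threads and merge" idea, assuming the data is a plain int[] and that two worker threads are enough to show the shape. The SplitSortMerge class is hypothetical; the library calls are from java.util.concurrent and java.util.Arrays.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SplitSortMerge {
    // Sorts the two halves of the input on separate threads, then merges them.
    static int[] parallelSort(int[] data) {
        int mid = data.length / 2;
        final int[] left = Arrays.copyOfRange(data, 0, mid);
        final int[] right = Arrays.copyOfRange(data, mid, data.length);

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> l = pool.submit(new Runnable() {
                public void run() { Arrays.sort(left); }
            });
            Future<?> r = pool.submit(new Runnable() {
                public void run() { Arrays.sort(right); }
            });
            l.get(); // wait for both halves; get() also makes their writes visible
            r.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sorting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("sort task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return merge(left, right);
    }

    // Standard two-way merge of two sorted arrays.
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = a[i] <= b[j] ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        int[] sorted = parallelSort(new int[] { 5, 3, 9, 1, 7, 2 });
        System.out.println(Arrays.toString(sorted)); // prints [1, 2, 3, 5, 7, 9]
    }
}
```

Each thread works on its own copy, so no locking is needed; the cost is the extra copying and the final merge, which may outweigh the gain on small inputs.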
Answer 9:
Can't you use the sort functions that are included in the Java library?
You could at least look at the speed difference between the two sorting functions.
Answer 10:
Methodically, you have to profile the application to find out which components of your program are time- and memory-intensive; then take a closer look at those components in order to improve their performance (see Amdahl's law).
From a purely technological point of view, you can use a Java-to-native-code compiler such as Excelsior JET, but note that recent JVMs are really fast, so the VM should not have a significant impact.
Answer 11:
Is your sorting code executing only once, e.g. in a commandline utility that just sorts, or multiple times, e.g. a webapp that sorts in response to some user input?
Chances are that performance would increase significantly after the code has been executed a few times because the HotSpot VM may optimize aggressively if it decides your code is a hotspot.
This is a big advantage compared to C/C++.
The VM, at runtime, optimizes code that is used often, and it does that quite well. Performance can actually rise beyond that of C/C++ because of this. Really. ;)
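One simple way to see this warm-up effect is to time the same sort over several rounds in one JVM run. The sketch below is only an illustration: the WarmupTiming class, the array size and the round count are arbitrary choices, and the timings it prints will vary by machine and JVM.

```java
import java.util.Arrays;
import java.util.Random;

class WarmupTiming {
    // Sorts a fresh copy of the input and returns the elapsed time in nanoseconds.
    static long timeSort(int[] input) {
        int[] copy = input.clone();
        long start = System.nanoTime();
        Arrays.sort(copy);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] data = new int[1000000];
        Random rnd = new Random(42); // fixed seed so every run sorts the same data
        for (int i = 0; i < data.length; i++) {
            data[i] = rnd.nextInt();
        }
        // Early rounds typically include interpretation and JIT compilation time.
        for (int round = 1; round <= 10; round++) {
            System.out.println("round " + round + ": " + timeSort(data) / 1000 + " us");
        }
    }
}
```

If the first rounds are noticeably slower than the later ones, a one-shot command-line run is paying for warm-up that a long-running process would amortize.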
Your custom Comparator could be a place for optimization, though.
Try to check inexpensive stuff first (e.g. int comparison) before more expensive stuff (e.g. String comparison). I'm not sure if those tips apply because I don't know your Comparator.
Use either Collections.sort(list, comparator) or Arrays.sort(array, comparator). The array variant will be a bit faster, see the respective documentation.
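As a sketch of the "cheap checks first" idea: the Feature class and its fields below are hypothetical, and the example assumes the int field really is the primary sort key, since reordering the checks otherwise changes the resulting order. The String comparison only runs when the int comparison ties.

```java
import java.util.Arrays;
import java.util.Comparator;

class CheapFirstComparator {
    // Hypothetical element type; the question does not show the real one.
    static final class Feature {
        final int start;
        final String name;

        Feature(int start, String name) {
            this.start = start;
            this.name = name;
        }

        @Override
        public String toString() {
            return start + ":" + name;
        }
    }

    static final Comparator<Feature> BY_START_THEN_NAME = new Comparator<Feature>() {
        public int compare(Feature a, Feature b) {
            // Cheap int comparison decides most pairs.
            if (a.start != b.start) {
                return a.start < b.start ? -1 : 1;
            }
            // More expensive String comparison only on ties.
            return a.name.compareTo(b.name);
        }
    };

    public static void main(String[] args) {
        Feature[] features = {
            new Feature(200, "b"), new Feature(100, "z"), new Feature(200, "a")
        };
        Arrays.sort(features, BY_START_THEN_NAME);
        System.out.println(Arrays.toString(features)); // prints [100:z, 200:a, 200:b]
    }
}
```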
As Andreas said before: don't try to outsmart the VM.
Answer 12:
Perhaps there are routes to better performance besides micro-optimization of code. How about a different algorithm to achieve what you want your program to do? Maybe a different data structure?
Or trade some disk/ram space for speed, or if you can give up some time upfront during the loading of your program, you can precompute lookup tables instead of doing calculations - that way, the processing is fast. I.e., make some trade-offs of other resources available.
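A small illustration of the lookup-table trade-off follows. The task chosen here (reverse-complementing a DNA string) is only an assumed example and is not taken from the question; the ComplementTable class is hypothetical. The table is built once at class-load time, and each later lookup is a single array access instead of a chain of comparisons.

```java
class ComplementTable {
    // Built once when the class loads; indexed by ASCII code.
    private static final char[] COMPLEMENT = new char[128];

    static {
        for (int i = 0; i < COMPLEMENT.length; i++) {
            COMPLEMENT[i] = 'N'; // anything unrecognised maps to N
        }
        COMPLEMENT['A'] = 'T';
        COMPLEMENT['T'] = 'A';
        COMPLEMENT['C'] = 'G';
        COMPLEMENT['G'] = 'C';
    }

    // Walks the input backwards and replaces each base via the table.
    static String reverseComplement(String seq) {
        char[] out = new char[seq.length()];
        for (int i = 0; i < out.length; i++) {
            char c = seq.charAt(seq.length() - 1 - i);
            out[i] = c < 128 ? COMPLEMENT[c] : 'N';
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(reverseComplement("AACG")); // prints CGTT
    }
}
```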
Answer 13:
Here's what I would do, in any language. If samples show that your sort-comparison routine is active a large percentage of the time, you might find a way to simplify it. But maybe the time is going elsewhere. Diagnose first, to see what's broken, before you fix anything. Chances are, if you fix the biggest thing, then something else will be the biggest thing, and so on, until you've really gotten a pretty good speedup.
Answer 14:
Profile and tune your Java program and host machine. Most code follows the 80/20 rule: 20% of the code accounts for 80% of the running time, so find that 20% and make it as fast as possible. For example, the article Tuning Java Servers (http://www.infoq.com/articles/Tuning-Java-Servers) describes how to drill down from the command line and then isolate the problem using tools like Java Flight Recorder, Eclipse Memory Analyzer, and JProfiler.
Source: https://stackoverflow.com/questions/938683/java-performance-tips