When I run my timing test program in the Java HotSpot client VM, I get consistent behavior. However, when I run it in the HotSpot server VM, I get an unexpected result. Essentially, the cost
In short, the JIT can optimise a line that only ever calls a single method, or two methods, in ways it cannot with more multi-polymorphic calls. What matters is the number of possible methods which might be called from any given line, and the JIT builds up this picture over time. When a method is inlined, further optimisations are possible, but in your case the line in question sees an increasing number of possible method calls from test1 over the life of the run, and so it gets slower.
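To make the "one line, several possible methods" idea concrete, here is a minimal sketch. It is not your test program: the `Op` interface, the three implementations and the `run` method are hypothetical names made up for illustration. The single `op.apply(i)` line inside `run` is the call site that sees one, then two, then three receiver classes as `main` progresses.

```java
public class CallSiteDemo {
    // Hypothetical interface and implementations, for illustration only.
    interface Op { int apply(int x); }
    static final class AddOne implements Op { public int apply(int x) { return x + 1; } }
    static final class AddTwo implements Op { public int apply(int x) { return x + 2; } }
    static final class AddThree implements Op { public int apply(int x) { return x + 3; } }

    // The one call site of interest is op.apply(i) below.
    // Every class passed in here is added to what the JIT has seen on that line.
    static long run(Op op, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += op.apply(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        Op[] ops = { new AddOne(), new AddTwo(), new AddThree() };
        // Each new class increases the number of methods the call site may invoke.
        for (Op op : ops) {
            for (int round = 0; round < 5; round++) {
                long start = System.nanoTime();
                long result = run(op, 10_000_000);
                long elapsed = System.nanoTime() - start;
                System.out.printf("%s round %d: %d ms (result %d)%n",
                        op.getClass().getSimpleName(), round, elapsed / 1_000_000, result);
            }
        }
    }
}
```

The actual timings depend on the JVM, its flags and the hardware, so treat any numbers this prints as indicative only.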
The way I get around this is to duplicate the short test code so each class is tested equally (assuming this is realistic). If your program will be multi-polymorphic when it is running, that is what you should test to be realistic; as you can see, it can change the results.
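A rough sketch of what duplicating the test code can look like, again with hypothetical names (`Op`, `AddOne`, `AddTwo`, `runAddOne`, `runAddTwo`) rather than your own classes. Each copy of the loop has its own call line, and as long as each copy is only ever given one class, that line only ever sees one possible method.

```java
public class DuplicatedLoops {
    // Hypothetical interface and implementations, for illustration only.
    interface Op { int apply(int x); }
    static final class AddOne implements Op { public int apply(int x) { return x + 1; } }
    static final class AddTwo implements Op { public int apply(int x) { return x + 2; } }

    // Copy of the loop that is only ever called with AddOne instances.
    static long runAddOne(Op op, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += op.apply(i); // this line only sees AddOne
        }
        return sum;
    }

    // Identical copy of the loop that is only ever called with AddTwo instances.
    static long runAddTwo(Op op, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += op.apply(i); // this line only sees AddTwo
        }
        return sum;
    }

    public static void main(String[] args) {
        Op one = new AddOne();
        Op two = new AddTwo();
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long r1 = runAddOne(one, 10_000_000);
            long t1 = System.nanoTime();
            long r2 = runAddTwo(two, 10_000_000);
            long t2 = System.nanoTime();
            System.out.printf("round %d: AddOne %d ms, AddTwo %d ms (results %d, %d)%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, r1, r2);
        }
    }
}
```

The duplication is deliberate: sharing one loop between the classes would put them back on the same line of code. This only gives a fair comparison if your real program also calls each class from its own line.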
When you run the method from a fresh loop you see the benefit of only calling one method from that line of code.
Here is a table of different costs you might see depending on the number of possible methods any individual line can call. http://vanillajava.blogspot.co.uk/2012/12/performance-of-inlined-virtual-method.html
Polymorphism is not designed to improve performance and for me it is entirely reasonable that as the complexity of the polymorphism increases it should be slower.
BTW, making methods final doesn't improve performance any more. The JIT works out whether you have called a sub-class on a line-by-line basis (as discussed).
EDIT: As you can see, the client JVM doesn't optimise the code as much, as it is designed for relatively lightweight startup times. This means the client JVM is more consistent, but consistently slower. If you want the best performance, you need to consider a number of optimisation strategies, which leads to multiple possible outcomes depending on whether each optimisation is applied or not.