Test case for Insertion Sort, MergeSort and Quick Sort

devoured elysium

There seem to be some systematic errors on the approach you're currently undertaking. I'll state some of the most obvious experimental issues you're facing, even if they might not directly be the culprits of your experimental results.

JVM Compilation

As I've stated previously in a comment, the JVM will by default run your code in interpreted mode. That means each bytecode instruction in your methods is interpreted on the spot and then run.

The advantage of this approach is that your application gets faster startup times than it would if the whole program were compiled to native code at the start of every run.

The downside is that, while you avoid the startup hit, the interpreted code runs slower than compiled code would.

To ameliorate both concerns, the JVM team settled on a tradeoff. Initially your program is interpreted, but the JVM gathers information about which methods (or parts of methods) are being used intensively, and compiles only those. There's a small performance hit while compiling, but afterwards that code runs far faster than before.

You'll have to take this fact into consideration when doing measurements.

The standard approach is to "warm up the JVM", that is, to run your algorithms for a bit to make sure the JVM does all the compilations it needs to do. It is important to have the code that warms the JVM be the same as the one you'll want to benchmark, otherwise there may be some compilation while you're benchmarking your code.
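For illustration, here's a minimal warm-up sketch. The sort(int[]) method and the iteration count are placeholders standing in for whatever you actually benchmark, not a prescription:

// Run the exact code under test enough times beforehand that the
// JIT compiler has compiled the hot paths before measuring starts.
static void warmUp(int[] template) {
    for (int i = 0; i < 10_000; ++i) {
        // fresh copy each iteration so every run does the full work
        int[] copy = java.util.Arrays.copyOf(template, template.length);
        sort(copy); // sort() is the hypothetical algorithm under test
    }
}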

Measuring time

When measuring time, you should use System.nanoTime() instead of System.currentTimeMillis(): nanoTime() is a monotonic, high-resolution timer, while currentTimeMillis() has only millisecond granularity and follows the wall clock. I won't go over the details; those can be accessed here.

You should also be careful about how you time. The following two blocks of code may seem equivalent at first, but will yield different results most of the time:

long totalDuration = 0;
for (int i = 0; i < 1000; ++i) {
    long startMeasure = System.nanoTime();
    algorithm();
    long endMeasure = System.nanoTime();
    totalDuration += endMeasure - startMeasure; // timer overhead paid on every run
}

//...

final int TRIALS_COUNT = 1000;
long startMeasure = System.nanoTime();
for (int i = 0; i < TRIALS_COUNT; ++i) {
    algorithm();
}
long endMeasure = System.nanoTime();
long duration = (endMeasure - startMeasure) / TRIALS_COUNT; // note the parentheses

Why? The first version pays the timer-call overhead, and is exposed to the timer's granularity, on every single run; the second spreads that cost across all TRIALS_COUNT runs. The faster algorithm() is, the more that per-call overhead distorts the results, so accurate measurements get harder to obtain.

Large input values

Asymptotic notation is useful for understanding how different algorithms scale for big values of n. Applying it to small input values is often nonsensical, because at that magnitude what you'd really want is to count the precise number of operations and their associated costs (something akin to what Jakub did). Big O tells you how an algorithm behaves for very large input sizes, not how it behaves for small ones. The canonical example is QuickSort, which is king for big arrays but generally slower than Selection or Insertion Sort for arrays of size 4 or 5. Your input sizes seem to be big enough, though.

On a final note

As previously stated by Chang, the mathematical exercise done by Jakub is completely wrong and should not be taken into consideration.

Do the complexity computations yourself. I assume 10 000 samples for the following calculations:

Insertion sort: O(n^2), 10 000*10 000 = 100 000 000.

Merge sort: O(n log n), 10 000 * log₂(10 000) ≈ 140 000.

Merge with insertion (cutoff 15): 15 falls between recursion depth 9 (subarrays of size 20) and depth 10 (subarrays of size 10), so you get 2^10 insertion sorts of size 10, followed by the merges: 1 024 * 10*10 (insertions) + 1 024 * 10 000 (merges) = 10 342 400.

Merge with insertion (cutoff 30): 30 falls between recursion depth 8 (subarrays of size 40) and depth 9 (subarrays of size 20), so you get 2^9 insertion sorts of size 20, followed by the merges: 512 * 20*20 (insertions) + 512 * 10 000 (merges) = 5 324 800.

Merge with insertion (cutoff 45): 45 falls between recursion depth 7 (subarrays of size 80) and depth 8 (subarrays of size 40), so you get 2^8 insertion sorts of size 40, followed by the merges: 256 * 40*40 (insertions) + 256 * 10 000 (merges) = 2 969 600.
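For reference, here's a minimal sketch of the hybrid these estimates describe: a merge sort that hands subarrays at or below a cutoff to insertion sort. The names and the CUTOFF constant are illustrative, not taken from the question's code:

static final int CUTOFF = 15; // hypothetical threshold, as in the first estimate above

// Sorts a[lo..hi] inclusive.
static void hybridSort(int[] a, int lo, int hi) {
    if (hi - lo + 1 <= CUTOFF) {
        insertionSort(a, lo, hi); // small subarray: insertion sort wins here
        return;
    }
    int mid = lo + (hi - lo) / 2;
    hybridSort(a, lo, mid);
    hybridSort(a, mid + 1, hi);
    merge(a, lo, mid, hi);
}

static void insertionSort(int[] a, int lo, int hi) {
    for (int i = lo + 1; i <= hi; ++i) {
        int key = a[i];
        int j = i - 1;
        while (j >= lo && a[j] > key) {
            a[j + 1] = a[j]; // shift larger elements right
            --j;
        }
        a[j + 1] = key;
    }
}

static void merge(int[] a, int lo, int mid, int hi) {
    int[] tmp = java.util.Arrays.copyOfRange(a, lo, hi + 1);
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi) {
        if (tmp[i - lo] <= tmp[j - lo]) a[k++] = tmp[i++ - lo];
        else                            a[k++] = tmp[j++ - lo];
    }
    while (i <= mid) a[k++] = tmp[i++ - lo];
    while (j <= hi)  a[k++] = tmp[j++ - lo];
}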

Quicksort: while worst-case quicksort takes O(n^2), that worst case must be specially crafted to hit the limit. On randomly generated input it is O(n log n) on average: 10 000 * log₂(10 000) ≈ 140 000.
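Relatedly, the standard way to make such a crafted worst case impossible to set up in advance is to pick the pivot at random. A minimal sketch using the Lomuto partition scheme (chosen here just for brevity):

static final java.util.Random RNG = new java.util.Random();

// Sorts a[lo..hi] inclusive with a randomized pivot.
static void quickSort(int[] a, int lo, int hi) {
    if (lo >= hi) return;
    // Random pivot: no fixed input pattern reliably triggers O(n^2).
    int p = lo + RNG.nextInt(hi - lo + 1);
    swap(a, p, hi);                    // move pivot to the end (Lomuto)
    int pivot = a[hi], k = lo;
    for (int i = lo; i < hi; ++i)
        if (a[i] < pivot) swap(a, i, k++);
    swap(a, k, hi);                    // pivot lands in its final position
    quickSort(a, lo, k - 1);
    quickSort(a, k + 1, hi);
}

static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }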

Measuring sorting algorithm performance can become quite a pain, because you need to measure efficiently over as wide a range of input data as possible.

The figures you see for insertion sort can be largely biased by the input numbers. If you are using only 0s and 1s in the array, and the array is randomly generated, you are actually handing the algorithm a much easier problem. In that case, on average, half of the array is already in order, and you don't need to compare the 0s and the 1s among themselves; the problem reduces to transporting all the 0s to the left, which on average takes only about n * log(n/2) + n time. For n = 10 000, that is 10 000 * log₂ 5 000 + 10 000 ≈ 133 888.
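To keep comparisons fair, it helps to benchmark on more than one input distribution. A small sketch (the helper names are mine, not from the question):

static int[] binaryArray(int n, java.util.Random rng) {
    int[] a = new int[n];
    for (int i = 0; i < n; ++i) a[i] = rng.nextInt(2); // only 0s and 1s: the easy case
    return a;
}

static int[] uniformArray(int n, java.util.Random rng) {
    int[] a = new int[n];
    for (int i = 0; i < n; ++i) a[i] = rng.nextInt(); // full int range: the harder case
    return a;
}

// Run each algorithm on both kinds of input (and ideally on already
// sorted and reverse-sorted arrays too) before drawing conclusions.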
