Question
I measured a sorting program and, based on the runtime data, I have narrowed it down to two sorting algorithms: bubble sort and insertion sort.
Is there a way to know for sure which one it is? Without knowing the code of course.
They both have the same time complexity and I'm out of ideas.
Time complexity data:
- Sorted: O(n), time taken for 1000 numbers = 0.0047 s
- Random unsorted: O(n^2), time taken for 1000 numbers = 0.021 s
- Descending order: O(n^2), time taken for 1000 numbers = 0.04 s
Thanks in advance!
Answer 1:
Your 1000 elements per sort are too few
The measured times are too short to be a valid measurement (the majority of the time might be spent not in the sort itself but in window initialization, opening files, etc.). You need times of at least 100 ms (1 s is ideal).
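For example, a minimal sketch in standard C++ of how to get times into a usable range: repeat the measured call until the total elapsed time is big enough and divide by the repetition count (work here is just a stand-in for whatever routine is actually measured):

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>

// hypothetical stand-in for the measured routine (replace with the real call)
void work(std::vector<int> data) { std::sort(data.begin(), data.end()); }

// repeat the call until at least min_total_ms have elapsed,
// then return the average time of a single call in milliseconds
double measure_ms(const std::vector<int> &data, double min_total_ms = 100.0)
{
    using clock = std::chrono::steady_clock;
    int reps = 0;
    double total_ms = 0.0;
    auto t0 = clock::now();
    do
    {
        work(data);                                  // timed call on a fresh copy
        reps++;
        total_ms = std::chrono::duration<double, std::milli>(clock::now() - t0).count();
    } while (total_ms < min_total_ms);
    return total_ms / reps;
}

int main()
{
    std::vector<int> data(1000, 0);                  // placeholder input of 1000 numbers
    std::printf("average time of one call = %.5f ms\n", measure_ms(data));
    return 0;
}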
If you have access to the data being sorted
You can feed it data sets that are challenging for each type of sort and infer the used algorithm from the times. For example, bubble sort is slowest on an array sorted in reverse order, so pass ascending-sorted, descending-sorted and random data and compare the times. Let us call the times

tasc, tdes, trnd

Assuming an ascending sort, if bubble sort is involved it should hold that tasc (O(n)) < trnd < tdes (O(n^2)), and so:

tasc*n == tdes (within a margin of error)

So just test whether

tdes/tasc

is close to n, again with some margin for error. In other words, you need to create sample data that is hard for one specific type of sort and not for the others, and from the times detect whether that is the case, until you find the algorithm used.
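A minimal sketch of this ratio test in standard C++ (bubble_sort here is just my own stand-in for the measured black-box routine, so you can see the shape of the test):

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// stand-in for the measured black-box routine: bubble sort with early exit,
// so already-sorted input finishes after one pass in O(n)
void bubble_sort(std::vector<int> &a)
{
    for (size_t i = 0; i + 1 < a.size(); i++)
    {
        bool swapped = false;
        for (size_t j = 0; j + 1 < a.size() - i; j++)
            if (a[j] > a[j + 1]) { std::swap(a[j], a[j + 1]); swapped = true; }
        if (!swapped) break;
    }
}

// time one call on a copy of the data, in milliseconds
double time_sort_ms(std::vector<int> data)
{
    auto t0 = std::chrono::steady_clock::now();
    bubble_sort(data);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    const int n = 4000;
    std::vector<int> asc(n), des(n), rnd(n);
    for (int i = 0; i < n; i++) { asc[i] = i; des[i] = n - 1 - i; }
    rnd = asc;
    std::mt19937 rng(1);
    std::shuffle(rnd.begin(), rnd.end(), rng);

    double tasc = time_sort_ms(asc);
    double tdes = time_sort_ms(des);
    double trnd = time_sort_ms(rnd);
    // if the tested sort is bubble sort, tdes/tasc should be on the order of n
    std::printf("tasc=%.5f ms tdes=%.5f ms trnd=%.5f ms tdes/tasc=%.1f (n=%d)\n",
                tasc, tdes, trnd, tdes / tasc, n);
    return 0;
}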
Here is some data (all times are in [ms]) I tested on my bubble sort with ascending ordering:

n       tasc      tdesc      tasc*n
1000    0.00321    2.96147    3.205750
2000    0.00609   11.76799   12.181855
4000    0.01186   45.58834   47.445111
To be more clear: if we have a runtime for complexity O(n),

t(O(n)) = c*n

then to convert it to a runtime with complexity O(n^2) (assuming the same time constant c):

t(O(n^2)) = c*n*n = t(O(n)) * n

This way you can compare times of different complexities; you just need to convert all measured times into a single common complexity.
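For example, taking the first row of the table above: tasc = 0.00321 ms at n = 1000 converts to an O(n^2)-equivalent time of 0.00321 * 1000 ≈ 3.21 ms, which matches the measured quadratic time tdesc = 2.96147 ms up to a small margin (that is exactly what the tasc*n column shows for every row).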
If you can choose the sorted data size
then, as was mentioned in the comments, you can infer the growth rate of the times with increasing n (doubling it each time); from that you can estimate the complexity, and from the complexity you can tell which algorithm was used. So let us take the measured times from the table above. For O(n) the time constant c should stay the same, so for tasc (O(n)):

n       tasc      c=tasc/n
1000    0.00321   0.000003210
2000    0.00609   0.000003045
4000    0.01186   0.000002965

and for tdesc (O(n^2)):

n       tdesc      c=tdesc/n^2
1000     2.96147   0.00000296147000
2000    11.76799   0.00000294199750
4000    45.58834   0.00000284927125
As you can see, the constant c is more or less the same across n for both tasc and tdesc, which means they comply with their estimated complexities O(n) and O(n^2).
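As a quick check of the growth rate from the doubling itself: log2(t(2n)/t(n)) estimates the exponent, and on the data above it gives log2(11.76799/2.96147) ≈ 1.99 and log2(45.58834/11.76799) ≈ 1.95 for tdesc (close to 2, so O(n^2)), and log2(0.00609/0.00321) ≈ 0.92 and log2(0.01186/0.00609) ≈ 0.96 for tasc (close to 1, so O(n)).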
However, without knowing what the tested app does it is hard to be sure, as the sorting might be preceded by other processing. For example, the data might first be scanned to detect its form (sorted, random, almost sorted, ...), which is doable in O(n), and based on that together with the data size the app might choose which sorting algorithm to use. So your measurements might be measuring different routines, invalidating the results.
[edit1] I had an insane idea of detecting the complexity automatically
Simply test whether the time constant c is more or less the same across all measured times versus their corresponding n. Here is simple C++/VCL code:
//$$---- Form CPP ----
//---------------------------------------------------------------------------
#include <vcl.h>
#include <math.h>
#pragma hdrstop
#include "Unit1.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------
double factorial[]= // n[-],t[ms]
{
11,0.008,
12,0.012,
13,0.013,
14,0.014,
15,0.016,
16,0.014,
17,0.015,
18,0.017,
19,0.019,
20,0.016,
21,0.017,
22,0.019,
23,0.021,
24,0.023,
25,0.025,
26,0.027,
27,0.029,
28,0.032,
29,0.034,
30,0.037,
31,0.039,
32,0.034,
33,0.037,
34,0.039,
35,0.041,
36,0.039,
37,0.041,
38,0.044,
39,0.046,
40,0.041,
41,0.044,
42,0.046,
43,0.049,
44,0.048,
45,0.050,
46,0.054,
47,0.056,
48,0.056,
49,0.060,
50,0.063,
51,0.066,
52,0.065,
53,0.069,
54,0.072,
55,0.076,
56,0.077,
57,0.162,
58,0.095,
59,0.093,
60,0.089,
61,0.093,
62,0.098,
63,0.096,
64,0.090,
65,0.100,
66,0.104,
67,0.111,
68,0.100,
69,0.121,
70,0.109,
71,0.119,
72,0.104,
73,0.124,
74,0.113,
75,0.118,
76,0.118,
77,0.123,
78,0.129,
79,0.133,
80,0.121,
81,0.119,
82,0.131,
83,0.150,
84,0.141,
85,0.148,
86,0.154,
87,0.163,
88,0.211,
89,0.151,
90,0.157,
91,0.166,
92,0.161,
93,0.169,
94,0.173,
95,0.188,
96,0.181,
97,0.187,
98,0.194,
99,0.201,
100,0.185,
101,0.191,
102,0.202,
103,0.207,
104,0.242,
105,0.210,
106,0.215,
107,0.221,
108,0.217,
109,0.226,
110,0.232,
111,0.240,
112,0.213,
113,0.231,
114,0.240,
115,0.252,
116,0.248,
117,0.598,
118,0.259,
119,0.261,
120,0.254,
121,0.263,
122,0.270,
123,0.281,
124,0.290,
125,0.322,
126,0.303,
127,0.313,
128,0.307,
0,0.000
};
//---------------------------------------------------------------------------
double sort_asc[]=
{
1000,0.00321,
2000,0.00609,
4000,0.01186,
0,0.000
};
//---------------------------------------------------------------------------
double sort_desc[]=
{
1000, 2.96147,
2000,11.76799,
4000,45.58834,
0,0.000
};
//---------------------------------------------------------------------------
double sort_rand[]=
{
1000, 3.205750,
2000,12.181855,
4000,47.445111,
0,0.000
};
//---------------------------------------------------------------------------
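// safe division: returns 0 instead of dividing by a (near) zero divisor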
double div(double a,double b){ return (fabs(b)>1e-10)?a/b:0.0; }
//---------------------------------------------------------------------------
AnsiString get_complexity(double *dat) // expect dat[] = { n0,t(n0), n1,t(n1), ... , 0,0 }
{
AnsiString O="O(?)";
int i,e;
double t,n,c,c0,c1,a,dc=1e+10;
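 // testbeg loops over the (n,t) pairs in dat[] (terminated by n=0); testend(s)
 // fits the constant c for the given complexity, tracks its min/max c0,c1 over
 // all n, and keeps the complexity string s with the smallest spread |1-c0/c1|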
#define testbeg for (e=1,i=0;dat[i]>0.5;){ n=dat[i]; i++; t=dat[i]; i++;
#define testend(s) if ((c<=0.0)||(n<2.0)) continue; if (e){ e=0; c0=c; c1=c; } if (c0>c) c0=c; if (c1<c) c1=c; } a=fabs(1.0-div(c0,c1)); if (dc>=a){ dc=a; O=s; }
testbeg; c=div(t,n); testend("O(n)");
testbeg; c=div(t,n*n); testend("O(n^2)");
testbeg; c=div(t,n*n*n); testend("O(n^3)");
testbeg; c=div(t,n*n*n*n); testend("O(n^4)");
testbeg; a=log(n); c=div(t,a); testend("O(log(n))");
testbeg; a=log(n); c=div(t,a*a); testend("O(log^2(n))");
testbeg; a=log(n); c=div(t,a*a*a); testend("O(log^3(n))");
testbeg; a=log(n); c=div(t,a*a*a*a); testend("O(log^4(n))");
testbeg; a=log(n); c=div(t,n*a); testend("O(n.log(n))");
testbeg; a=log(n); c=div(t,n*n*a); testend("O(n^2.log(n))");
testbeg; a=log(n); c=div(t,n*n*n*a); testend("O(n^3.log(n))");
testbeg; a=log(n); c=div(t,n*n*n*n*a); testend("O(n^4.log(n))");
testbeg; a=log(n); c=div(t,n*a*a); testend("O(n.log^2(n))");
testbeg; a=log(n); c=div(t,n*n*a*a); testend("O(n^2.log^2(n))");
testbeg; a=log(n); c=div(t,n*n*n*a*a); testend("O(n^3.log^2(n))");
testbeg; a=log(n); c=div(t,n*n*n*n*a*a); testend("O(n^4.log^2(n))");
testbeg; a=log(n); c=div(t,n*a*a*a); testend("O(n.log^3(n))");
testbeg; a=log(n); c=div(t,n*n*a*a*a); testend("O(n^2.log^3(n))");
testbeg; a=log(n); c=div(t,n*n*n*a*a*a); testend("O(n^3.log^3(n))");
testbeg; a=log(n); c=div(t,n*n*n*n*a*a*a); testend("O(n^4.log^3(n))");
testbeg; a=log(n); c=div(t,n*a*a*a*a); testend("O(n.log^4(n))");
testbeg; a=log(n); c=div(t,n*n*a*a*a*a); testend("O(n^2.log^4(n))");
testbeg; a=log(n); c=div(t,n*n*n*a*a*a*a); testend("O(n^3.log^4(n))");
testbeg; a=log(n); c=div(t,n*n*n*n*a*a*a*a); testend("O(n^4.log^4(n))");
#undef testend
#undef testbeg
return O+AnsiString().sprintf(" error = %.6lf",dc);
}
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
{
mm_log->Lines->Clear();
mm_log->Lines->Add("factorial "+get_complexity(factorial));
mm_log->Lines->Add("sort asc "+get_complexity(sort_asc));
mm_log->Lines->Add("sort desc "+get_complexity(sort_desc));
mm_log->Lines->Add("sort rand "+get_complexity(sort_rand));
}
//-------------------------------------------------------------------------
with the relevant time measurements of my fast exact bigint factorial (where I simply used only the bigger times, above 8 ms) and also the sorting measurements from above, which outputs this:
factorial O(n.log^2(n)) error = 0.665782
sort asc O(n) error = 0.076324
sort desc O(n^2) error = 0.037886
sort rand O(n^2) error = 0.075000
The code just tests a few supported complexities and outputs the one with the lowest error (the variation of the time constant c between different n)...
Just ignore the VCL stuff and convert the AnsiString into any string or output you want ...
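For reference, here is a minimal sketch of the same constant-spread idea in plain standard C++ (a portable rewrite with a shortened complexity list; get_complexity_std is just a name for this sketch, and the input is the sort data from above):

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// same idea as get_complexity above, without VCL: fit t(n) = c*f(n) for each
// candidate f and return the candidate whose constant c varies the least
std::string get_complexity_std(const std::vector<std::pair<double,double>> &dat)
{
    struct Candidate { const char *name; double (*f)(double); };
    const Candidate cand[] =
    {
        { "O(n)",        [](double n){ return n; } },
        { "O(n^2)",      [](double n){ return n*n; } },
        { "O(n^3)",      [](double n){ return n*n*n; } },
        { "O(n.log(n))", [](double n){ return n*std::log(n); } },
    };
    std::string best = "O(?)";
    double best_err = 1e+10;
    for (const Candidate &cc : cand)
    {
        double c0 = 0.0, c1 = 0.0;
        bool first = true;
        for (const auto &p : dat)
        {
            double n = p.first, t = p.second;
            if (n < 2.0) continue;
            double c = t / cc.f(n);                 // fitted constant for this sample
            if (first) { c0 = c1 = c; first = false; }
            c0 = std::min(c0, c);
            c1 = std::max(c1, c);
        }
        if (first || c1 <= 0.0) continue;
        double err = std::fabs(1.0 - c0 / c1);      // relative spread of the constant
        if (err < best_err) { best_err = err; best = cc.name; }
    }
    return best + " error = " + std::to_string(best_err);
}

int main()
{
    // the sort measurements from above: { n, t [ms] }
    std::vector<std::pair<double,double>> sort_asc  = { {1000,0.00321}, {2000, 0.00609}, {4000, 0.01186} };
    std::vector<std::pair<double,double>> sort_desc = { {1000,2.96147}, {2000,11.76799}, {4000,45.58834} };
    std::printf("sort asc  %s\n", get_complexity_std(sort_asc ).c_str());
    std::printf("sort desc %s\n", get_complexity_std(sort_desc).c_str());
    return 0;
}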
Source: https://stackoverflow.com/questions/64772061/given-the-runtime-data-how-to-know-if-sorting-program-uses-bubble-sort-or-inser