Question
I measured a sorting program and, based on the runtime data, I have narrowed it down to two sorting algorithms: bubble sort and insertion sort.
Is there a way to know for sure which one it is? Without knowing the code of course.
They both have the same time complexity and I'm out of ideas.
Time complexity data:
- Sorted: O(n), time taken for 1000 numbers = 0.0047 s
- Random unsorted: O(n^2), time taken for 1000 numbers = 0.021 s
- Descending order: O(n^2), time taken for 1000 numbers = 0.04 s
Thanks in advance!
Answer 1:
Your 1000 elements per sort are too few
The measured times are too short to be a valid measurement (the majority of the time might be spent not in the sort itself but in window initialization, opening files, etc.). You need times of at least 100 ms (1 s is ideal).
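For example, a minimal sketch in standard C++ of how to get times into a usable range: repeat the measured call until the total elapsed time is big enough and divide by the repetition count (work here is just a stand-in for whatever routine is actually measured):

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>

// hypothetical stand-in for the measured routine (replace with the real call)
void work(std::vector<int> data) { std::sort(data.begin(), data.end()); }

// repeat the call until at least min_total_ms have elapsed,
// then return the average time of a single call in milliseconds
double measure_ms(const std::vector<int> &data, double min_total_ms = 100.0)
{
    using clock = std::chrono::steady_clock;
    int reps = 0;
    double total_ms = 0.0;
    auto t0 = clock::now();
    do
    {
        work(data);                                  // timed call on a fresh copy
        reps++;
        total_ms = std::chrono::duration<double, std::milli>(clock::now() - t0).count();
    } while (total_ms < min_total_ms);
    return total_ms / reps;
}

int main()
{
    std::vector<int> data(1000, 0);                  // placeholder input of 1000 numbers
    std::printf("average time of one call = %.5f ms\n", measure_ms(data));
    return 0;
}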
If you have access to the data being sorted
You can feed it data sets that are challenging for each type of sort and infer the used algorithm from the times. For example, bubble sort is slowest on an array sorted in reverse order, so pass ascending-sorted, descending-sorted and random data and compare the times. Let us call the times

tasc, tdes, trnd

Assuming an ascending sort, if bubble sort is involved it should hold that tasc (O(n)) < trnd < tdes (O(n^2)), and so:

tasc*n == tdes (within a margin of error)

So just test whether

tdes/tasc

is close to n, again with some margin for error. In other words, you need to create sample data that is hard for one specific type of sort and not for the others, and from the times detect whether that is the case, until you find the algorithm used.
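A minimal sketch of this ratio test in standard C++ (bubble_sort here is just my own stand-in for the measured black-box routine, so you can see the shape of the test):

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// stand-in for the measured black-box routine: bubble sort with early exit,
// so already-sorted input finishes after one pass in O(n)
void bubble_sort(std::vector<int> &a)
{
    for (size_t i = 0; i + 1 < a.size(); i++)
    {
        bool swapped = false;
        for (size_t j = 0; j + 1 < a.size() - i; j++)
            if (a[j] > a[j + 1]) { std::swap(a[j], a[j + 1]); swapped = true; }
        if (!swapped) break;
    }
}

// time one call on a copy of the data, in milliseconds
double time_sort_ms(std::vector<int> data)
{
    auto t0 = std::chrono::steady_clock::now();
    bubble_sort(data);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    const int n = 4000;
    std::vector<int> asc(n), des(n), rnd(n);
    for (int i = 0; i < n; i++) { asc[i] = i; des[i] = n - 1 - i; }
    rnd = asc;
    std::mt19937 rng(1);
    std::shuffle(rnd.begin(), rnd.end(), rng);

    double tasc = time_sort_ms(asc);
    double tdes = time_sort_ms(des);
    double trnd = time_sort_ms(rnd);
    // if the tested sort is bubble sort, tdes/tasc should be on the order of n
    std::printf("tasc=%.5f ms tdes=%.5f ms trnd=%.5f ms tdes/tasc=%.1f (n=%d)\n",
                tasc, tdes, trnd, tdes / tasc, n);
    return 0;
}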
Here is some data (all times are in [ms]) I tested on my bubble sort with ascending ordering:

n       tasc      tdesc      tasc*n
1000    0.00321    2.96147    3.205750
2000    0.00609   11.76799   12.181855
4000    0.01186   45.58834   47.445111
To be more clear: if we have a runtime for complexity O(n),

t(O(n)) = c*n

then to convert it to a runtime with complexity O(n^2) (assuming the same time constant c):

t(O(n^2)) = c*n*n = t(O(n)) * n

This way you can compare times of different complexities; you just need to convert all measured times into a single common complexity.
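For example, taking the first row of the table above: tasc = 0.00321 ms at n = 1000 converts to an O(n^2)-equivalent time of 0.00321 * 1000 ≈ 3.21 ms, which matches the measured quadratic time tdesc = 2.96147 ms up to a small margin (that is exactly what the tasc*n column shows for every row).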
If you can choose the sorted data size
then, as was mentioned in the comments, you can infer the growth rate of the times with increasing n (doubling it each time); from that you can estimate the complexity, and from the complexity you can tell which algorithm was used. So let us take the measured times from the table above. For O(n) the time constant c should stay the same, so for tasc (O(n)):

n       tasc      c=tasc/n
1000    0.00321   0.000003210
2000    0.00609   0.000003045
4000    0.01186   0.000002965

and for tdesc (O(n^2)):

n       tdesc      c=tdesc/n^2
1000     2.96147   0.00000296147000
2000    11.76799   0.00000294199750
4000    45.58834   0.00000284927125
As you can see, the constant c is more or less the same across n for both tasc and tdesc, which means they comply with their estimated complexities O(n) and O(n^2).
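As a quick check of the growth rate from the doubling itself: log2(t(2n)/t(n)) estimates the exponent, and on the data above it gives log2(11.76799/2.96147) ≈ 1.99 and log2(45.58834/11.76799) ≈ 1.95 for tdesc (close to 2, so O(n^2)), and log2(0.00609/0.00321) ≈ 0.92 and log2(0.01186/0.00609) ≈ 0.96 for tasc (close to 1, so O(n)).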
However, without knowing what the tested app does it is hard to be sure, as the sorting might be preceded by other processing. For example, the data might first be scanned to detect its form (sorted, random, almost sorted, ...), which is doable in O(n), and based on that together with the data size the app might choose which sorting algorithm to use. So your measurements might be measuring different routines, invalidating the results.
[edit1] I had an insane idea of detecting the complexity automatically
Simply test whether the time constant c is more or less the same across all measured times versus their corresponding n. Here is simple C++/VCL code:
//$$---- Form CPP ----
//---------------------------------------------------------------------------
#include <vcl.h>
#include <math.h>
#pragma hdrstop
#include "Unit1.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
//---------------------------------------------------------------------------
double factorial[]= // n[-],t[ms]
{
11,0.008,
12,0.012,
13,0.013,
14,0.014,
15,0.016,
16,0.014,
17,0.015,
18,0.017,
19,0.019,
20,0.016,
21,0.017,
22,0.019,
23,0.021,
24,0.023,
25,0.025,
26,0.027,
27,0.029,
28,0.032,
29,0.034,
30,0.037,
31,0.039,
32,0.034,
33,0.037,
34,0.039,
35,0.041,
36,0.039,
37,0.041,
38,0.044,
39,0.046,
40,0.041,
41,0.044,
42,0.046,
43,0.049,
44,0.048,
45,0.050,
46,0.054,
47,0.056,
48,0.056,
49,0.060,
50,0.063,
51,0.066,
52,0.065,
53,0.069,
54,0.072,
55,0.076,
56,0.077,
57,0.162,
58,0.095,
59,0.093,
60,0.089,
61,0.093,
62,0.098,
63,0.096,
64,0.090,
65,0.100,
66,0.104,
67,0.111,
68,0.100,
69,0.121,
70,0.109,
71,0.119,
72,0.104,
73,0.124,
74,0.113,
75,0.118,
76,0.118,
77,0.123,
78,0.129,
79,0.133,
80,0.121,
81,0.119,
82,0.131,
83,0.150,
84,0.141,
85,0.148,
86,0.154,
87,0.163,
88,0.211,
89,0.151,
90,0.157,
91,0.166,
92,0.161,
93,0.169,
94,0.173,
95,0.188,
96,0.181,
97,0.187,
98,0.194,
99,0.201,
100,0.185,
101,0.191,
102,0.202,
103,0.207,
104,0.242,
105,0.210,
106,0.215,
107,0.221,
108,0.217,
109,0.226,
110,0.232,
111,0.240,
112,0.213,
113,0.231,
114,0.240,
115,0.252,
116,0.248,
117,0.598,
118,0.259,
119,0.261,
120,0.254,
121,0.263,
122,0.270,
123,0.281,
124,0.290,
125,0.322,
126,0.303,
127,0.313,
128,0.307,
0,0.000
};
//---------------------------------------------------------------------------
double sort_asc[]=
{
1000,0.00321,
2000,0.00609,
4000,0.01186,
0,0.000
};
//---------------------------------------------------------------------------
double sort_desc[]=
{
1000, 2.96147,
2000,11.76799,
4000,45.58834,
0,0.000
};
//---------------------------------------------------------------------------
double sort_rand[]=
{
1000, 3.205750,
2000,12.181855,
4000,47.445111,
0,0.000
};
//---------------------------------------------------------------------------
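// safe division: returns 0 instead of dividing by a (near) zero divisor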
double div(double a,double b){ return (fabs(b)>1e-10)?a/b:0.0; }
//---------------------------------------------------------------------------
AnsiString get_complexity(double *dat) // expect dat[] = { n0,t(n0), n1,t(n1), ... , 0,0 }
{
AnsiString O="O(?)";
int i,e;
double t,n,c,c0,c1,a,dc=1e+10;
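 // testbeg loops over the (n,t) pairs in dat[] (terminated by n=0); testend(s)
 // fits the constant c for the given complexity, tracks its min/max c0,c1 over
 // all n, and keeps the complexity string s with the smallest spread |1-c0/c1|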
#define testbeg for (e=1,i=0;dat[i]>0.5;){ n=dat[i]; i++; t=dat[i]; i++;
#define testend(s) if ((c<=0.0)||(n<2.0)) continue; if (e){ e=0; c0=c; c1=c; } if (c0>c) c0=c; if (c1<c) c1=c; } a=fabs(1.0-div(c0,c1)); if (dc>=a){ dc=a; O=s; }
testbeg; c=div(t,n); testend("O(n)");
testbeg; c=div(t,n*n); testend("O(n^2)");
testbeg; c=div(t,n*n*n); testend("O(n^3)");
testbeg; c=div(t,n*n*n*n); testend("O(n^4)");
testbeg; a=log(n); c=div(t,a); testend("O(log(n))");
testbeg; a=log(n); c=div(t,a*a); testend("O(log^2(n))");
testbeg; a=log(n); c=div(t,a*a*a); testend("O(log^3(n))");
testbeg; a=log(n); c=div(t,a*a*a*a); testend("O(log^4(n))");
testbeg; a=log(n); c=div(t,n*a); testend("O(n.log(n))");
testbeg; a=log(n); c=div(t,n*n*a); testend("O(n^2.log(n))");
testbeg; a=log(n); c=div(t,n*n*n*a); testend("O(n^3.log(n))");
testbeg; a=log(n); c=div(t,n*n*n*n*a); testend("O(n^4.log(n))");
testbeg; a=log(n); c=div(t,n*a*a); testend("O(n.log^2(n))");
testbeg; a=log(n); c=div(t,n*n*a*a); testend("O(n^2.log^2(n))");
testbeg; a=log(n); c=div(t,n*n*n*a*a); testend("O(n^3.log^2(n))");
testbeg; a=log(n); c=div(t,n*n*n*n*a*a); testend("O(n^4.log^2(n))");
testbeg; a=log(n); c=div(t,n*a*a*a); testend("O(n.log^3(n))");
testbeg; a=log(n); c=div(t,n*n*a*a*a); testend("O(n^2.log^3(n))");
testbeg; a=log(n); c=div(t,n*n*n*a*a*a); testend("O(n^3.log^3(n))");
testbeg; a=log(n); c=div(t,n*n*n*n*a*a*a); testend("O(n^4.log^3(n))");
testbeg; a=log(n); c=div(t,n*a*a*a*a); testend("O(n.log^4(n))");
testbeg; a=log(n); c=div(t,n*n*a*a*a*a); testend("O(n^2.log^4(n))");
testbeg; a=log(n); c=div(t,n*n*n*a*a*a*a); testend("O(n^3.log^4(n))");
testbeg; a=log(n); c=div(t,n*n*n*n*a*a*a*a); testend("O(n^4.log^4(n))");
#undef testend
#undef testbeg
return O+AnsiString().sprintf(" error = %.6lf",dc);
}
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
{
mm_log->Lines->Clear();
mm_log->Lines->Add("factorial "+get_complexity(factorial));
mm_log->Lines->Add("sort asc "+get_complexity(sort_asc));
mm_log->Lines->Add("sort desc "+get_complexity(sort_desc));
mm_log->Lines->Add("sort rand "+get_complexity(sort_rand));
}
//-------------------------------------------------------------------------
with the relevant time measurements of my fast exact bigint factorial (where I simply used only the bigger times, above 8 ms) and also the sorting measurements from above, which outputs this:
factorial O(n.log^2(n)) error = 0.665782
sort asc O(n) error = 0.076324
sort desc O(n^2) error = 0.037886
sort rand O(n^2) error = 0.075000
The code just tests a few supported complexities and outputs the one with the lowest error (the variation of the time constant c between different n)...
Just ignore the VCL stuff and convert the AnsiString into any string or output you want ...
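For reference, here is a minimal sketch of the same constant-spread idea in plain standard C++ (a portable rewrite with a shortened complexity list; get_complexity_std is just a name for this sketch, and the input is the sort data from above):

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// same idea as get_complexity above, without VCL: fit t(n) = c*f(n) for each
// candidate f and return the candidate whose constant c varies the least
std::string get_complexity_std(const std::vector<std::pair<double,double>> &dat)
{
    struct Candidate { const char *name; double (*f)(double); };
    const Candidate cand[] =
    {
        { "O(n)",        [](double n){ return n; } },
        { "O(n^2)",      [](double n){ return n*n; } },
        { "O(n^3)",      [](double n){ return n*n*n; } },
        { "O(n.log(n))", [](double n){ return n*std::log(n); } },
    };
    std::string best = "O(?)";
    double best_err = 1e+10;
    for (const Candidate &cc : cand)
    {
        double c0 = 0.0, c1 = 0.0;
        bool first = true;
        for (const auto &p : dat)
        {
            double n = p.first, t = p.second;
            if (n < 2.0) continue;
            double c = t / cc.f(n);                 // fitted constant for this sample
            if (first) { c0 = c1 = c; first = false; }
            c0 = std::min(c0, c);
            c1 = std::max(c1, c);
        }
        if (first || c1 <= 0.0) continue;
        double err = std::fabs(1.0 - c0 / c1);      // relative spread of the constant
        if (err < best_err) { best_err = err; best = cc.name; }
    }
    return best + " error = " + std::to_string(best_err);
}

int main()
{
    // the sort measurements from above: { n, t [ms] }
    std::vector<std::pair<double,double>> sort_asc  = { {1000,0.00321}, {2000, 0.00609}, {4000, 0.01186} };
    std::vector<std::pair<double,double>> sort_desc = { {1000,2.96147}, {2000,11.76799}, {4000,45.58834} };
    std::printf("sort asc  %s\n", get_complexity_std(sort_asc ).c_str());
    std::printf("sort desc %s\n", get_complexity_std(sort_desc).c_str());
    return 0;
}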
Source: https://stackoverflow.com/questions/64772061/given-the-runtime-data-how-to-know-if-sorting-program-uses-bubble-sort-or-inser