When does Big-O notation fail?

One area where Big O fails is memory access patterns. Big O only counts the operations that need to be performed - it doesn't account for whether an algorithm causes more cache misses or has to page data in from disk. For small N, these effects will typically dominate. For instance, a linear search through an array of 100 integers will probably beat a search through a binary tree of 100 integers due to memory accesses, even though the binary tree will most likely require fewer operations. Each tree node would result in a cache miss, whereas the linear search would mostly hit the cache for each lookup.
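To make that concrete, here is a rough micro-benchmark (the names and sizes are my own, invented for illustration). In C the gap comes mainly from cache misses; in CPython what you will mostly see is the difference in constant factors (the list scan runs in C, the tree walk in Python), but the lesson - that operation counts alone don't decide the race - is the same.

    # Linear scan over a contiguous list vs. lookup in a pointer-based BST,
    # both holding the same 100 integers. Treat the numbers as suggestive only;
    # the cache effect itself is much clearer in a lower-level language.
    import random
    import timeit

    class Node:
        __slots__ = ("key", "left", "right")
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(root, key):
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        else:
            root.right = insert(root.right, key)
        return root

    def tree_contains(root, key):
        while root is not None:
            if key == root.key:
                return True
            root = root.left if key < root.key else root.right
        return False

    values = random.sample(range(10_000), 100)
    array = list(values)
    root = None
    for v in values:
        root = insert(root, v)

    probes = [random.choice(values) for _ in range(1_000)]
    print("linear scan:", timeit.timeit(lambda: [v in array for v in probes], number=100))
    print("binary tree:", timeit.timeit(lambda: [tree_contains(root, v) for v in probes], number=100))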

It fails in exactly one case: When people try to use it for something it's not meant for.

It tells you how an algorithm scales. It does not tell you how fast it is.

Big-O notation doesn't tell you which algorithm will be faster in any specific case. It only tells you that for sufficiently large input, one will be faster than the other.

When N is small, the constant factor dominates. Looking up an item in an array of five items is probably faster than looking it up in a hash table.

Short answer: when n is small. The Traveling Salesman Problem is quickly solved when you only have three destinations (however, finding the smallest number in a list of a trillion elements can take a while, even though that is only O(n)).
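Both halves of that claim fit in a few lines (the distance table and sizes below are made up; scale the big range down if you are impatient):

    # Brute-force TSP over 3 destinations is instantaneous despite being O(n!),
    # while the O(n) minimum of a very long sequence takes noticeable time.
    from itertools import permutations

    dist = {("A", "B"): 5, ("B", "A"): 5,
            ("A", "C"): 7, ("C", "A"): 7,
            ("B", "C"): 3, ("C", "B"): 3}

    def tour_length(order):
        stops = ("A",) + order + ("A",)      # start and end at A
        return sum(dist[a, b] for a, b in zip(stops, stops[1:]))

    best = min(permutations(("B", "C")), key=tour_length)   # only 2 tours to check
    print("best tour:", best, "length:", tour_length(best))

    print("minimum:", min(range(100_000_000)))   # 'only' O(n), still takes a while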

The canonical example is Quicksort, which has a worst-case time of O(n^2), while Heapsort's is O(n log n). In practice, however, Quicksort is usually faster than Heapsort. Why? Two reasons:

  • Each iteration in Quicksort is a lot simpler than in Heapsort. Even more, it is easily optimized by simple cache strategies.

  • The worst case is very hard to hit.

But IMHO, this doesn't mean 'big O fails' in any way. The first factor (iteration time) is easy to incorporate into your estimates; after all, big-O numbers should be multiplied by this almost-constant factor.

The second factor melts away if you look at amortized figures instead of averages. They can be harder to estimate, but they tell a more complete story.
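To make the "hard to hit" point concrete, here is a small sketch of my own that counts comparisons instead of wall-clock time: a quicksort that always picks the first element as pivot collapses to roughly n^2/2 comparisons on already-sorted input, while a random pivot stays near n log n on the very same data.

    # Comparison counts for two pivot rules on already-sorted input.
    import random

    def quicksort(a, pick_pivot):
        comparisons = 0
        stack = [(0, len(a))]
        while stack:
            lo, hi = stack.pop()
            if hi - lo <= 1:
                continue
            p = a[pick_pivot(lo, hi)]
            left, mid, right = [], [], []
            for x in a[lo:hi]:
                comparisons += 1           # one unit of partitioning work per element
                if x < p:
                    left.append(x)
                elif x > p:
                    right.append(x)
                else:
                    mid.append(x)
            a[lo:hi] = left + mid + right
            stack.append((lo, lo + len(left)))
            stack.append((hi - len(right), hi))
        return comparisons

    n = 2_000
    data = list(range(n))                  # already sorted: the classic bad case
    print("first-element pivot:", quicksort(data[:], lambda lo, hi: lo))
    print("random pivot       :", quicksort(data[:], lambda lo, hi: random.randrange(lo, hi)))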

Big-O describes the efficiency/complexity of the algorithm and not necessarily the running time of the implementation of a given block of code. This doesn't mean Big-O fails. It just means that it's not meant to predict running time.

Check out the answer to this question for a great definition of Big-O.

  1. For most algorithms there is an "average case" and a "worst case". If your data routinely falls into the "worst case" scenario, it is possible that another algorithm, while theoretically less efficient in the average case, might prove more efficient for your data.

  2. Some algorithms also have best cases that your data can take advantage of. For example, some sorting algorithms have a terrible theoretical efficiency, but are actually very fast if the data is already sorted (or nearly so), as the sketch after this list illustrates. Another algorithm, while theoretically faster in the general case, may not take advantage of the fact that the data is already sorted and in practice perform worse.

  3. For very small data sets sometimes an algorithm that has a better theoretical efficiency may actually be less efficient because of a large "k" value.
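As promised in point 2 above, a small sketch (the usual textbook insertion sort, timed on arbitrary sizes of my choosing): it is O(n^2) in general, yet close to linear when the input is already sorted, because the inner loop exits immediately.

    # Insertion sort: quadratic on random data, nearly linear on sorted data.
    import random
    import timeit

    def insertion_sort(a):
        for i in range(1, len(a)):
            x = a[i]
            j = i - 1
            while j >= 0 and a[j] > x:     # never runs when the data is sorted
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x

    n = 2_000
    sorted_input = list(range(n))
    random_input = random.sample(range(n), n)

    print("already sorted:", timeit.timeit(lambda: insertion_sort(sorted_input[:]), number=5))
    print("random order  :", timeit.timeit(lambda: insertion_sort(random_input[:]), number=5))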

One example (that I'm not an expert on) is that simplex algorithms for linear programming have exponential worst-case complexity on arbitrary inputs, even though they perform well in practice. An interesting solution to this is considering "smoothed complexity", which blends worst-case and average-case performance by looking at small random perturbations of arbitrary inputs.

Spielman and Teng (2004) were able to show that the shadow-vertex simplex algorithm has polynomial smoothed complexity.

This somewhat depends on what the Big-O is measuring - when it's worst case scenarios, it will usually "fail" in that the runtime performance will be much better than the Big-O suggests. If it's average case, then it may be much worse.

Big-O notation typically "fails" if the input data to the algorithm has some prior information. Often, the Big-O notation refers to the worst case complexity - which will often happen if the data is either completely random or completely non-random.

As an example, if the algorithm was profiled on randomized data and its big-O figure is based on that, but your data has a very well-defined structure, your running times may be much faster than expected. By the same token, if you're measuring average complexity and you feed in data that happens to be pathologically bad, the algorithm may perform much worse than expected.
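Here is a concrete sketch of the worst-case side of this (the BadKey class is contrived purely to force collisions): a Python dict is O(1) per operation on average, but keys that all hash to the same value push it toward its O(n) worst case.

    # Average case vs. adversarial worst case for a hash table.
    import timeit

    class BadKey:
        def __init__(self, v):
            self.v = v
        def __hash__(self):
            return 42                      # every key lands in the same bucket
        def __eq__(self, other):
            return self.v == other.v

    def build(keys):
        d = {}
        for k in keys:
            d[k] = True
        return d

    n = 1_000
    good = list(range(n))
    bad = [BadKey(i) for i in range(n)]

    print("int keys (average case):", timeit.timeit(lambda: build(good), number=5))
    print("all-colliding keys     :", timeit.timeit(lambda: build(bad), number=5))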

Big O does not say e.g. that algorithm A runs faster than algorithm B. It can say that the time or space used by algorithm A grows at a different rate than algorithm B, when the input grows. However, for any specific input size, big O notation does not say anything about the performance of one algorithm relative to another.

For example, A may be slower per operation, but have a better big-O than B. B is more performant for smaller input, but if the data size increases, there will be some cut-off point where A becomes faster. Big-O in itself does not say anything about where that cut-off point is.
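A toy cost model makes that cut-off explicit (the constants below are invented; big-O by itself only tells you that a cut-off exists, not where it is):

    # Algorithm A: better big-O, worse constant. Algorithm B: the reverse.
    def cost_a(n): return 50 * n           # O(n) with a big per-operation cost
    def cost_b(n): return n * n            # O(n^2) with a tiny per-operation cost

    cutoff = next(n for n in range(1, 10_000) if cost_a(n) < cost_b(n))
    print("A becomes cheaper at n =", cutoff)   # 51 with these made-up constants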

  1. Small N - and for today's computers, 100 is likely too small to worry about.
  2. Hidden multipliers - e.g., merge sort vs. quicksort.
  3. Pathological cases - again, merge sort vs. quicksort.

The general answer is that Big-O allows you to be really sloppy by hiding the constant factors. As mentioned in the question, the use of Fibonacci Heaps is one example. Fibonacci Heaps do have great asymptotic runtimes, but in practice the constant factors are way too large to be useful for the sizes of data sets encountered in real life.

Fibonacci Heaps are often used in proving a good lower bound for asymptotic complexity of graph-related algorithms.

Another similar example is the Coppersmith-Winograd algorithm for matrix multiplication. It is currently the algorithm with the fastest known asymptotic running time for matrix multiplication, O(n^2.376). However, its constant factor is far too large to be useful in practice. Like Fibonacci Heaps, it's frequently used as a building block in other algorithms to prove theoretical time bounds.

One broad area where Big-Oh notation fails is when the amount of data exceeds the available amount of RAM.

Using sorting as an example, the amount of time it takes to sort is not dominated by the number of comparisons or swaps (of which there are O(n log n) and O(n), respectively, in the optimal case). The amount of time is dominated by the number of disk operations: block writes and block reads.

To better analyze algorithms which handle data in excess of available RAM, the I/O-model was born, where you count the number of disk reads. In that, you consider three parameters:

  • The number of elements, N;
  • The amount of memory (RAM), M (the number of elements that can be in memory); and
  • The size of a disk block, B (the number of elements per block).

Notably absent is the amount of disk space; this is treated as if it were infinite. A typical extra assumption is that M > B^2.

Continuing the sorting example, you typically favor merge sort in the I/O case: divide the elements into chunks of size θ(M) and sort them in memory (with, say, quicksort). Then, merge θ(M/B) of them by reading the first block from each chunk into memory, stuff all the elements into a heap, and repeatedly pick the smallest element until you have picked B of them. Write this new merge block out and continue. If you ever deplete one of the blocks you read into memory, read a new block from the same chunk and put it into the heap.

(All expressions should be read as big θ.) You form N/M sorted chunks, which you then merge. There are log_{M/B}(N/M) merge passes; each pass reads and writes all N/B blocks, so the whole sort costs (N/B) * log_{M/B}(N/M) I/Os.

You can analyze in-memory sorting algorithms (suitably modified to include block reads and block writes) and see that they're much less efficient than the merge sort I've presented.

This knowledge is courtesy of my I/O-algorithms course, by Arge and Brodal (http://daimi.au.dk/~large/ioS08/); I also conducted experiments which validate the theory: heap sort takes "almost infinite" time once you exceed memory. Quick sort becomes unbearably slow, merge sort barely bearably slow, I/O-efficient merge sort performs well (the best of the bunch).
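For the curious, here is a compressed sketch of the two-phase external merge sort described above, shrunk to a few lines of Python: temporary files stand in for disk, CHUNK plays the role of M, and heapq.merge does the k-way merge (the file objects' own read buffering loosely plays the role of the per-run block). It demonstrates the idea; it is not production code.

    # Phase 1: sort memory-sized chunks and spill them as runs.
    # Phase 2: k-way merge of the runs with a heap (heapq.merge).
    import heapq
    import random
    import tempfile

    CHUNK = 100_000                        # pretend this is all the RAM (M) we have

    def spill(run):
        """Write one sorted run to a temporary file (our stand-in for disk)."""
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{x}\n" for x in run)
        f.seek(0)
        return f

    def sorted_runs(numbers):
        runs, chunk = [], []
        for x in numbers:
            chunk.append(x)
            if len(chunk) == CHUNK:
                runs.append(spill(sorted(chunk)))
                chunk = []
        if chunk:
            runs.append(spill(sorted(chunk)))
        return runs

    def external_sort(numbers):
        runs = sorted_runs(numbers)
        return heapq.merge(*((int(line) for line in f) for f in runs))

    data = (random.randrange(10**9) for _ in range(500_000))
    out = list(external_sort(data))
    print("sorted:", all(a <= b for a, b in zip(out, out[1:])), "items:", len(out))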

I've seen a few cases where, as the data set grew, the algorithmic complexity became less important than the memory access pattern. Navigating a large data structure with a smart algorithm can, in some cases, cause far more page faults or cache misses, than an algorithm with a worse big-O.

For small n, two algorithms may be comparable. As n increases, the smarter algorithm outperforms. But, at some point, n grows big enough that the system succumbs to memory pressure, in which case the "worse" algorithm may actually perform better because the constants are essentially reset.

This isn't particularly interesting, though. By the time you reach this inversion point, the performance of both algorithms is usually unacceptable, and you have to find a new algorithm that has a friendlier memory access pattern AND a better big-O complexity.

This question is like asking, "When does a person's IQ fail in practice?" It's clear that having a high IQ does not mean you'll be successful in life and having a low IQ does not mean you'll perish. Yet, we measure IQ as a means of assessing potential, even if it's not an absolute.

In algorithms, the Big-Oh notation gives you the algorithm's IQ. It doesn't necessarily mean that the algorithm will perform best for your particular situation, but there's some mathematical basis that says this algorithm has some good potential. If Big-Oh notation were enough to measure performance you would see a lot more of it and less runtime testing.

Think of Big-Oh as a range instead of a specific measure of better-or-worse. There are best-case scenarios and worst-case scenarios and a huge set of scenarios in between. Choose your algorithms by how well they fit within the Big-Oh range, but don't rely on the notation as an absolute measure of performance.

When your data doesn't fit the model, big-o notation will still work, but you're going to see an overlap from best and worst case scenarios.

Also, some operations are tuned for linear data access vs. random data access, so one algorithm, while superior in terms of cycles, might be dog slow if it ends up being called in a way its design didn't anticipate. Similarly, if an algorithm causes page/cache misses because of the way it accesses memory, Big-O isn't going to give an accurate estimate of the cost of running a process.

Apparently, as I've forgotten, also when N is small :)

The short answer: always on modern hardware when you start using a lot of memory. The textbooks assume memory access is uniform, and it is no longer. You can of course do Big O analysis for a non-uniform access model, but that is somewhat more complex.

The small n cases are obvious but not interesting: fast enough is fast enough.

In practice I've had problems using the standard collections in Delphi, Java, C# and Smalltalk with a few million objects, and with smaller ones where the dominant factor proved to be the hash function or the comparison.
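A sketch of the non-uniform-access point (it needs numpy; the sizes are arbitrary and the exact ratio depends on your cache and RAM): both runs perform exactly the same additions, only the order in which memory is touched differs.

    # Summing the same array through a sequential vs. a shuffled index:
    # same work, very different memory access pattern.
    import time
    import numpy as np

    n = 20_000_000
    data = np.arange(n, dtype=np.int64)
    orders = {"sequential": np.arange(n), "random": np.random.permutation(n)}

    for name, idx in orders.items():
        t0 = time.perf_counter()
        total = data[idx].sum()            # gather then sum; only the order differs
        print(name, round(time.perf_counter() - t0, 3), "s, total =", total)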

Robert Sedgewick talks about shortcomings of the big-O notation in his Coursera course on Analysis of Algorithms. He calls particularly egregious examples galactic algorithms because while they may have a better complexity class than their predecessors, it would take inputs of astronomical sizes for it to show in practice.

https://www.cs.princeton.edu/~rs/talks/AlgsMasses.pdf
