O(log N) == O(1) - Why not?

广开言路 2020-12-22 18:17

Whenever I consider algorithms/data structures I tend to replace the log(N) parts with constants. Oh, I know log(N) diverges - but does it matter in real-world applications?

23 answers
  • 2020-12-22 18:43

    You asked for a real-world example. I'll give you one. Computational biology. One strand of DNA encoded in ASCII is somewhere on the level of gigabytes in space. A typical database will obviously have many thousands of such strands.

    Now, in the case of an indexing/searching algorithm, that log(n) multiple makes a large difference when coupled with constants. The reason why? This is one of the applications where the size of your input is astronomical. Additionally, the input size will always continue to grow.

    Admittedly, these types of problems are rare. There are only so many applications this large. In those circumstances, though... it makes a world of difference.
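
    To put rough numbers on that, here is a back-of-the-envelope sketch (plain Python; the sizes are illustrative, not real genomics code): at genome-scale N the log(N) factor is already a 30x-plus multiplier on every indexed operation, and it keeps creeping up as the database grows.

        import math

        # Roughly: one ~3 GB strand, thousands of strands, millions of strands.
        for n in (3 * 10**9, 3 * 10**12, 3 * 10**15):
            print(f"N = {n:.0e}:  log2(N) = {math.log2(n):.1f}")

        # An O(N log N) index build therefore does roughly 32-52x the work of a
        # single O(N) pass at these sizes - a factor you cannot wave away.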

  • 2020-12-22 18:45

    This is a common mistake - remember that Big O notation is NOT telling you about the absolute performance of an algorithm at a given value; it is simply telling you how the behavior of an algorithm changes as you increase the size of the input.

    When you take it in that context it becomes clear why an algorithm A ~ O(logN) and an algorithm B ~ O(1) are different:

    if I run A on an input of size a and then on an input of size 1,000,000*a, I can expect the second run to take longer - the running time grows with the log of the input size, so it picks up roughly an extra log(1,000,000) worth of work

    if I run B on an input of size a and then on an input of size 1,000,000*a, I can expect the second run to take about the same amount of time as the first

    EDIT: Thinking over your question some more, I do think there's some wisdom to be had in it. While I would never say it's correct to say O(lgN) == O(1), it IS possible that an O(lgN) algorithm might be used over an O(1) algorithm. This comes back to the point about absolute performance above: just knowing that one algorithm is O(1) and another is O(lgN) is NOT enough to declare that you should use the O(1) over the O(lgN); given your range of possible inputs, the O(lgN) algorithm might well serve you best.
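
    A minimal numeric sketch of that scaling argument (hypothetical cost models, not measurements): assume A's running time is exactly c*log2(n) and B's is a flat constant k.

        import math

        def time_A(n, c=1.0):
            # Hypothetical O(log N) algorithm: cost proportional to log2(n).
            return c * math.log2(n)

        def time_B(n, k=25.0):
            # Hypothetical O(1) algorithm: cost is a flat constant.
            return k

        for a in (10**3, 10**6, 10**9):
            big = 1_000_000 * a
            print(f"n = {a:>16,} -> {big:>22,}")
            print(f"  A: {time_A(a):6.1f} -> {time_A(big):6.1f}"
                  f"  (+{time_A(big) - time_A(a):.1f}, x{time_A(big) / time_A(a):.2f})")
            print(f"  B: {time_B(a):6.1f} -> {time_B(big):6.1f}  (unchanged)")

    For a = 10^9, A goes from about 30 to about 50 units - an extra ~20, under a 1.7x slowdown for a million-fold larger input - while B stays put; which one is faster in absolute terms depends entirely on c and k.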

  • 2020-12-22 18:45

    You might be interested in Soft-O, which ignores logarithmic cost. Check this paragraph in Wikipedia.

  • 2020-12-22 18:46

    I do not believe algorithms where you can freely choose between O(1) with a large constant and O(logN) really exist. If there are N elements to work with at the beginning, it is just plain impossible to make it O(1); the only thing that is possible is to move your N to some other part of your code.

    What I am trying to say is that in all real cases I know of you have some space/time tradeoff, or some preprocessing such as compiling the data into a more efficient form.

    That is, you do not really go O(1); you just move the N part elsewhere. Either you trade the performance of some part of your code for some amount of memory, or you trade the performance of one part of your algorithm against another. To stay sane you should always look at the bigger picture.

    My point is that if you have N items they can't disappear. In other words, you can choose between an inefficient O(N^2) algorithm (or worse) and O(N log N): that is a real choice. But you never really go O(1).

    What I am trying to point out is that for every problem and initial data state there is a 'best' algorithm. You can do worse but never better. With some experience you can make a good guess at this intrinsic complexity. Then, if your overall treatment matches that complexity, you know you have something. You won't be able to reduce that complexity, only to move it around.

    If a problem is O(N) it won't become O(logN) or O(1); you'll merely add some preprocessing such that the overall complexity is unchanged or worse, and potentially a later step will be improved. Say you want the smallest element of an array: you can search for it in O(N), or sort the array using any common O(N log N) sort and then take the first element in O(1).

    Is it a good idea to do that casually? Only if your problem also asks for the second, third, etc. elements. Then your initial problem was truly O(N log N), not O(N). (See the sketch at the end of this answer.)

    And it is not the same thing if you end up waiting ten or twenty times longer for your result because you simplified matters by saying O(1) = O(logN).

    I'm still waiting for a counter-example ;-) - any real case where you have a choice between O(1) and O(logN) and where the O(logN) steps do not compare favourably with the O(1). All you can do is take a worse algorithm instead of the natural one, or move some heavy treatment to some other part of the bigger picture (pre-computing results, using storage space, etc.).
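
    Here is a concrete sketch of the 'smallest element' example above (plain Python, made-up data): the N never disappears, it just moves into the preprocessing step.

        import random

        data = [random.randrange(10**9) for _ in range(1_000_000)]

        # Option 1: answer the question directly - one O(N) scan per query.
        smallest = min(data)

        # Option 2: pay O(N log N) once up front, then every order statistic is O(1).
        ranked = sorted(data)      # preprocessing: this is where the N went
        smallest = ranked[0]       # O(1)
        second = ranked[1]         # O(1) - only worth it if the problem needs these too
        third = ranked[2]          # O(1)

    If only the minimum is ever needed, option 1 is the natural choice; option 2 pays off only when the problem really asks for many order statistics, i.e. when it was an O(N log N) problem to begin with.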

  • 2020-12-22 18:48

    I think this is a pragmatic approach; for any input you can address with 64 bits, O(logN) will never be more than 64. In practice, whenever terms get as 'small' as O(logN), you have to measure to see if the constant factors win out. See also

    Uses of Ackermann function?

    To quote myself from comments on another answer:

    [Big-Oh] 'Analysis' only matters for factors that are at least O(N). For any smaller factor, big-oh analysis is useless and you must measure.

    and

    "With O(logN) your input size does matter." This is the whole point of the question. Of course it matters... in theory. The question the OP asks is, does it matter in practice? I contend that the answer is no: there is not, and never will be, a data set for which logN grows so fast that it will always be beaten by a constant-time algorithm. Even for the largest practical dataset imaginable in the lifetimes of our grandchildren, a logN algorithm has a fair chance of beating a constant-time algorithm - you must always measure.

    EDIT

    A good talk:

    http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey

    about halfway through, Rich discusses Clojure's hash tries, which are clearly O(logN), but the base of the logarithm is large and so the depth of the trie is at most 6 even if it contains 4 billion values. Here "6" is still an O(logN) value, but it is an incredibly small value, and so choosing to discard this awesome data structure because "I really need O(1)" is a foolish thing to do. This emphasizes how most of the other answers to this question are simply wrong from the perspective of the pragmatist who wants their algorithm to "run fast" and "scale well", regardless of what the "theory" says.
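
    A quick back-of-the-envelope check of that "large base" point (plain Python, not tied to Clojure's actual node layout): with 32-way branching the depth grows as log base 32 of N, so even billions of entries need only a handful of levels.

        import math

        for n in (10**3, 10**6, 10**9, 4 * 10**9):
            print(f"n = {n:>13,}   log2(n) = {math.log2(n):5.1f}   "
                  f"log32(n) = {math.log(n, 32):4.1f}")

        # At n = 4 billion, log2(n) is about 32 but log32(n) is only about 6.4:
        # still O(logN) on paper, yet the tree is never more than a few levels deep.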

    EDIT

    See also

    http://queue.acm.org/detail.cfm?id=1814327

    which says

    What good is an O(log2(n)) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n^2) algorithm, which avoids page faults, will run circles around it.

    (but go read the article for context).

  • 2020-12-22 18:49

    For small enough N, even O(N^N) can in practice be treated as constant. Not O(1) (by definition), but for N=2, say, you can see it as one operation with 4 parts - effectively a constant-time operation.

    What if every operation takes an hour? Then the difference between O(log N) and O(1) is large, even for small N.

    Or what if you need to run the algorithm ten million times? "OK, that took 30 minutes, so when I run it on a dataset a hundred times as large it should still take 30 minutes, because O(logN) is 'the same' as O(1)"... eh... what?

    Your statement that "I understand O(f(N))" is clearly false.

    Real-world applications, oh... I don't know... EVERY USE of O() notation EVER?

    Binary search in a sorted list of 10 million items, for example. It's the very REASON we use hash tables when the data gets big enough. If you think O(logN) is the same as O(1), then why would you EVER use a hash table instead of a binary tree? (A rough comparison is sketched below.)
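
    A rough illustration in Python (timings are machine-dependent, and dict/bisect merely stand in for "hash table" and "binary search"): both are fast, and which one wins for your data and access pattern is exactly the kind of thing you have to measure.

        import bisect
        import random
        import timeit

        n = 10_000_000                                # ~10 million entries; needs a fair amount of RAM
        keys = list(range(n))
        table = {k: k for k in keys}                  # hash table: expected O(1) lookups
        probe = [random.randrange(n) for _ in range(1000)]

        def hash_lookup():
            for k in probe:
                _ = table[k]                          # O(1) expected

        def binary_search():
            for k in probe:
                i = bisect.bisect_left(keys, k)       # O(logN) per lookup
                _ = keys[i]

        print("hash  :", timeit.timeit(hash_lookup, number=100))
        print("bisect:", timeit.timeit(binary_search, number=100))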
