bisect.insort complexity not as expected

蓝咒 submitted on 2019-12-10 14:21:40

Question


While trying to find the best data structure in Python 3 for a frontier problem I have to develop, I realised that the complexity of using the bisect module for real-time ordered insertion is not O(n log n) as it should be, and appears to grow much faster instead. I don't know the reason for it, so I felt like asking you in case you know something about it, since I find it really interesting.

I think I'm using the module correctly, so it shouldn't be a problem on my end. Anyway, here is the code used to insert node objects, with the insertion ordered by a random f value the nodes have:

bisect.insort(self._frontier, (node._f, node))

I'm getting lots of objects inserted in the first few seconds, but then not that many over time. Bakuriu suggested I ask this question, since he also found it interesting after running some tests and ending up with the same results as me. The code he used to test it was the following:

python3 -m timeit -s 'import bisect as B; import random as R;seq=[]' 'for _ in range(100000):B.insort(seq, R.randint(0, 1000000))'

And these were his conclusions:

10k insertions are all fine (80 ms, and up to that point it basically scales linearly [keep in mind that it is O(n log n), so it's a little bit worse than linear]), but with 100k it takes forever instead of 10 times more. A list of 100k elements isn't really that big, and log(100k) is 16, so it's not that big either.
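The superlinear growth described above can be reproduced with a small timing sketch (sizes chosen for illustration; the exact times will vary by machine). Doubling the number of insertions roughly quadruples the total time, which is the signature of an overall O(n^2) cost rather than O(n log n):

```python
import bisect
import random
import timeit

def insort_n(n):
    """Insert n random integers into a list, keeping it sorted throughout."""
    seq = []
    for _ in range(n):
        bisect.insort(seq, random.randint(0, 1_000_000))
    return seq

# Doubling n roughly quadruples the total time: the per-insert cost
# grows with the current list length, so the sum over n inserts is O(n^2).
t_10k = timeit.timeit(lambda: insort_n(10_000), number=1)
t_20k = timeit.timeit(lambda: insort_n(20_000), number=1)
print(f"10k: {t_10k:.3f}s  20k: {t_20k:.3f}s  ratio: {t_20k / t_10k:.1f}")
```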

Any help will be much appreciated!


Answer 1:


You probably missed that the time complexity for insort is O(n), and this is documented clearly for bisect.insort_left():

Keep in mind that the O(log n) search is dominated by the slow O(n) insertion step.

Finding the insertion point is cheap, but inserting into a Python list is not, as the elements past the insertion point have to be moved up a step.
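The two steps can be made explicit by splitting insort into its parts, a binary search followed by a list insert (the values here are just an illustration):

```python
import bisect

seq = [10, 20, 30, 40, 50]

# Finding the insertion point is O(log n): only a few comparisons.
i = bisect.bisect_left(seq, 35)   # -> 3

# The insert itself is O(n): every element from index 3 onward is
# shifted one slot to the right to make room for the new value.
seq.insert(i, 35)
print(seq)   # [10, 20, 30, 35, 40, 50]
```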

Also see the TimeComplexity page on the Python Wiki, where list insertion is documented:

Insert O(n)

You can find the insertion point in O(log n) time, but the insertion step that follows is O(n), making this a rather expensive way to sort.

If you are using this to sort m elements, you have an O(m^2) (quadratic) solution for what should only take O(m log m) time with TimSort (the sorting algorithm used by the sorted() function).
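A minimal comparison of the two approaches: repeated insort produces the same result as a single sorted() call, but at quadratic rather than O(m log m) cost.

```python
import bisect
import random

data = [random.random() for _ in range(1000)]

# Quadratic: m insertions, each paying O(m) for the list shifting.
by_insort = []
for x in data:
    bisect.insort(by_insort, x)

# O(m log m): sort once with TimSort.
by_sorted = sorted(data)

# Identical results, very different asymptotic costs.
assert by_insort == by_sorted
```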




Answer 2:


Binary search takes O(log n) comparisons, but insort isn't just a binary search. It also inserts the element, and inserting an element into a length-n list takes O(n) time.

The _frontier naming in your original code snippet suggests some sort of prioritized search algorithm. A heap would probably make more sense for that, or a SortedList from sortedcontainers.



Source: https://stackoverflow.com/questions/53023380/bisect-insort-complexity-not-as-expected
