Python heapq vs. sorted complexity and performance

Posted by ﹥>﹥吖頭↗ on 2019-11-28 23:39:09

If you use a binary heap to pop all elements in order, what you are doing is basically heapsort. That is slower than the sort algorithm used by the sorted function, quite apart from whether the heap implementation is pure Python or C.
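To make the "basically heapsort" point concrete, here is a minimal sketch. The helper name heap_sorted is made up for illustration; only heapq.heapify and heapq.heappop are real library calls.

```python
import heapq

def heap_sorted(items):
    """Hypothetical helper: sort by heapifying a copy and popping everything."""
    heap = list(items)      # work on a copy so the caller's data is untouched
    heapq.heapify(heap)     # build the heap in place, O(n)
    # n pops at O(log n) each: this is heapsort in all but name
    return [heapq.heappop(heap) for _ in range(len(heap))]

data = [5, 1, 4, 2, 3]
print(heap_sorted(data) == sorted(data))
```

Both calls produce the same ordering; if that is all you need, sorted(data) is the simpler choice.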

heapq is faster than sorted if you need to add elements on the fly, i.e. insertions and removals can arrive in an unspecified order. Adding a new element to a heap while preserving its internal order is faster than re-sorting the array after each insertion.
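A rough sketch of the "on the fly" case, assuming a made-up event format of ("push", value) and ("pop", None) tuples; the process function is hypothetical and only there to show pushes and pops being interleaved without re-sorting anything.

```python
import heapq

def process(events):
    """Hypothetical helper: apply interleaved pushes and pops to one heap."""
    heap = []
    popped = []
    for op, value in events:
        if op == "push":
            heapq.heappush(heap, value)         # keeps the heap invariant
        elif heap:                              # "pop": ignore if the heap is empty
            popped.append(heapq.heappop(heap))  # always the current smallest
    return popped

events = [("push", 7), ("push", 3), ("pop", None),
          ("push", 1), ("push", 5), ("pop", None), ("pop", None)]
result = process(events)
print(result)
```

Each pop returns the smallest element present at that moment, which is the property you would otherwise pay for by re-sorting after every insertion.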

sorted is faster if you only need to retrieve all elements in order afterwards.

The only problem where they really compete is when you need some portion of the smallest (or largest) elements from a collection. There are special algorithms for that case (selection algorithms such as quickselect), but whether heapq or sorted will be faster depends on the size of the initial array and on how large a portion you need to extract.

heapq is implemented as a binary heap. The key things to note about binary heaps, and by extension heapq:

  1. Searching is not supported (finding an arbitrary element means a linear scan)
  2. Insertions are constant time on average (O(log n) in the worst case)
  3. Deletions are O(log n) time on average
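A small sketch of those three points using the real heapq functions; the values are arbitrary sample data.

```python
import heapq

heap = []
for value in [8, 3, 6, 1]:
    heapq.heappush(heap, value)    # insertion

smallest = heap[0]                 # peeking at the minimum is O(1)
removed = heapq.heappop(heap)      # deleting the minimum is O(log n)

# There is no heap-specific search: membership is a plain linear scan of the list.
found = 6 in heap

print(smallest, removed, found)
```

Note that the heap is just a regular list, so `in` works, but it gets no help from the heap structure.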

Additional binary heap info is described here: http://en.wikipedia.org/wiki/Binary_heap

While heapq gives you a data structure with the properties of a binary heap, sorted is a different concept. sorted returns a new sorted list, so it is essentially a result, whereas a heap is a data structure you keep working with and which can, optionally, be sorted via sorted.

Additional sorted info here: https://docs.python.org/3.4/library/functions.html#sorted

What specifically are you trying to accomplish?

Response to OP's comment:

Why do you think you need a heapq specifically? A binary heap is a specialized data structure, and depending on your requirements, it's quite likely not necessary.

You seem to be extremely concerned about performance, but it's not clear why. If something is a "bad performer" but its aggregate time is not significant, then it really doesn't matter in the bigger picture. In the aggregate case, a dict or a list would generally perform fine. Why do you specifically think a heapq is needed?

I wonder if this is a don't-let-the-perfect-be-the-enemy-of-the-good type of situation.

Writing Python using C extensions is a niche use case reserved for situations where performance is truly a significant issue (e.g. it may be better to use an XML parser that is a C extension rather than one that is pure Python if you're dealing with large files and performance is your main concern).

Regarding "In complex keep playing with structure case: could it be faster to sort with sorted and add elements via .append()":

I'm still not clear what the use case is here. As I mentioned above, sorted and heapq are really two different concepts.

What is the use case for which you are so concerned about performance? (Absent other factors not yet specified, I think you may be overly emphasizing the importance of best-case performance in your code here.)

The nlargest() and nsmallest() functions of heapq are most appropriate if you are trying to find a relatively small number of items. If you are looking for the N smallest or largest items and N is small compared to the overall size of the collection, these functions provide superior performance. If you simply want the single smallest or largest item, min() and max() are faster. If N is about the same size as the collection itself, it is usually faster to sort first and then take a slice. Although it's not necessary to use heapq in your code, it's an interesting topic and a worthwhile subject of study.
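As a quick illustration of the three options, with a made-up scores list as sample data:

```python
import heapq

scores = [42, 7, 19, 88, 3, 56, 23]

# Single smallest item: min() is the direct tool.
lowest = min(scores)

# N small relative to the collection: nsmallest / nlargest.
three_lowest = heapq.nsmallest(3, scores)
three_highest = heapq.nlargest(3, scores)

# N close to len(scores): sort once and slice.
most = sorted(scores)[:6]

print(lowest, three_lowest, three_highest, most)
```

Which one is actually fastest for your data is something to measure (for example with timeit) rather than assume.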
