I'm relatively new to python (using v3.x syntax) and would appreciate notes regarding complexity and performance of heapq vs. sorted.
I've already implemented a heapq based solution for a greedy 'find the best job schedule' algorithm. But then I've learned about the possibility of using 'sorted' together with operator.itemgetter() and reverse=True.
Sadly, I could not find any explanation on expected complexity and/or performance of 'sorted' vs. heapq.
If you use binary heap to pop all elements in order, the thing you do is basically heapsort. It is slower than sort algorightm in sorted function apart from it's implementation is pure python.
The heapq is faster than sorted in case if you need to add elements on the fly i.e. additions and insertions could come in unspecified order. Adding new element preserving inner order in any heap is faster than resorting array after each insertion.
The sorted is faster if you will need to retrieve all elements in order later.
The only problem where they can compete - if you need some portion of smallest (or largest) elements from collection. Although there are special algorigthms for that case, whether heapq or sorted will be faster here depends on the size of the initial array and portion you'll need to extract.
heapq is implemented as a binary heap,
The key things to note about binary heaps, and by extension, heapq:
- Searching is not supported
- Insertions are constant time on average
- Deletions are O(log n) time on average
Additional binary heap info described here: http://en.wikipedia.org/wiki/Binary_heap
While heapq is a data structure which has the properties of a binary heap, using sorted is a different concept. sorted returns a sorted list, so that's essentially a result, whereas the heapq is a data structure you are continually working with, which could, optionally, be sorted via sorted.
Additonal sorted info here: https://docs.python.org/3.4/library/functions.html#sorted
What specifically are you trying to accomplish?
Response to OP's comment:
Why do you think you need a heapq specifically? A binary heap is a specialized data structure, and depending on your requirements, it's quite likely not necessary.
You seem to be extremely concerned about performance, but it's not clear why. If something is a "bad performer", but its aggregate time is not significant, then it really doesn't matter in the bigger picture. In the aggregate case, a dict or a list would perform generally perform fine. Why do you specifically think a heapq is needed?
I wonder if this is a don't-let-the-perfect-be-the-enemy-of-the-good type of situation.
Writing Python using C extensions is a niche use case reserved for cases where performance is truly a significant issue. (i.e. it may be better to use, say, an XML parser that is a C extension than something that is pure Python if you're dealing with large files and if performance is your main concern).
Regarding In complex keep playing with structure case: could it be faster to sort with sorted and add elements via .append():
I'm still not clear what the use case is here. As I mentioned above, sorted and heapq are really two different concepts.
What is the use case for which you are so concerned about performance? (Absent other factors not yet specified, I think you may be overly emphasizing the importance of best-case performance in your code here.)
The nlargest() and nsmallest() functions of heapq are most appropriate if you are trying to find a relatively small number of items. If you want to find simply single smallest or largest number , min() and max() are most suitable, because it's faster and uses sorted and then slicing. If you are looking for the N smallest or largest items and N is small compared to the overall size of the collection, these functions provide superior performance. Although it's not necessary to use heapq in your code, it's just an interesting topic and a worthwhile subject of study.
来源:https://stackoverflow.com/questions/24666602/python-heapq-vs-sorted-complexity-and-performance