Best way to sort 1M records in Python

前端 未结 11 1638
慢半拍i
慢半拍i 2020-12-05 08:41

I have a service that runs that takes a list of about 1,000,000 dictionaries and does the following

myHashTable = {}
myLists = { \'hits\':{}, \'misses\':{},          


        
11条回答
  •  心在旅途
    2020-12-05 09:40

    What you really want is an ordered container, instead of an unordered one. That would implicitly sort the results as they're inserted. The standard data structure for this is a tree.

    However, there doesn't seem to be one of these in Python. I can't explain that; this is a core, fundamental data type in any language. Python's dict and set are both unordered containers, which map to the basic data structure of a hash table. It should definitely have an optimized tree data structure; there are many things you can do with them that are impossible with a hash table, and they're quite tricky to implement well, so people generally don't want to be doing it themselves.

    (There's also nothing mapping to a linked list, which also should be a core data type. No, a deque is not equivalent.)

    I don't have an existing ordered container implementation to point you to (and it should probably be implemented natively, not in Python), but hopefully this will point you in the right direction.

    A good tree implementation should support iterating across a range by value ("iterate all values from [2,100] in order"), find next/prev value from any other node in O(1), efficient range extraction ("delete all values in [2,100] and return them in a new tree"), etc. If anyone has a well-optimized data structure like this for Python, I'd love to know about it. (Not all operations fit nicely in Python's data model; for example, to get next/prev value from another value, you need a reference to a node, not the value itself.)

提交回复
热议问题