Idiomatic way to do list/dict in Cython?

别那么骄傲 2020-12-23 17:00

My problem: I've found that processing large data sets with raw C++ using the STL map and vector can often be considerably faster (and with lower memory footprint) than usi

6 Answers
  •  天涯浪人
    2020-12-23 17:33

    C++ is fast not just because of the static declarations of the vector and the elements that go into it, but crucially because templates/generics let one specify that the vector will only contain elements of a certain type, e.g. a vector of tuples of three elements. Cython can't do this last thing, and it sounds nontrivial: it would have to be enforced at compile time somehow (typechecking at runtime is what Python already does). So right now, when you pop something off a list in Cython, there is no way of knowing in advance what type it is, and putting it in a typed variable only adds a typecheck, not speed. This means there is no way of bypassing the Python interpreter in this regard, and it seems to me this is the most crucial shortcoming of Cython for non-numerical tasks.
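
    To make the template point concrete, here is a minimal C++ sketch of a vector whose element type is fixed at compile time. The names Triple, sum_first, and pop_back are made up for illustration, and a tuple of three ints is only an assumed example of "tuples of three elements".

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical element type for illustration: a tuple of three ints.
using Triple = std::tuple<int, int, int>;

// Sum the first field of every element. The element type is fixed at
// compile time, so no per-element type check happens at run time.
long sum_first(const std::vector<Triple>& items) {
    long total = 0;
    for (const Triple& t : items) {
        total += std::get<0>(t);
    }
    return total;
}

// Pop the last element; the compiler already knows it is a Triple.
Triple pop_back(std::vector<Triple>& items) {
    assert(!items.empty());  // precondition: there must be something to pop
    Triple last = items.back();
    items.pop_back();
    return last;
}
```

    Popping from a Python list, by contrast, yields a generic object whose type is only discovered at run time, which is the check the paragraph above refers to.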

    The manual way of solving this is to subclass the Python list/dict (or perhaps wrap std::vector) with a cdef class for a specific type of element or key-value combination. This amounts to writing by hand the code that templates would generate. As long as you use the resulting class from Cython code, it should provide an improvement.
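
    The shape of such a hand-specialized container might look roughly like the following. This is written in C++ rather than Cython, so it only illustrates the idea and is not a Cython implementation; Token and TokenList are hypothetical names, with Token standing in for whatever cdef class you would store.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type, standing in for a specific cdef class.
struct Token {
    int id;
    double weight;
};

// A container whose interface is written by hand for exactly one element
// type. Storage is delegated to std::vector<Token> to keep the sketch short.
class TokenList {
public:
    void append(const Token& t) { items_.push_back(t); }

    // Returns a Token directly; callers never need a run-time type check.
    const Token& at(std::size_t i) const {
        assert(i < items_.size());  // bounds check, active in debug builds only
        return items_[i];
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<Token> items_;
};
```

    Every method takes or returns a Token, so code that uses the class can be compiled against a known type instead of a generic object.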

    Using databases or arrays just solves a different problem, because this is about putting arbitrary objects (but with a specific type, and preferably a cdef class) in containers.

    Also, std::map shouldn't be compared to dict: std::map keeps its keys in sorted order because it is typically implemented as a balanced tree, whereas dict is a hash table and solves a different problem. A better comparison would be dict and Google's hashtable.
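
    A small sketch of the difference, assuming std::unordered_map as a stand-in for a hash table (the answer mentions Google's hashtable; the standard container is used here only because it needs no extra library). The helper names keys_in_order and lookup are invented for this example.

```cpp
#include <cassert>  // used by the assert-style checks mentioned below
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Collect keys in iteration order. std::map is an ordered container, so the
// keys come back sorted regardless of insertion order.
std::vector<std::string> keys_in_order(const std::map<std::string, int>& m) {
    std::vector<std::string> keys;
    for (const auto& kv : m) {
        keys.push_back(kv.first);
    }
    return keys;
}

// Hash-table lookup, the closer analogue of a Python dict lookup.
// Returns fallback when the key is absent.
int lookup(const std::unordered_map<std::string, int>& table,
           const std::string& key, int fallback) {
    auto it = table.find(key);
    return it == table.end() ? fallback : it->second;
}
```

    For instance, after inserting "pear", "apple", and "fig" into a std::map in that order, keys_in_order returns them as apple, fig, pear. An unordered_map makes no such ordering promise, which is the property it shares with a hash-based dict lookup.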
