Python, Pairwise 'distance', need a fast way to do it

前端 未结 3 1446
渐次进展
渐次进展 2021-01-12 16:27

For a side project in my PhD, I engaged in the task of modelling some system in Python. Efficiency wise, my program hits a bottleneck in the following problem, which I\'ll e

3条回答
  •  予麋鹿
    予麋鹿 (楼主)
    2021-01-12 16:30

    This is more of a meta answer, at least for starters. Your problem might already be in "my program hits a bottleneck" and "I realize this is extremely inefficient".

    Extremely inefficient? By what measure? Do you have comparison? Is your code too slow to finish in a reasonable amount of time? What is a reasonable amount of time for you? Can you throw more computing power at the problem? Equally important -- do you use a proper infrastructure to run your code on (numpy/scipy compiled with vendor compilers, possibly with OpenMP support)?

    Then, if you have answers for all of the questions above and need to further optimize your code -- where is the bottleneck in your current code exactly? Did you profile it? It the body of the loop possibly much more heavy-weight than the evaluation of the loop itself? If so, then "the loop" is not your bottleneck and you do not need to worry about the nested loop in the first place. Optimize the body at first, possibly by coming up with unorthodox matrix representations of your data so that you can perform all these single calculations in one step -- by matrix multiplication, for instance. If your problem is not solvable by efficient linear algebra operations, you can start writing a C extension or use Cython or use PyPy (which just very recently got some basic numpy support!). There are endless possibilities for optimizing -- the questions really are: how close to a practical solution are you already, how much do you need to optimize, and how much of an effort are you willing to invest.

    Disclaimer: I have done non-canonical pairwise-distance stuff with scipy/numpy for my PhD, too ;-). For one particular distance metric, I ended up coding the "pairwise" part in simple Python (i.e. I also used the doubly-nested loop), but spent some effort in getting the body as efficient as possible (with a combination of i) a cryptical matrix multiplication representation of my problem and ii) using bottleneck).

提交回复
热议问题