Fast calculation of Pareto front in Python

Asked by 鱼传尺愫 on 2020-12-13 03:13 · 5 answers · 2173 views

I have a set of points in 3D space from which I need to find the Pareto frontier. Speed of execution is very important here, and computation time increases very quickly as I add points.

5 Answers
  •  情书的邮戳
    2020-12-13 03:30

    If you're worried about actual speed, you definitely want to use numpy (the clever algorithmic tweaks probably matter far less than the gains from array operations). Here are three solutions that all compute the same function. The is_pareto_efficient_dumb solution is slower in most situations but becomes faster as the number of costs increases; the is_pareto_efficient_simple solution is much more efficient than the dumb solution for many points; and the final is_pareto_efficient function is less readable but the fastest (so all are Pareto efficient!).

    import numpy as np
    
    
    # Very slow for many datapoints.  Fastest for many costs, most readable
    def is_pareto_efficient_dumb(costs):
        """
        Find the pareto-efficient points
        :param costs: An (n_points, n_costs) array
        :return: A (n_points, ) boolean array, indicating whether each point is Pareto efficient
        """
        is_efficient = np.ones(costs.shape[0], dtype = bool)
        for i, c in enumerate(costs):
            is_efficient[i] = np.all(np.any(costs[:i]>c, axis=1)) and np.all(np.any(costs[i+1:]>c, axis=1))
        return is_efficient
    
    
    # Fairly fast for many datapoints, less fast for many costs, somewhat readable
    def is_pareto_efficient_simple(costs):
        """
        Find the pareto-efficient points
        :param costs: An (n_points, n_costs) array
        :return: A (n_points, ) boolean array, indicating whether each point is Pareto efficient
        """
        is_efficient = np.ones(costs.shape[0], dtype = bool)
        for i, c in enumerate(costs):
            if is_efficient[i]:
                is_efficient[is_efficient] = np.any(costs[is_efficient] < c, axis=1)  # Keep any point with a lower cost
                is_efficient[i] = True  # And keep self
        return is_efficient
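    The third function, is_pareto_efficient, is missing from this copy of the answer. A reconstruction consistent with its description above (it repeatedly discards dominated points so each subsequent comparison runs over a shrinking candidate set, then rebuilds a boolean mask at the end) is sketched below; details may differ from the author's original:

    ```python
    import numpy as np


    # Fastest for many datapoints, least readable
    def is_pareto_efficient(costs):
        """
        Find the pareto-efficient points
        :param costs: An (n_points, n_costs) array
        :return: A (n_points, ) boolean array, indicating whether each point is Pareto efficient
        """
        n_points = costs.shape[0]
        is_efficient = np.arange(n_points)  # indices of surviving candidates
        next_point_index = 0                # next candidate to test against the rest
        while next_point_index < len(costs):
            # A point survives if it beats the current candidate on at least one cost
            nondominated = np.any(costs < costs[next_point_index], axis=1)
            nondominated[next_point_index] = True  # a point never dominates itself
            is_efficient = is_efficient[nondominated]  # drop dominated candidates
            costs = costs[nondominated]
            # Advance past the candidates already confirmed before this index
            next_point_index = np.sum(nondominated[:next_point_index]) + 1
        mask = np.zeros(n_points, dtype=bool)
        mask[is_efficient] = True
        return mask
    ```

    For example, on the costs `[[1, 2], [2, 1], [3, 3]]` (minimization) this keeps the first two points and rejects the dominated `[3, 3]`.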

    Profiling tests (using points drawn from a Normal distribution):

    With 10,000 samples, 2 costs:

    is_pareto_efficient_dumb: Elapsed time is 1.586s
    is_pareto_efficient_simple: Elapsed time is 0.009653s
    is_pareto_efficient: Elapsed time is 0.005479s
    

    With 1,000,000 samples, 2 costs:

    is_pareto_efficient_dumb: Really, really, slow
    is_pareto_efficient_simple: Elapsed time is 1.174s
    is_pareto_efficient: Elapsed time is 0.4033s
    

    With 10,000 samples, 15 costs:

    is_pareto_efficient_dumb: Elapsed time is 4.019s
    is_pareto_efficient_simple: Elapsed time is 6.466s
    is_pareto_efficient: Elapsed time is 6.41s
    

    Note that if efficiency is a concern, you can gain roughly a further 2x speedup by reordering your data beforehand; see here.
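    The elapsed times above can be reproduced with a small harness along these lines (a sketch using `time.perf_counter` and Gaussian samples, with the sample size reduced to 1,000 here so it finishes quickly; the post used 10,000 and 1,000,000):

    ```python
    import time

    import numpy as np


    # Brute-force baseline from the answer: a point is efficient iff every other
    # point is worse on at least one cost.
    def is_pareto_efficient_dumb(costs):
        is_efficient = np.ones(costs.shape[0], dtype=bool)
        for i, c in enumerate(costs):
            is_efficient[i] = np.all(np.any(costs[:i] > c, axis=1)) and \
                              np.all(np.any(costs[i + 1:] > c, axis=1))
        return is_efficient


    rng = np.random.default_rng(0)
    costs = rng.normal(size=(1000, 2))  # points drawn from a Normal distribution

    start = time.perf_counter()
    mask = is_pareto_efficient_dumb(costs)
    elapsed = time.perf_counter() - start
    print(f"is_pareto_efficient_dumb: Elapsed time is {elapsed:.4g}s "
          f"({mask.sum()} efficient points)")
    ```

    Swapping in the other two functions under the same harness reproduces the relative ordering reported above.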
