Vectorizing Haversine distance calculation in Python

傲寒 2020-11-29 11:15

I am trying to calculate a distance matrix for a long list of locations identified by Latitude & Longitude using the Haversine formula that takes two tuples of coordinat

3 Answers
  •  没有蜡笔的小新
    2020-11-29 12:06

    You can wrap your function with np.vectorize() and then call the wrapped version inside DataFrame.groupby(...).apply, as illustrated below:

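    # otypes fixes the output dtype; np.int16 keeps whole numbers only (np.float64 would keep fractions)
    haver_vec = np.vectorize(haversine, otypes=[np.int16])
    # For each id (one row per group here), compute the distance from every row of df to that row
    distance = df.groupby('id').apply(lambda x: pd.Series(haver_vec(df.coordinates, x.coordinates)))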
    

    For instance, with sample data as follows:

    length = 500
    lats = np.random.uniform(-90, 90, length)
    lons = np.random.uniform(-180, 180, length)
    df = pd.DataFrame({'id': np.arange(length), 'coordinates': tuple(zip(lats, lons))})
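The snippets above call a haversine function that this answer does not define. As a rough sketch only, the block below uses a hypothetical pure-Python stand-in (its name, the 6371 km radius, and the (lat, lon)-in-degrees convention are assumptions, not taken from the question). It shows np.vectorize broadcasting that function over a 1-D object array of tuples to build the full distance matrix without pandas:

```python
import math

import numpy as np


def haversine(p1, p2, radius_km=6371.0):
    """Hypothetical stand-in: great-circle distance in km between two
    (lat, lon) tuples given in degrees. The radius is an assumed mean value."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


# float64 output keeps fractional distances
haver_vec = np.vectorize(haversine, otypes=[np.float64])

points = [(0.0, 0.0), (0.0, 90.0), (45.0, 45.0)]

# Build a 1-D object array whose elements are the tuples themselves;
# passing the list directly would be read as a 2-D (N, 2) array instead.
coords = np.empty(len(points), dtype=object)
for i, p in enumerate(points):
    coords[i] = p

# Broadcasting (N, 1) against (1, N) calls haversine once per pair.
dist = haver_vec(coords[:, None], coords[None, :])
```

Here dist[i, j] is the distance from points[i] to points[j], so the matrix is symmetric with zeros on the diagonal.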
    

    Comparing timings for 500 points:

    def haver_vect(data):
        distance = data.groupby('id').apply(lambda x: pd.Series(haver_vec(data.coordinates, x.coordinates)))
        return distance
    
    %timeit haver_loop(df)
    1 loops, best of 3: 35.5 s per loop

    %timeit haver_vect(df)
    1 loops, best of 3: 593 ms per loop
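One caveat: the NumPy documentation describes np.vectorize as a convenience wrapper that is essentially a for loop, so it still calls the Python function once per pair. If the latitudes and longitudes are available as separate numeric arrays, the formula can instead be written with NumPy broadcasting. The following is a sketch under assumptions (the function name haversine_matrix and the 6371 km mean radius are mine, and no timing was measured for it), not a drop-in replacement for the code above:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_matrix(lat_deg, lon_deg, radius_km=EARTH_RADIUS_KM):
    """Pairwise great-circle distances (km) for points given in degrees.

    Returns an (N, N) array where entry [i, j] is the distance
    between point i and point j.
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))

    # Column vector minus row vector gives all pairwise differences.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]

    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)

    # clip guards against floating-point values slightly outside [0, 1]
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Usage with random sample points, mirroring the 500-point example
rng = np.random.default_rng(0)
lat = rng.uniform(-90, 90, 500)
lon = rng.uniform(-180, 180, 500)
dist = haversine_matrix(lat, lon)  # shape (500, 500)
```

This holds several N x N temporaries in memory at once, which is fine for a few thousand points but would need chunking for much longer lists.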
    
