Vectorizing Haversine distance calculation in Python


傲寒 2020-11-29 11:15

I am trying to calculate a distance matrix for a long list of locations identified by Latitude & Longitude using the Haversine formula that takes two tuples of coordinat

3 answers

没有蜡笔的小新 (OP)

2020-11-29 12:06

You can pass your function to np.vectorize() and then call the vectorized version inside DataFrame.groupby(...).apply(), as illustrated below:

import numpy as np
import pandas as pd
haver_vec = np.vectorize(haversine, otypes=[np.int16])
distance = df.groupby('id').apply(lambda x: pd.Series(haver_vec(df.coordinates, x.coordinates)))

For instance, with sample data as follows:

length = 500
df = pd.DataFrame({'id':np.arange(length), 'coordinates':tuple(zip(np.random.uniform(-90, 90, length), np.random.uniform(-180, 180, length)))})
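The snippets here rely on the `haversine` function from the question, which is not shown in this thread. As a minimal sketch of what `np.vectorize` does with it, the block below uses a hypothetical stand-in for that function (two `(lat, lon)` tuples in degrees, result in km, Earth radius assumed to be 6371 km) and three hand-picked points instead of the random sample. It uses `otypes=[float]` rather than `np.int16` so that fractional kilometres are kept. The names `EARTH_RADIUS_KM`, `points`, `coords` and `dist_matrix` are made up for illustration.

```python
import math

import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, in kilometres


def haversine(p1, p2):
    """Hypothetical stand-in: great-circle distance in km between two
    (latitude, longitude) tuples given in degrees."""
    lat1, lon1 = math.radians(p1[0]), math.radians(p1[1])
    lat2, lon2 = math.radians(p2[0]), math.radians(p2[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Three easy-to-check points: each pair is a quarter of a great circle apart.
points = [(0.0, 0.0), (0.0, 90.0), (90.0, 0.0)]

# Store each tuple as ONE element of a 1-D object array, so that
# np.vectorize hands whole tuples (not single floats) to haversine.
coords = np.empty(len(points), dtype=object)
for i, p in enumerate(points):
    coords[i] = p

haver_vec = np.vectorize(haversine, otypes=[float])

# Broadcasting a column against a row evaluates every pair of points.
dist_matrix = haver_vec(coords[:, None], coords[None, :])
print(dist_matrix.round(1))
```

The object array matters: passing a plain list of tuples would be converted to a 2-D numeric array, and `haversine` would then receive single floats instead of coordinate pairs.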

Comparing timings for 500 points:

def haver_vect(data):
    distance = data.groupby('id').apply(lambda x: pd.Series(haver_vec(data.coordinates, x.coordinates)))
    return distance

%timeit haver_loop(df): 1 loops, best of 3: 35.5 s per loop

%timeit haver_vect(df): 1 loops, best of 3: 593 ms per loop
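Note that the NumPy documentation describes `np.vectorize` as a convenience wrapper that is essentially a for loop, so the Python function is still called once per pair. A different technique, not the one used in this answer, is to write the Haversine formula directly on arrays and let broadcasting build the whole matrix. The following is only a sketch under assumptions: `haversine_matrix` is a hypothetical name, latitudes and longitudes are passed as two separate arrays in degrees rather than as tuples, and the Earth radius is taken as 6371 km. No timing is claimed for it here.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, in kilometres


def haversine_matrix(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Pairwise great-circle distances (hypothetical helper).

    lat_deg, lon_deg: 1-D sequences of equal length, in degrees.
    Returns an (n, n) array of distances in the units of `radius`.
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))

    # Column minus row gives every pairwise difference at once.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]

    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against rounding slightly outside [0, 1].
    return 2 * radius * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Same kind of sample data as above: 500 random points.
rng = np.random.default_rng(0)
lat = rng.uniform(-90, 90, 500)
lon = rng.uniform(-180, 180, 500)

dist = haversine_matrix(lat, lon)
print(dist.shape)
```

If the coordinates live in a column of tuples as in the sample DataFrame, they would first have to be split into separate latitude and longitude arrays before calling a helper like this.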
