From pandas dataframe to tuples (for haversine module)

Submitted by 北城余情 on 2020-01-15 06:49:30

Question


I have a pandas dataframe my_df with the following columns:

id  lat1 lon1 lat2 lon2
1   45   0    41   3
2   40   1    42   4
3   42   2    37   1

Basically, I'd like to do the following:

import haversine

haversine.haversine((45, 0), (41, 3)) # just to show syntax of haversine()
> 507.20410687342115

# what I'd like to do
my_df["dist"] = haversine.haversine((my_df["lat1"], my_df["lon1"]),(my_df["lat2"], my_df["lon2"]))

TypeError: cannot convert the series to <class 'float'>
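Why this TypeError appears: the haversine function works on scalars, and it presumably converts each coordinate with scalar functions from Python's `math` module (an assumption about the package's internals). A multi-element Series cannot be turned into a single float, so the conversion fails. A minimal reproduction, using `math.radians` as a stand-in for that internal conversion:

```python
import math

import pandas as pd

# The lat1 column from the question, as a standalone Series
lat1 = pd.Series([45, 40, 42])


def try_radians(value):
    """Call math.radians (scalar-only) and return either the result or the TypeError raised."""
    try:
        return math.radians(value)
    except TypeError as exc:
        return exc


print(repr(try_radians(lat1)))    # a TypeError: a 3-element Series is not one float
print(try_radians(lat1.iloc[0]))  # a single value from the Series converts fine (pi / 4)
```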

Using a suggestion I found, I tried the following:

my_df['dist'] = haversine.haversine(
        list(zip(*[my_df[['lat1','lon1']][c].values.tolist() for c in my_df[['lat1','lon1']]]))
        , 
        list(zip(*[my_df[['lat2','lon2']][c].values.tolist() for c in my_df[['lat2','lon2']]]))
        )

File "blabla\lib\site-packages\haversine\__init__.py", line 20, in haversine
    lat1, lng1 = point1

ValueError: too many values to unpack (expected 2)
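Why this ValueError appears: the zip calls build one list holding a (lat, lon) tuple per row, and that whole list is passed as a single point. The function then unpacks it into two names (`lat1, lng1 = point1`), which fails as soon as there are more than two rows. The sketch below reproduces this and then calls the function once per pair of points with a plain list comprehension over `zip` (an alternative to the `apply` answer, not taken from the thread). `point_distance` is a hypothetical stand-in for `haversine.haversine`, assuming a mean Earth radius of 6371 km, so the snippet runs without the package:

```python
import math

# One (lat, lon) tuple per row, which is what the zip(...) calls in the question produce
points1 = [(45, 0), (40, 1), (42, 2)]
points2 = [(41, 3), (42, 4), (37, 1)]


def point_distance(point1, point2, radius=6371.0):
    """Hypothetical stand-in for haversine.haversine: great-circle distance in km."""
    lat1, lng1 = point1  # expects exactly one (lat, lon) pair
    lat2, lng2 = point2
    lat1, lng1, lat2, lng2 = map(math.radians, (lat1, lng1, lat2, lng2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# Passing the whole lists reproduces the error: three tuples cannot unpack into two names
error = None
try:
    point_distance(points1, points2)
except ValueError as exc:
    error = exc
print("ValueError:", error)

# Calling the function once per pair of points works
distances = [point_distance(p1, p2) for p1, p2 in zip(points1, points2)]
print(distances)
```

With the real module, the same pattern would be `my_df["dist"] = [haversine.haversine(p1, p2) for p1, p2 in zip(points1, points2)]`.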

Any idea what I'm doing wrong, or how I can achieve what I want?


Answer 1:


Use apply with axis=1:

my_df["dist"] = my_df.apply(lambda row : haversine.haversine((row["lat1"], row["lon1"]),(row["lat2"], row["lon2"])), axis=1)

This calls the haversine function once per row. The function only understands scalar values, not array-like values such as a Series, hence the error. By calling apply with axis=1 you iterate row-wise, so you can access each column value of the row and pass the values in the form the function expects.
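A self-contained sketch of the row-wise `apply` approach on the question's data. Since the `haversine` package may not be installed, `scalar_haversine` below is a hypothetical pure-Python stand-in that takes two (lat, lon) tuples in degrees and assumes a mean Earth radius of 6371 km; it is not the package's own code. Swap `haversine.haversine` back in to use the real module.

```python
import math

import pandas as pd


def scalar_haversine(point1, point2, radius=6371.0):
    """Hypothetical stand-in for haversine.haversine; takes two (lat, lon) tuples in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*point1, *point2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


my_df = pd.DataFrame({
    "id": [1, 2, 3],
    "lat1": [45, 40, 42], "lon1": [0, 1, 2],
    "lat2": [41, 42, 37], "lon2": [3, 4, 1],
})

# axis=1 hands each row to the lambda as a Series, so row["lat1"] etc. are scalars
my_df["dist"] = my_df.apply(
    lambda row: scalar_haversine((row["lat1"], row["lon1"]), (row["lat2"], row["lon2"])),
    axis=1,
)
print(my_df)
```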

Also, there is a vectorised version of the haversine formula, though I don't know how it differs.




Answer 2:


What about using a vectorized approach:

import numpy as np
import pandas as pd

# vectorized haversine function
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    slightly modified version of: http://stackoverflow.com/a/29546836/2901002

    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians)

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.

    """
    if to_radians:
        # convert all four coordinate arrays from degrees to radians at once
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    # earth_radius is in km, so the returned distances are in km
    return earth_radius * 2 * np.arcsin(np.sqrt(a))
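For reference, the function evaluates the standard haversine formula, with latitudes φ1, φ2 and longitudes λ1, λ2 in radians and R = earth_radius (6371 km by default):

```latex
\begin{aligned}
a &= \sin^2\!\left(\tfrac{\varphi_2-\varphi_1}{2}\right) + \cos\varphi_1\,\cos\varphi_2\,\sin^2\!\left(\tfrac{\lambda_2-\lambda_1}{2}\right)\\
d &= 2R\,\arcsin\!\left(\sqrt{a}\right)
\end{aligned}
```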

Demo:

In [38]: df
Out[38]:
   id  lat1  lon1  lat2  lon2
0   1    45     0    41     3
1   2    40     1    42     4
2   3    42     2    37     1

In [39]: df['dist'] = haversine(df.lat1, df.lon1, df.lat2, df.lon2)

In [40]: df
Out[40]:
   id  lat1  lon1  lat2  lon2        dist
0   1    45     0    41     3  507.204107
1   2    40     1    42     4  335.876312
2   3    42     2    37     1  562.543582
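Answer 1 mentions not knowing how the vectorised version differs. One way to check is to run both approaches on the demo frame and compare. This sketch uses a copy of the vectorized function above written against NumPy directly, plus `row_haversine`, a hypothetical pure-Python stand-in for `haversine.haversine` applied one row at a time (same 6371 km radius assumed):

```python
import math

import numpy as np
import pandas as pd


def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """Vectorized version from the answer above, using numpy directly."""
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = np.sin((lat2 - lat1) / 2.0) ** 2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


def row_haversine(row, earth_radius=6371):
    """Hypothetical scalar stand-in for haversine.haversine, used one row at a time."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (row["lat1"], row["lon1"], row["lat2"], row["lon2"])
    )
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return earth_radius * 2 * math.asin(math.sqrt(a))


df = pd.DataFrame({
    "id": [1, 2, 3],
    "lat1": [45, 40, 42], "lon1": [0, 1, 2],
    "lat2": [41, 42, 37], "lon2": [3, 4, 1],
})

vectorized = haversine(df.lat1, df.lon1, df.lat2, df.lon2)  # whole columns at once
row_wise = df.apply(row_haversine, axis=1).to_numpy()      # one Python call per row
print(np.allclose(vectorized, row_wise))  # True
```

The distances agree; the practical difference is typically speed on large frames, since apply makes one Python-level call per row while the vectorized version runs each NumPy operation over whole columns.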


Source: https://stackoverflow.com/questions/45054067/from-pandas-dataframe-to-tuples-for-haversine-module
