Pandas: calculate haversine distance within each group of rows

自闭症患者 2020-12-18 14:26

The sample CSV looks like this:

 user_id  lat         lon
    1   19.111841   72.910729
    1   19.111342   72.908387
    2   19.111542   72.907387
    2   19.1         


        
4 answers
  •  忘掉有多难
    2020-12-18 14:54

    Try this approach:

    import pandas as pd
    import numpy as np
    
    # parse CSV to DataFrame. You may want to specify the separator (`sep='...'`)
    df = pd.read_csv('/path/to/file.csv')
    
    # vectorized haversine function
    def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
        """
        Slightly modified version of http://stackoverflow.com/a/29546836/2901002
    
        Calculate the great circle distance between two points
        on the earth (specified in decimal degrees or in radians)
    
        All (lat, lon) coordinates must have numeric dtypes and be of equal length.
    
        """
        if to_radians:
            lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    
        a = np.sin((lat2-lat1)/2.0)**2 + \
            np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
    
        return earth_radius * 2 * np.arcsin(np.sqrt(a))
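
    As a quick sanity check of the function above, the sketch below measures one degree of latitude along a meridian, which should come out as earth_radius * pi / 180 (about 111.19 km for the default radius). It also shows what the two keyword arguments do. The miles radius of 3958.8 is an assumed value used only for illustration; it does not come from the original answer.

```python
import numpy as np

def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """Great-circle distance; same formula as the function above."""
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

# One degree of latitude along a meridian is earth_radius * pi / 180.
one_degree_km = haversine(0.0, 0.0, 1.0, 0.0)

# The same two points passed in radians, with the conversion switched off.
one_degree_rad_input = haversine(0.0, 0.0, np.radians(1.0), 0.0, to_radians=False)

# earth_radius sets the output unit; 3958.8 is an assumed mean radius in miles.
one_degree_miles = haversine(0.0, 0.0, 1.0, 0.0, earth_radius=3958.8)

print(one_degree_km, one_degree_rad_input, one_degree_miles)
```

    Because the function only uses NumPy ufuncs, the same call works unchanged on scalars, arrays, or pandas Series of equal length.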
    

    Now we can calculate the distance between each row and the previous row within the same id (group):

    df['dist'] = \
        np.concatenate(df.groupby('id')
                         .apply(lambda x: haversine(x['lat'], x['lon'],
                                                    x['lat'].shift(), x['lon'].shift())).values)
    

    Result:

    In [105]: df
    Out[105]:
        id        lat        lon       dist
    0    1  19.111841  72.910729        NaN
    1    1  19.111342  72.908387   0.252243
    2    2  19.111542  72.907387        NaN
    3    2  19.137815  72.914085   3.004976
    4    2  19.119677  72.905081   2.227658
    5    2  19.129677  72.905081   1.111949
    6    3  19.319677  72.905081        NaN
    7    3  19.120217  72.907121  22.179974
    8    4  19.420217  72.807121        NaN
    9    4  19.520217  73.307121  53.584504
    10   5  19.319677  72.905081        NaN
    11   5  19.419677  72.805081  15.286775
    12   5  19.629677  72.705081  25.594890
    13   5  19.111860  72.911347  61.509917
    14   5  19.111860  72.931346   2.101215
    15   5  19.219677  72.605081  36.304756
    16   6  19.319677  72.805082        NaN
    17   6  19.419677  72.905086  15.287063
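
    The np.concatenate approach above writes the per-group arrays back in group order, so it appears to rely on the rows already being sorted by id. A possible alternative, sketched here and not part of the original answer, shifts the coordinates per group with groupby(...).shift(), which keeps the original row order, and then calls haversine once on whole columns. The input is the first six rows of the result table above, read as whitespace-separated text with sep=r"\s+".

```python
import io

import numpy as np
import pandas as pd

def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """Great-circle distance; same formula as the function above."""
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

# First six rows of the result table above, as whitespace-separated text.
raw = """id lat lon
1 19.111841 72.910729
1 19.111342 72.908387
2 19.111542 72.907387
2 19.137815 72.914085
2 19.119677 72.905081
2 19.129677 72.905081
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# shift() within each group yields the previous row's coordinates,
# with NaN for the first row of every id.
grouped = df.groupby("id")
df["dist"] = haversine(df["lat"], df["lon"],
                       grouped["lat"].shift(), grouped["lon"].shift())

print(df)
```

    The first row of each id has no predecessor, so its distance is NaN, as in the result table above.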
    
