Pandas: calculate haversine distance within each group of rows

自闭症患者 2020-12-18 14:26

The sample CSV looks like this:

 user_id  lat         lon
    1   19.111841   72.910729
    1   19.111342   72.908387
    2   19.111542   72.907387
    2   19.1         


        
4 Answers
  •  不思量自难忘°
    2020-12-18 14:36

    You just need a suitable data structure: a dict of lists keyed by id, with each lat/lon pair stored as a tuple. A quick prototype could look like this:

    from collections import defaultdict
    
    from haversine import haversine  # pip3 install haversine
    
    # Whitespace-separated columns: user_id, lat, lon
    csv_text = """
    1   19.111841   72.910729
    1   19.111342   72.908387
    2   19.111342   72.908387
    2   19.137815   72.914085
    2   19.119677   72.905081
    2   19.119677   72.905081
    3   19.119677   72.905081
    3   19.120217   72.907121
    5   19.119677   72.905081
    5   19.119677   72.905081
    5   19.119677   72.905081
    5   19.111860   72.911346
    5   19.111860   72.911346
    5   19.119677   72.905081
    6   19.119677   72.905081
    6   19.119677   72.905081
    """
    
    points_by_id = defaultdict(list)  # user_id -> list of (lat, lon) tuples
    
    for line in csv_text.splitlines():
        line = line.strip()  # remove surrounding whitespace
    
        if not line:
            continue  # skip empty lines
    
        user_id, lat, lon = line.split()  # split on any run of whitespace
        points_by_id[user_id].append((float(lat), float(lon)))
    
    for user_id, points in points_by_id.items():
        if len(points) < 2:
            continue  # a distance needs at least two points
        # Distance between the first two points recorded for this id
        print('Distance for id: ', user_id, haversine(points[0], points[1]))
    
    

    This prints (distances are in kilometres, the default unit of haversine):

    Distance for id:  1 0.2522433072207346
    Distance for id:  2 3.0039140173887557
    Distance for id:  3 0.22257643412844885
    Distance for id:  5 0.0
    Distance for id:  6 0.0
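
Note that the snippet above only measures the distance between the first two points of each id. If what is wanted is the distance between every consecutive pair of rows within a group (an assumption, since the question text is cut off), a dependency-free sketch could look like the following. The names `haversine_km`, `consecutive_distances`, and `rows` are made up for illustration, and the Earth radius constant is a commonly used mean value, so results may differ slightly from other implementations.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def consecutive_distances(rows):
    """Map each id to the distances between its consecutive points."""
    groups = defaultdict(list)
    for user_id, lat, lon in rows:
        groups[user_id].append((lat, lon))
    return {
        user_id: [haversine_km(a, b) for a, b in zip(points, points[1:])]
        for user_id, points in groups.items()
    }


# A few rows taken from the sample data above: (user_id, lat, lon)
rows = [
    (1, 19.111841, 72.910729),
    (1, 19.111342, 72.908387),
    (2, 19.111342, 72.908387),
    (2, 19.137815, 72.914085),
    (2, 19.119677, 72.905081),
    (3, 19.119677, 72.905081),
]

for user_id, dists in consecutive_distances(rows).items():
    print(user_id, [round(d, 3) for d in dists])
```

A group with a single row yields an empty list here instead of raising an `IndexError`.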
    
