Calculate Distance to Nearest Feature with Geopandas

难免孤独 2020-12-13 22:57

I'm looking to do the equivalent of the ArcPy Generate Near Table using Geopandas / Shapely. I'm very new to Geopandas and Shapely and have developed a methodology that wo

3 Answers
  • 2020-12-13 23:22

    Your code is missing a detail: the extra argument has to be passed to apply as a one-element tuple, args=(df_lines,), with the trailing comma.

    def min_distance(point, lines):
        return lines.distance(point).min()
    
    df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, args=(df_lines,))# Notice the change to this line
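
    Why the trailing comma matters can be seen in a pure-Python sketch. Here apply_each is a hypothetical stand-in that only mimics how apply forwards args (calling func(value, *args)); it is not the pandas implementation, and min_abs_diff is an illustrative helper.

```python
def apply_each(values, func, args=()):
    """Hypothetical stand-in for Series.apply: calls func(value, *args)."""
    return [func(value, *args) for value in values]


def min_abs_diff(value, candidates):
    # Smallest absolute difference between value and any candidate.
    return min(abs(value - c) for c in candidates)


candidates = [0.0, 5.0]

# (candidates) is just the list itself; (candidates,) is a 1-tuple holding it.
print(type((candidates)).__name__)   # list
print(type((candidates,)).__name__)  # tuple

# With the 1-tuple, the whole list arrives as the single extra argument.
result = apply_each([1.0, 4.0, 8.0], min_abs_diff, args=(candidates,))
print(result)  # [1.0, 1.0, 3.0]
```

    Without the comma, the list would be unpacked element by element and min_abs_diff would receive too many arguments.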
    
  • 2020-12-13 23:23

    If I understand your issue correctly, Blocks and Restaurants can have very different dimensions. For this reason, forcing them into a single table by reindexing is probably a bad approach.

    I would just loop over blocks and get the minimum distance to restaurants (just as @shongololo was suggesting).

    I'm going to be slightly more general (because I already have this code written down) and compute distances from points to lines, but the same code should work from points to points or from polygons to polygons. I'll start with a GeoDataFrame for the points and add a new column holding the minimum distance to the lines.

    %matplotlib inline
    import matplotlib.pyplot as plt
    import shapely.geometry as geom
    import numpy as np
    import pandas as pd
    import geopandas as gpd
    
    lines = gpd.GeoSeries(
        [geom.LineString(((1.4, 3), (0, 0))),
            geom.LineString(((1.1, 2.), (0.1, 0.4))),
            geom.LineString(((-0.1, 3.), (1, 2.)))])
    
    # 10 random points
    n = 10
    points = gpd.GeoSeries([geom.Point(x, y) for x, y in np.random.uniform(0, 3, (n, 2))])
    
    # Put the points in a GeoDataFrame, with some other random column.
    # geometry='Geometry' marks the geometry column so that df_points.geometry works.
    df_points = gpd.GeoDataFrame(
        {'Geometry': points, 'Property1': np.random.randn(n)}, geometry='Geometry')
    
    points.plot()
    lines.plot()
    

    Now get the distance from points to lines and only save the minimum distance for each point (see below for a version with apply)

    min_dist = np.empty(n)
    for i, point in enumerate(points):
        min_dist[i] = np.min([point.distance(line) for line in lines])
    df_points['min_dist_to_lines'] = min_dist
    df_points.head(3)
    

    which gives

        Geometry                                       Property1    min_dist_to_lines
    0   POINT (0.2479424516236574 2.944916965334865)    2.621823    0.193293
    1   POINT (1.465768457667432 2.605673714922998)     0.6074484   0.226353
    2   POINT (2.831645235202689 1.125073838462032)     0.657191    1.940127
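
    The loop above can also be written without Python-level iteration over the points. The following is a sketch under one assumption: only the distance is needed, not which line is nearest. It merges all lines into a single geometry with shapely's unary_union, since the distance to the union equals the minimum distance over the individual lines. The three sample points are made up for illustration.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import LineString, Point
from shapely.ops import unary_union

lines = gpd.GeoSeries([
    LineString([(1.4, 3), (0, 0)]),
    LineString([(1.1, 2.0), (0.1, 0.4)]),
    LineString([(-0.1, 3.0), (1, 2.0)]),
])
# Illustrative points (not the random ones used above).
points = gpd.GeoSeries([Point(0, 1), Point(2, 2), Point(1, 3)])

# Merge every line into one geometry, then measure each point against it.
all_lines = unary_union(list(lines))
min_dist_union = points.distance(all_lines)

# Reference: the explicit loop over points and lines.
min_dist_loop = np.array([min(p.distance(line) for line in lines) for p in points])

print(np.allclose(min_dist_union.to_numpy(), min_dist_loop))  # True
```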
    

    ---- EDIT ----

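    (taken from a GitHub issue) Using apply is nicer and more consistent with how you'd do it in pandas. Here df_lines stands for the GeoSeries or GeoDataFrame that holds the lines (for example, lines from above):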

    def min_distance(point, lines):
        return lines.distance(point).min()
    
    df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, df_lines)
    

    EDIT: As of at least 2019-10-04, a change in pandas seems to require a different call in the last code block: the extra argument must be passed through the args parameter of .apply():

    df_points['min_dist_to_lines'] = df_points.geometry.apply(min_distance, args=(df_lines,))
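
    If the identity of the nearest line is wanted as well as the distance, which is closer to what a near table reports, recent GeoPandas releases (0.10 or later, as far as I know) provide gpd.sjoin_nearest. The sketch below uses made-up geometry, and the column names line_id and point_id are illustrative assumptions.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

# Hypothetical sample data: two vertical lines and three points.
df_lines = gpd.GeoDataFrame(
    {"line_id": ["a", "b"]},
    geometry=[LineString([(0, 0), (0, 10)]), LineString([(5, 0), (5, 10)])],
)
df_points = gpd.GeoDataFrame(
    {"point_id": [1, 2, 3]},
    geometry=[Point(1, 1), Point(4, 2), Point(8, 3)],
)

# Join each point to its nearest line and record the distance in a new column.
near = gpd.sjoin_nearest(
    df_points, df_lines, how="left", distance_col="min_dist_to_lines"
)
print(near[["point_id", "line_id", "min_dist_to_lines"]])
```

    A point that is equally close to several lines appears once per tied line in the result, so the output can have more rows than df_points.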
    
  • 2020-12-13 23:27

    I will use two sample datasets in geopandas with different dimensions to demonstrate.

    import geopandas as gpd
    
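    # read geodata for the five NYC boroughs
    # (gpd.datasets is deprecated since GeoPandas 0.14 and removed in 1.0;
    # on newer versions, point read_file at a local copy of the data)
    gdf_nyc = gpd.read_file(gpd.datasets.get_path('nybb'))
    # read geodata for international cities
    gdf_cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))
    
    # reproject both layers to a CRS with meter units (EPSG:3857, Web Mercator)
    gdf_nyc.to_crs(epsg=3857, inplace=True)
    gdf_cities.to_crs(epsg=3857, inplace=True)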
    

    We can simply apply a lambda function to the GeoSeries. For example, to get the minimal distance between each NYC borough (polygon) and its nearest international city (point), we can do the following:

    gdf_nyc.geometry.apply(lambda x: gdf_cities.distance(x).min())
    

    This will give us

    0    384422.953323
    1    416185.725507
    2    412520.308816
    3    419511.323677
    4    440292.945096
    Name: geometry, dtype: float64
    

    Similarly, to get the minimal distance between each international city and its nearest NYC borough, we can do the following:

    gdf_cities.geometry.apply(lambda x: gdf_nyc.distance(x).min())
    

    This will give us

    0      9.592104e+06
    1      9.601345e+06
    2      9.316354e+06
    3      8.996945e+06
    4      2.614927e+07
               ...     
    197    1.177410e+07
    198    2.377188e+07
    199    8.559704e+06
    200    8.902146e+06
    201    2.034579e+07
    Name: geometry, Length: 202, dtype: float64
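
    For readers who cannot load the bundled datasets, the same pattern can be tried on a small self-contained example. Everything in it (gdf_zones, gdf_sites, the column names and the coordinates) is made up for illustration. It also shows one way to look up which feature is nearest, using idxmin().

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical stand-ins for the boroughs (polygons) and the cities (points).
gdf_zones = gpd.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(10, 0), (12, 0), (12, 2), (10, 2)]),
    ],
)
gdf_sites = gpd.GeoDataFrame(
    {"site": ["A", "B", "C"]},
    geometry=[Point(1, 1), Point(5, 1), Point(12, 6)],
)

# For each zone: distance to its nearest site (0 if a site lies inside it).
zone_to_site = gdf_zones.geometry.apply(lambda g: gdf_sites.distance(g).min())

# For each site: distance to its nearest zone, and that zone's name.
site_to_zone = gdf_sites.geometry.apply(lambda g: gdf_zones.distance(g).min())
nearest_zone = gdf_sites.geometry.apply(
    lambda g: gdf_zones.loc[gdf_zones.distance(g).idxmin(), "zone"]
)

print(zone_to_site.tolist())  # [0.0, 4.0]
print(site_to_zone.tolist())  # [0.0, 3.0, 4.0]
print(nearest_zone.tolist())  # ['west', 'west', 'east']
```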
    

    Notes:

    1. Before calculating distances, convert your GeoDataFrame to a projected (Cartesian) CRS. In the example I used EPSG:3857, so the distances are in meters. With a geographic (lon/lat) CRS the result is in degrees. Note that EPSG:3857 (Web Mercator) distorts distances away from the equator, so prefer a local projected CRS when accuracy matters. Reproject first, before anything else such as computing the centroids of your polygons.
    2. Between two points there is only one distance. For other geometry types, the .distance() method returns the minimal distance between the two objects, for example between a point and a line. In other words, .distance() can measure between any two geometry objects.
    3. When a GeoDataFrame has more than one geometry column, make sure to apply the lambda function to the desired GeoSeries and to call .distance() on the desired GeoSeries as well. In the example, I called the method on the GeoDataFrame directly because each of them has only one geometry column.
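
    The unit issue in note 1 can be checked with a minimal sketch. It assumes two made-up points one degree of longitude apart on the equator and uses EPSG:3857 as in the example above.

```python
import warnings

import geopandas as gpd
from shapely.geometry import Point

# Two points one degree of longitude apart on the equator (lon/lat, EPSG:4326).
a = gpd.GeoSeries([Point(0, 0)], crs="EPSG:4326")
b = gpd.GeoSeries([Point(1, 0)], crs="EPSG:4326")

with warnings.catch_warnings():
    # GeoPandas warns when distances are computed in a geographic CRS.
    warnings.simplefilter("ignore")
    dist_degrees = a.distance(b).iloc[0]

# Reproject both series first, then measure again.
dist_meters = a.to_crs(epsg=3857).distance(b.to_crs(epsg=3857)).iloc[0]

print(dist_degrees)        # 1.0, in degrees
print(round(dist_meters))  # roughly 111319, in meters
```

    The first value is a difference in coordinates rather than a length on the ground, which is why the reprojection has to happen before calling .distance().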