Distance matrix between two point layers

Posted by 倖福魔咒の on 2021-01-29 10:19:25

Question


I have two arrays of different sizes, each containing point coordinates as shapely.geometry.Point objects.

Eg:

[Point(X Y), Point(X Y)...]
[Point(X Y), Point(X Y)...]

I would like to create a "cross product" of these two arrays with a distance function. The distance function is the one from shapely.geometry, which is a simple geometric distance calculation between two points. I am trying to create a distance matrix between M:N points.

Right now I have this function:

    import geopandas as gpd
    import numpy as np
    import pandas as pd

    source = gpd.read_file(source)
    near = gpd.read_file(near)

    source_list = source.geometry.values.tolist()
    near_list = near.geometry.values.tolist()

    array = np.empty((len(source.ID_SOURCE), len(near.ID_NEAR)))

    for index_source, item_source in enumerate(source_list):
        for index_near, item_near in enumerate(near_list):
            array[index_source, index_near] = item_source.distance(item_near)

    df_matrix = pd.DataFrame(array, index=source.ID_SOURCE, columns=near.ID_NEAR)

This does the job, but it is slow: 4000 x 4000 points takes around 100 seconds, and I have datasets that are much bigger, so speed is the main issue. I would like to avoid this double loop if possible. I also tried doing it directly on the pandas DataFrame, as below, which is terribly slow:

    for index_source, item_source in source.iterrows():
        for index_near, item_near in near.iterrows():
            df_matrix.at[index_source, index_near] = item_source.geometry.distance(item_near.geometry)

A bit faster is (but still 4x slower than numpy):

    for index_source, item_source in enumerate(source_list):
        for index_near, item_near in enumerate(near_list):
            df_matrix.at[index_source, index_near] = item_source.distance(item_near)

Is there a faster way to do this? I guess there is, but I have no idea how to proceed. As a last resort, I could split the dataframe into smaller chunks, send each chunk to a different core, and concatenate the results. If this can be done with NumPy alone using only indexing, I could send it to the GPU and be done with it in no time. The double for loop is not an option right now. I would also like to avoid any library other than Pandas/NumPy. I could use SAGA processing and its Point distances module (http://www.saga-gis.org/saga_tool_doc/2.2.2/shapes_points_3.html), which is very fast, but I am looking for a Python-only solution.


Answer 1:


If you can get the coordinates in separate vectors, I would try this:

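import numpy as np

# Coordinates of a single set of points; replace with your data
x = np.asarray([5.6, 2.1, 6.9, 3.1])
y = np.asarray([7.2, 8.3, 0.5, 4.5])

# Column and row views, so that subtraction broadcasts to an N x N grid
x_i = x[:, np.newaxis]
x_j = x[np.newaxis, :]

y_i = y[:, np.newaxis]
y_j = y[np.newaxis, :]

# Squared Euclidean distance between every pair of points
d = (x_i-x_j)**2+(y_i-y_j)**2

# Take the square root in place to avoid allocating a second N x N array
np.sqrt(d, out=d)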
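The snippet above pairs one set of points with itself, whereas the question asks about two layers of different sizes (M source points against N near points). What follows is a minimal sketch of the same broadcasting idea for that case, not a definitive implementation. The function name cross_distance_matrix and the sample coordinates are hypothetical. It assumes planar (projected) coordinates, so that plain Euclidean distance matches what Point.distance returns.

```python
import numpy as np


def cross_distance_matrix(src_x, src_y, near_x, near_y):
    """Return an (M, N) array of Euclidean distances.

    Row i, column j holds the distance from source point i to near point j.
    (Hypothetical helper name, not part of NumPy or Shapely.)
    """
    src_x = np.asarray(src_x, dtype=float)
    src_y = np.asarray(src_y, dtype=float)
    near_x = np.asarray(near_x, dtype=float)
    near_y = np.asarray(near_y, dtype=float)

    # Source coordinates become a column (M, 1), near coordinates a row (1, N);
    # broadcasting expands the difference to shape (M, N).
    dx = src_x[:, np.newaxis] - near_x[np.newaxis, :]
    dy = src_y[:, np.newaxis] - near_y[np.newaxis, :]

    # hypot computes sqrt(dx**2 + dy**2) element-wise.
    return np.hypot(dx, dy)


# Made-up sample data: 3 source points and 2 near points.
src_x = [0.0, 3.0, 0.0]
src_y = [0.0, 0.0, 4.0]
near_x = [0.0, 3.0]
near_y = [0.0, 4.0]

d = cross_distance_matrix(src_x, src_y, near_x, near_y)
print(d.shape)
print(d)
```

With GeoPandas point layers, the coordinate vectors could probably be taken from source.geometry.x and source.geometry.y (and likewise for near), and the result wrapped as pd.DataFrame(d, index=source.ID_SOURCE, columns=near.ID_NEAR), as in the question. Note that the full matrix is held in memory: 4000 x 4000 float64 values take about 128 MB, plus temporaries of the same size, so much larger layers may need to be processed in row blocks.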


Source: https://stackoverflow.com/questions/58713739/distance-matrix-between-two-point-layers
