Find an easier way to cluster 2-D scatter data into grid array data

大憨熊 submitted on 2019-12-06 05:06:40

Your code contains many for loops, which is not the NumPy way.

Make some sample data first:

import numpy as np
import pandas as pd
from scipy.spatial import KDTree
import pylab as pl

xc1, xc2, yc1, yc2 = 113.49805889531724, 115.5030664238035, 37.39995194888143, 38.789235929357105       

N = 1000
GSIZE = 20
x, y = np.random.multivariate_normal([(xc1 + xc2)*0.5, (yc1 + yc2)*0.5], [[0.1, 0.02], [0.02, 0.1]], size=N).T
value = np.ones(N)

df_points = pd.DataFrame({"x":x, "y":y, "v":value})

For equally spaced grids you can use hist2d():

pl.hist2d(df_points.x, df_points.y, weights=df_points.v, bins=20, cmap="viridis");
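If you need the binned values as an array rather than a plot, np.histogram2d() does the same binning without drawing anything. The following is a minimal sketch, not part of the original answer; the seeded generator and the sample means and spreads are assumptions chosen only to make it self-contained.

```python
import numpy as np

# Assumed sample data: a seeded generator so the sketch is reproducible
rng = np.random.default_rng(0)
N = 1000
x = rng.normal(114.5, 0.3, size=N)
y = rng.normal(38.1, 0.3, size=N)
v = np.ones(N)

# counts[i, j] is the sum of v over points falling in x-bin i and y-bin j;
# xedges and yedges hold the 21 bin edges along each axis
counts, xedges, yedges = np.histogram2d(x, y, bins=20, weights=v)
```

Note that counts is indexed as [x-bin, y-bin], so transpose it (counts.T) before passing it to an image-style plot that expects rows to run along y.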

Here is the output:

Here is the code using KDTree, which assigns each point to its nearest grid node:

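# Build a GSIZE x GSIZE grid of node coordinates spanning the data range
X, Y = np.mgrid[x.min():x.max():GSIZE*1j, y.min():y.max():GSIZE*1j]

grid = np.c_[X.ravel(), Y.ravel()]
points = np.c_[df_points.x, df_points.y]

# For each point, find the index of its nearest grid node
tree = KDTree(grid)
dist, indices = tree.query(points)

# Sum the values of all points assigned to the same grid node
grid_values = df_points.groupby(indices).v.sum()

df_grid = pd.DataFrame(grid, columns=["x", "y"])
# Assignment aligns on the index, so nodes with no points get NaN
df_grid["v"] = grid_values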

fig, ax = pl.subplots(figsize=(10, 8))
ax.plot(df_points.x, df_points.y, "kx", alpha=0.2)
mapper = ax.scatter(df_grid.x, df_grid.y, c=df_grid.v, 
                    cmap="viridis", 
                    linewidths=0, 
                    s=100, marker="o")
pl.colorbar(mapper, ax=ax);
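To get the result as a 2-D grid array instead of a DataFrame column, one option is to sum the values per node with np.bincount() and reshape. This is a sketch that swaps bincount in for the groupby above, so empty nodes become 0 instead of NaN; the seeded sample data is an assumption made only to keep it self-contained.

```python
import numpy as np
from scipy.spatial import KDTree

# Assumed sample data: a seeded generator so the sketch is reproducible
rng = np.random.default_rng(0)
N, GSIZE = 1000, 20
x = rng.normal(114.5, 0.3, size=N)
y = rng.normal(38.1, 0.3, size=N)
v = np.ones(N)

# Same grid construction and nearest-node lookup as above
X, Y = np.mgrid[x.min():x.max():GSIZE*1j, y.min():y.max():GSIZE*1j]
grid = np.c_[X.ravel(), Y.ravel()]
_, indices = KDTree(grid).query(np.c_[x, y])

# Sum v per node; minlength makes sure nodes with no points are included as 0.
# ravel() and reshape() both use C order, so grid_array[i, j] belongs to
# the node at (X[i, j], Y[i, j]).
grid_array = np.bincount(indices, weights=v, minlength=GSIZE * GSIZE)
grid_array = grid_array.reshape(GSIZE, GSIZE)
```

If you prefer to keep NaN for empty nodes, df_grid.v.to_numpy().reshape(GSIZE, GSIZE) from the code above gives the same layout.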

The output is:
