Clustering 500,000 geospatial points in python

無奈伤痛 2020-12-15 12:02

I'm currently faced with the problem of finding a way to cluster around 500,000 latitude/longitude pairs in Python. So far I've tried computing a distance matrix with numpy…
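The question is cut off, so we don't know exactly what went wrong with the distance-matrix approach. Still, a rough back-of-the-envelope estimate helps explain the scale. It assumes 64-bit floats (8 bytes per distance) and is not part of the original question. The function names are only illustrative.

```python
def full_matrix_bytes(n_points: int, bytes_per_value: int = 8) -> int:
    """Memory for a dense n x n pairwise distance matrix."""
    return n_points * n_points * bytes_per_value


def condensed_matrix_bytes(n_points: int, bytes_per_value: int = 8) -> int:
    """Memory for only the upper triangle, n*(n-1)/2 values
    (the condensed form that scipy.spatial.distance.pdist returns)."""
    return n_points * (n_points - 1) // 2 * bytes_per_value


n = 500_000
print(f"dense:     {full_matrix_bytes(n) / 1e12:.1f} TB")
print(f"condensed: {condensed_matrix_bytes(n) / 1e12:.1f} TB")
```

Both forms come out at roughly a terabyte or more, far beyond the RAM of a typical machine. Methods that never build the full matrix avoid this problem.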

2 Answers
  •  自闭症患者
    2020-12-15 12:26

    I don't have your data, so I just generated 500k rows of random numbers in three columns.

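    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.vq import kmeans2, whiten
    
    # 500,000 rows x 3 columns of standard-normal random data
    arr = np.random.randn(500000, 3)
    # kmeans2 returns (centroids, labels); whiten scales each column to unit variance
    centroids, labels = kmeans2(whiten(arr), 7, iter=20)  # <--- I randomly picked 7 clusters
    # Colour each point by its cluster label, using the first two columns as x/y
    plt.scatter(arr[:, 0], arr[:, 1], c=labels, alpha=0.33)
    plt.show()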
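For reference, `whiten` only divides each column by its (population) standard deviation. It does not subtract the mean. Below is a small check of that behaviour on random columns with deliberately different, made-up scales and offsets:

```python
import numpy as np
from scipy.cluster.vq import whiten

# Three columns with very different scales and offsets (illustrative values)
rng = np.random.default_rng(0)
arr = rng.normal(loc=[0.0, 10.0, -5.0], scale=[1.0, 50.0, 0.1], size=(1_000, 3))

whitened = whiten(arr)
manual = arr / arr.std(axis=0)  # divide by each column's standard deviation

print(np.allclose(whitened, manual))  # whiten matches the manual rescaling
print(whitened.std(axis=0))           # every column now has unit variance
print(whitened.mean(axis=0))          # means are rescaled, not removed
```

This is harmless for the synthetic columns above. On raw latitude/longitude, however, it changes the relative weight of the two coordinates, so distances no longer match geography.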
    
    I timed this: the kmeans2 call took 1.96 seconds, so I don't think the size of your data is the problem. Put your data in a 500000 x 3 numpy array and try kmeans2.
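The question is about latitude/longitude pairs, while the answer clusters synthetic 3-column data. One possible way to turn real lat/lon pairs (assumed to be in degrees) into an N x 3 array for kmeans2 is to map them onto the unit sphere. This is my own assumption, not something the answer states. On the sphere, straight-line Euclidean distance grows monotonically with great-circle distance. The helper names are hypothetical. `minit="++"` asks kmeans2 for k-means++ initialisation in recent SciPy versions. The centroids are plain means, so they are projected back onto the sphere before being converted to lat/lon.

```python
import numpy as np
from scipy.cluster.vq import kmeans2


def latlon_to_unit_xyz(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to 3D points on the unit sphere (N x 3)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    return np.column_stack((
        np.cos(lat) * np.cos(lon),
        np.cos(lat) * np.sin(lon),
        np.sin(lat),
    ))


def unit_xyz_to_latlon(xyz):
    """Project 3D points back onto the sphere and return (lat_deg, lon_deg)."""
    xyz = xyz / np.linalg.norm(xyz, axis=1, keepdims=True)
    lat = np.degrees(np.arcsin(np.clip(xyz[:, 2], -1.0, 1.0)))
    lon = np.degrees(np.arctan2(xyz[:, 1], xyz[:, 0]))
    return lat, lon


def cluster_latlon(lat_deg, lon_deg, k=7, iters=20):
    """k-means on unit-sphere coordinates; returns centroid lat/lon and labels."""
    xyz = latlon_to_unit_xyz(lat_deg, lon_deg)
    # No whiten() here: rescaling each axis separately would distort
    # distances between points on the sphere.
    centroids, labels = kmeans2(xyz, k, iter=iters, minit="++")
    centroid_lat, centroid_lon = unit_xyz_to_latlon(centroids)
    return centroid_lat, centroid_lon, labels


# Usage with synthetic points standing in for the real 500,000 pairs
rng = np.random.default_rng(0)
lat = rng.uniform(-60, 60, size=10_000)
lon = rng.uniform(-180, 180, size=10_000)
centroid_lat, centroid_lon, labels = cluster_latlon(lat, lon, k=7)
print(centroid_lat.shape, labels.shape)
```

The choice of k = 7 is carried over from the answer and is just as arbitrary here.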
