I'm currently faced with the problem of finding a way to cluster around 500,000 latitude/longitude pairs in Python. So far I've tried computing a distance matrix with nump
I don't have your data, so I just generated a 500,000 x 3 array of random numbers.
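import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans2, whiten

# 500,000 random rows in three columns, standing in for the real data
arr = np.random.randn(500000, 3)
# whiten() rescales each column to unit variance; 7 clusters is an arbitrary choice
centroids, labels = kmeans2(whiten(arr), 7, iter=20)
# Plot the first two columns, colored by cluster label
plt.scatter(arr[:, 0], arr[:, 1], c=labels, alpha=0.33333)
plt.show()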
I timed this, and the kmeans2 call took 1.96 seconds, so I don't think the size of your data is the problem. Put your data in a 500000 x 3 numpy array and try kmeans2.
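Since the question is about latitude/longitude pairs, one possible way to get a three-column array is to map each pair to a point on the unit sphere. This is only a sketch under assumptions: the helper name `latlon_to_unit_xyz`, the synthetic coordinates, and the choice of `minit="++"` are mine, not from the question, and Euclidean k-means on these points only approximates great-circle distance.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def latlon_to_unit_xyz(lat_deg, lon_deg):
    """Hypothetical helper: map degrees of latitude/longitude to unit-sphere x, y, z."""
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    return np.column_stack((
        np.cos(lat) * np.cos(lon),
        np.cos(lat) * np.sin(lon),
        np.sin(lat),
    ))

# Synthetic stand-in for the real coordinates (assumption: values are in degrees)
rng = np.random.default_rng(0)
lat = rng.uniform(-90, 90, size=1000)
lon = rng.uniform(-180, 180, size=1000)

xyz = latlon_to_unit_xyz(lat, lon)  # shape (n, 3)
# 7 clusters is arbitrary here, as in the example above
centroids, labels = kmeans2(xyz, 7, iter=20, minit="++")
```

I skipped whiten() in this sketch because rescaling each axis separately would distort the sphere, but whether that matters depends on your data.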