Spherical k-means implementation in Python

Submitted by 笑着哭i on 2019-12-04 02:07:10

In spherical k-means, the cluster centers must lie on the unit sphere. One option is to adapt the algorithm to use the cosine distance, in which case you should additionally normalize the centroids of the final result.

When using the Euclidean distance, I prefer to think of the algorithm as projecting the cluster centers onto the unit sphere in each iteration, i.e., the centers should be normalized after each maximization step.
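
The projection view above can be sketched in plain NumPy. This is only a minimal illustration, not the spherecluster implementation: the function name `spherical_kmeans`, its parameters, the random initialisation, and the empty-cluster handling are all assumptions made for this example.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Hypothetical minimal spherical k-means (illustration only)."""
    rng = np.random.default_rng(seed)
    # Project the data onto the unit sphere (assumes no all-zero rows).
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialise the centers from k distinct data points (already unit length).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: for unit vectors, the largest dot product
        # (cosine similarity) is also the smallest Euclidean distance.
        labels = np.argmax(X @ centers.T, axis=1)
        # Update step: sum the members, then project back onto the sphere.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the previous center for an empty cluster
            total = members.sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:
                centers[j] = total / norm
    labels = np.argmax(X @ centers.T, axis=1)
    return labels, centers
```

Because every center is renormalized right after the update step, the centers stay on the unit sphere throughout, which is the property this answer describes.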

Indeed, when the centers and data points are both normalized, there is a 1-to-1 relationship between the cosine distance and the squared Euclidean distance:

|a - b|_2^2 = 2 * (1 - cos(a,b))
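
For unit vectors, the squared Euclidean distance equals twice the cosine distance, since |a - b|^2 = |a|^2 + |b|^2 - 2 a.b = 2 - 2 cos(a,b). A quick numerical check, using random vectors chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=5)
b = rng.normal(size=5)

# Normalize both vectors so they lie on the unit sphere.
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cos_ab = a @ b                       # cosine similarity of unit vectors
sq_euclid = np.sum((a - b) ** 2)     # squared Euclidean distance
cosine_distance = 1.0 - cos_ab

# The identity holds up to floating-point error.
assert np.isclose(sq_euclid, 2.0 * cosine_distance)
```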

The package jasonlaska/spherecluster modifies scikit-learn's k-means into spherical k-means and also provides another sphere clustering algorithm.

Hooked

It looks like the salient feature in the spherical k-means is the use of the cosine distance, instead of the standard Euclidean metric. With that being said, there is a nice pure numpy/scipy adaptation here on SO in another answer:

Is it possible to specify your own distance function using Scikits.Learn K-Means Clustering?

If that doesn't meet what you are looking for you might want to try sklearn.cluster.

Here's how you do it if you have polar coordinates on a 3D sphere, such as (lat, lon) pairs:

  1. If your coordinates are (lat, lon) pairs measured in degrees, you can write a function that converts these points into cartesian coordinates, like:

    import numpy as np

    def cartesian_encoder(coord, r_E=6371):
        """Convert lat/lon to cartesian points on Earth's surface.
    
        Input
        -----
            coord : numpy 2darray (size=(N, 2)), rows of (lat, lon) in degrees
            r_E : radius of Earth (default in km)
    
        Output
        ------
            out : numpy 2darray (size=(N, 3)), rows of (x, y, z)
        """
        def _to_rad(deg):
            return deg * np.pi / 180.
    
        theta = _to_rad(coord[:, 0])  # lat [radians]
        phi = _to_rad(coord[:, 1])    # lon [radians]
    
        x = r_E * np.cos(phi) * np.cos(theta)
        y = r_E * np.sin(phi) * np.cos(theta)
        z = r_E * np.sin(theta)
    
        # Stack the three coordinate vectors as columns of an (N, 3) array.
        return np.column_stack([x, y, z])
    

    If your coordinates are already in radians, drop the `_to_rad` helper and use `coord[:, 0]` and `coord[:, 1]` directly as `theta` and `phi`.

  2. Install the spherecluster package with pip. If your polar data, given as rows of (lat, lon) pairs, is called X and you want to find 10 clusters in it, the final code for spherical k-means clustering will be:

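    import numpy as np
    from spherecluster import SphericalKMeans
    
    # X holds rows of (lat, lon) in degrees; cartesian_encoder is defined in step 1.
    X_cart = cartesian_encoder(X)
    kmeans_labels = SphericalKMeans(n_clusters=10).fit_predict(X_cart)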
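
To read cluster centers back as (lat, lon) pairs, the conversion in step 1 can be inverted. The helper below is a hypothetical sketch, not part of spherecluster: the name `cartesian_decoder` is made up for this example, and it assumes the same axis convention as `cartesian_encoder` and no all-zero rows.

```python
import numpy as np

def cartesian_decoder(points):
    """Hypothetical inverse of cartesian_encoder.

    Maps rows of (x, y, z) to rows of (lat, lon) in degrees.
    The radius is divided out, so points need not be unit length.
    """
    points = np.asarray(points, dtype=float)
    r = np.linalg.norm(points, axis=1)
    lat = np.degrees(np.arcsin(points[:, 2] / r))             # z = r * sin(lat)
    lon = np.degrees(np.arctan2(points[:, 1], points[:, 0]))  # y / x = tan(lon)
    return np.column_stack([lat, lon])
```

Assuming the fitted estimator exposes `cluster_centers_` the way scikit-learn's KMeans does, the centers could then be converted with `cartesian_decoder(model.cluster_centers_)`, where `model` is the fitted `SphericalKMeans` instance.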
    