Spherical k-means implementation in Python

Submitted by 十年热恋 on 2019-12-05 16:51:22

Question


I've been using scipy's k-means for quite some time now, and I'm pretty happy about the way it works in terms of usability and efficiency. However, now I want to explore different k-means variants, more specifically, I'd like to apply spherical k-means in some of my problems.

Do you know any good Python implementation (i.e. similar to scipy's k-means) of spherical k-means? If not, how hard would it be to modify scipy's source code to adapt its k-means algorithm to be spherical?

Thank you.


Answer 1:


In spherical k-means, you aim to guarantee that the centers are on the sphere, so you could adjust the algorithm to use the cosine distance, and should additionally normalize the centroids of the final result.

When using the Euclidean distance, I prefer to think of the algorithm as projecting the cluster centers onto the unit sphere in each iteration, i.e., the centers should be normalized after each maximization step.

Indeed, when the centers and data points are both normalized, the squared Euclidean distance is a monotonic function of the cosine distance, so both give the same nearest center:

|a - b|_2^2 = 2 * (1 - cos(a,b))
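As a quick sanity check, the relationship between squared Euclidean distance and cosine similarity for unit vectors can be verified numerically. This is only a sketch on random illustrative data; the variable names are mine, not from any library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random vectors, scaled to unit length (illustrative data only).
a = rng.normal(size=5)
b = rng.normal(size=5)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cos_ab = float(a @ b)                    # cosine similarity of unit vectors
sq_euclid = float(np.sum((a - b) ** 2))  # squared Euclidean distance

# For unit vectors: |a - b|^2 = |a|^2 + |b|^2 - 2 a.b = 2 * (1 - cos(a, b))
identity_holds = bool(np.isclose(sq_euclid, 2.0 * (1.0 - cos_ab)))
```

Because the mapping is monotonic, assigning each point to the center with the highest cosine similarity picks the same center as the smallest Euclidean distance, provided everything is normalized.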

The package jasonlaska/spherecluster modifies scikit-learn's k-means into spherical k-means and also provides another sphere clustering algorithm.
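If you would rather not add a dependency, the idea described in this answer (assign by cosine similarity, then re-project the centers onto the unit sphere after every update) can be sketched in plain NumPy. This is a minimal illustration under my own assumptions, not the spherecluster or scipy implementation: the function name `spherical_kmeans`, the random initialization, and the empty-cluster handling are all hypothetical choices.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Illustrative spherical k-means; not the spherecluster implementation."""
    rng = np.random.default_rng(seed)
    # Project the data onto the unit sphere (assumes no all-zero rows).
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialize centers from k distinct data points (a simple assumption).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: highest cosine similarity means nearest center.
        labels = np.argmax(X @ centers.T, axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the old center for an empty cluster
            total = members.sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:
                # Update step: re-project the cluster mean onto the sphere.
                new_centers[j] = total / norm
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment so the labels match the returned centers.
    labels = np.argmax(X @ centers.T, axis=1)
    return labels, centers

# Usage on synthetic data: two blobs pointing in different directions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=[10, 0, 0], size=(50, 3)),
               rng.normal(loc=[0, 10, 0], size=(50, 3))])
labels, centers = spherical_kmeans(X, k=2)
```

Like ordinary k-means, this can settle in a local optimum depending on the initial centers, so in practice you would probably run it with several seeds.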




Answer 2:


It looks like the salient feature in the spherical k-means is the use of the cosine distance, instead of the standard Euclidean metric. With that being said, there is a nice pure numpy/scipy adaptation here on SO in another answer:

Is it possible to specify your own distance function using Scikits.Learn K-Means Clustering?

If that doesn't meet what you are looking for you might want to try sklearn.cluster.




Answer 3:


Here's how you do it if you have polar coordinates on a 3D sphere, such as (lat, lon) pairs:

  1. If your coordinates are (lat, lon) pairs measured in degrees, you can write a function that converts these points into Cartesian coordinates, like:

    import numpy as np

    def cartesian_encoder(coord, r_E=6371):
        """Convert lat/lon to cartesian points on Earth's surface.
    
        Input
        -----
            coord : numpy 2darray (size=(N, 2))
            r_E : radius of Earth
    
        Output
        ------
            out : numpy 2darray (size=(N, 3))
        """
        def _to_rad(deg):
            return deg * np.pi / 180.
    
        theta = _to_rad(coord[:, 0])  # lat [radians]
        phi = _to_rad(coord[:, 1])    # lon [radians]
    
        x = r_E * np.cos(phi) * np.cos(theta)
        y = r_E * np.sin(phi) * np.cos(theta)
        z = r_E * np.sin(theta)
    
        return np.concatenate([x.reshape(-1, 1), y.reshape(-1, 1), z.reshape(-1, 1)], axis=1)
    

    If your coordinates are already in radians, skip the _to_rad conversion and use coord[:, 0] and coord[:, 1] directly as theta and phi.

  2. Install the spherecluster package with pip. If your data, given as rows of (lat, lon) pairs, is called X and you want to find 10 clusters in it, the final code for spherical k-means clustering will be:

    import numpy as np
    from spherecluster import SphericalKMeans
    
    X_cart = cartesian_encoder(X)
    kmeans_labels = SphericalKMeans(n_clusters=10).fit_predict(X_cart)
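If you later want Cartesian points (for example, cluster centers) back as (lat, lon) pairs, the conversion above can be inverted. The helper below is a hypothetical sketch named `cartesian_decoder` (it is not part of spherecluster); it assumes no point sits at the origin and it discards the radius.

```python
import numpy as np

def cartesian_decoder(points):
    """Convert (N, 3) Cartesian points back to (lat, lon) in degrees.

    Hypothetical inverse of cartesian_encoder; the radius is discarded.
    """
    points = np.asarray(points, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)  # assumes r > 0 for every row
    # Clip guards against tiny floating-point overshoot outside [-1, 1].
    lat = np.degrees(np.arcsin(np.clip(z / r, -1.0, 1.0)))
    lon = np.degrees(np.arctan2(y, x))  # longitude in [-180, 180]
    return np.column_stack([lat, lon])

# Usage: a point at lat 45, lon 90 on a sphere of radius 6371.
lat0, lon0 = np.radians(45.0), np.radians(90.0)
point = 6371 * np.array([[np.cos(lon0) * np.cos(lat0),
                          np.sin(lon0) * np.cos(lat0),
                          np.sin(lat0)]])
decoded = cartesian_decoder(point)
```

Using arctan2 rather than arctan keeps the longitude in the correct quadrant.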
    


Source: https://stackoverflow.com/questions/19226925/spherical-k-means-implementation-in-python
