Return the furthermost outlier in kmeans clustering? [closed]

好久不见. Submitted on 2019-12-13 09:10:25

Question


Is there any easy way to return the furthermost outlier after sklearn kmeans clustering?

Essentially I want to make a list of the biggest outliers for a load of clusters. Unfortunately I need to use sklearn.cluster.KMeans due to the assignment.


Answer 1:


Sascha basically gives it away in the comments: if X denotes your data and model is the fitted KMeans instance, you can sort the rows of X by the distance to their assigned cluster centers. The sort is ascending, so the furthest point ends up in the last row:

X[np.argsort(np.linalg.norm(X - model.cluster_centers_[model.labels_], axis=1))]

Alternatively, since each point is assigned to the cluster whose center minimizes the Euclidean distance to it, and fit_transform returns the distance from each point to every center, you can fit and sort in one step through

X[np.argsort(np.min(KMeans(n_clusters=2).fit_transform(X), axis=1))]
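To get one "biggest outlier" per cluster rather than a single global ordering, the same distances can be grouped by label. Below is a minimal sketch under stated assumptions: the synthetic data, the helper name furthest_per_cluster, and the choice of n_clusters=2 are made up for illustration and are not from the original question.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (assumed for illustration): two tight blobs,
# each with one extra point placed well away from the blob.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=(10.0, 10.0), scale=0.1, size=(50, 2))
far_a = np.array([[0.0, 2.0]])    # row index 100
far_b = np.array([[10.0, 12.0]])  # row index 101
X = np.vstack([blob_a, blob_b, far_a, far_b])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance from every point to the center of its assigned cluster.
dist = np.linalg.norm(X - model.cluster_centers_[model.labels_], axis=1)


def furthest_per_cluster(labels, dist):
    """Map each cluster label to the row index of its most distant member."""
    result = {}
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        result[int(k)] = int(members[np.argmax(dist[members])])
    return result


furthest = furthest_per_cluster(model.labels_, dist)  # one index per cluster
overall = int(np.argmax(dist))                        # single furthest point
```

X[list(furthest.values())] then gives the furthest point of each cluster, and X[overall] the furthest point overall.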



Answer 2:


K-means is not well suited for "outlier" detection.

k-means has a tendency to put an outlier into a one-element cluster. The outlier then coincides with its own cluster center, so it has the smallest possible distance (zero) and will not be detected.
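A small sketch of this effect, assuming made-up data (one blob plus a single distant point) and n_clusters=2; whether it happens on real data depends on the data and on the number of clusters:

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumed toy data: one blob of 100 points and one distant point (row 100).
rng = np.random.default_rng(0)
blob = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
outlier = np.array([[8.0, 8.0]])
X = np.vstack([blob, outlier])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance from every point to the center of its assigned cluster.
dist = np.linalg.norm(X - model.cluster_centers_[model.labels_], axis=1)

# Size of the cluster the distant point was assigned to.
outlier_cluster_size = int(np.sum(model.labels_ == model.labels_[100]))
# Index that a "largest distance to center" rule would report.
reported = int(np.argmax(dist))
```

Here the distant point gets a cluster of its own, its distance to the center is essentially zero, and the distance-based rule reports some ordinary blob point instead.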

K-means is not robust enough when there are outliers in your data. You may actually want to remove outliers prior to using k-means.

Use something like kNN, LOF or LoOP instead.
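As a rough sketch of the LOF option, scikit-learn ships sklearn.neighbors.LocalOutlierFactor. The data and n_neighbors=20 below are assumptions for illustration, not a tuned setup:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Assumed toy data: one blob of 100 points and one distant point (row 100).
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
outlier = np.array([[8.0, 8.0]])
X = np.vstack([inliers, outlier])

lof = LocalOutlierFactor(n_neighbors=20)
pred = lof.fit_predict(X)  # -1 marks outliers, 1 marks inliers

# negative_outlier_factor_ is the negated LOF score,
# so flipping the sign makes larger values mean "more anomalous".
scores = -lof.negative_outlier_factor_
most_anomalous = int(np.argmax(scores))
```

Sorting by scores in descending order, e.g. X[np.argsort(scores)[::-1]], gives a ranked list of outlier candidates that does not depend on cluster centers.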



Source: https://stackoverflow.com/questions/47489705/return-the-furthermost-outlier-in-kmeans-clustering
