What are noisy samples in Scikit's DBSCAN clustering algorithm?

你。 submitted on 2020-01-24 13:05:05

Question


If I apply Scikit's DBSCAN (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) on a similarity matrix, I get a series of labels back. Some of these labels are -1. The documentation calls them noisy samples.

What are these? Do they all belong to a single cluster, or do they each belong to their own cluster since they're noisy?

Thank you


Answer 1:


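Noise points are not part of any cluster. They do not all form one shared cluster, and they do not each form a cluster of their own. The label -1 simply marks points that DBSCAN could not assign to any cluster, and they can be "ignored" to some extent.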

Remember, DBSCAN stands for "Density-Based Spatial Clustering of Applications with Noise." It checks how many neighbors each point has within a radius eps. A point with at least min_samples neighbors (scikit-learn counts the point itself) is a core point, and clusters are grown outward from the core points.
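As a rough sketch of the core-point test, you can count each point's neighbors yourself and compare the result with the fitted model's core_sample_indices_ attribute. This assumes scikit-learn's default Euclidean metric. The coordinates and the values eps=1.5 and min_samples=3 are made-up illustrations, not taken from the question.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Hypothetical toy data
X = np.array([
    [0, 0], [0, 1], [1, 0], [1, 1], [2.2, 0],  # dense group plus one point on its edge
    [10, 10], [10, 11], [11, 10], [11, 11],    # second dense group
    [50, 50],                                  # isolated point
])
eps, min_samples = 1.5, 3

# Count neighbors within eps for every point (the point itself is included)
D = pairwise_distances(X)
neighbor_counts = (D <= eps).sum(axis=1)
manual_core = np.flatnonzero(neighbor_counts >= min_samples)

db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
print(manual_core.tolist())             # [0, 1, 2, 3, 5, 6, 7, 8]
print(db.core_sample_indices_.tolist())  # [0, 1, 2, 3, 5, 6, 7, 8]
```

Point 4 at (2.2, 0) has only one neighbor besides itself, so it is not a core point. The isolated point at (50, 50) is not a core point either.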

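But what happens to the points that do not meet the criteria for falling into any of the clusters? Consider a point that has too few neighbors within eps to be a core point itself and also does not lie within eps of any core point. DBSCAN cannot attach such a point to a cluster, so it is given the label -1 and considered noise. A point with too few neighbors of its own that does lie near a core point is a "border" point, and it joins that core point's cluster.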
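The sketch below checks this definition of noise on hypothetical toy data, again assuming Euclidean distances, eps=1.5 and min_samples=3. The two isolated points are far from each other, yet both get -1. This shows that -1 is a "no cluster" marker rather than one shared cluster, so it should be excluded when you count clusters.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Hypothetical toy data: two dense groups, one border point, two isolated points
X = np.array([
    [0, 0], [0, 1], [1, 0], [1, 1], [2.2, 0],
    [10, 10], [10, 11], [11, 10], [11, 11],
    [50, 50], [-40, 30],
])
eps, min_samples = 1.5, 3
db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
labels = db.labels_

is_core = np.zeros(len(X), dtype=bool)
is_core[db.core_sample_indices_] = True

# Noise = not a core point AND no core point within eps
D = pairwise_distances(X)
near_core = (D[:, is_core] <= eps).any(axis=1)
manual_noise = ~is_core & ~near_core

print(labels.tolist())                             # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1, -1]
print(np.array_equal(manual_noise, labels == -1))  # True

# -1 is not a cluster id, so leave it out of the cluster count
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters)  # 2
```

The border point at (2.2, 0) is not a core point, but it still gets label 0 because it lies within eps of the core point at (1, 0).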

So what?

Well, if you are analyzing data points and are only interested in the general clusters, you can drop the noise points. This shrinks the data and removes the noise at the same time. Or, if you are using cluster analysis to classify data, in some cases it is reasonable to discard the noise points as outliers.
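Dropping the noise amounts to a boolean mask on the labels. Here is a minimal sketch with made-up points and illustrative parameter values.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: two small groups and one far-away point
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [50, 50]])
labels = DBSCAN(eps=1.5, min_samples=3).fit_predict(X)

# Keep only the points that were assigned to a real cluster
keep = labels != -1
X_clean, labels_clean = X[keep], labels[keep]

print(labels.tolist())  # [0, 0, 0, 1, 1, 1, -1]
print(X_clean.shape)    # (6, 2)
```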

In anomaly detection, on the other hand, the points that do not fit into any cluster are significant in their own right, as they can indicate a problem or a rare event.
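For the setup in the question, keep in mind that scikit-learn's DBSCAN with metric="precomputed" treats the matrix as distances, not similarities. A similarity matrix therefore has to be converted first. The sketch below assumes similarities in [0, 1] and uses 1 - S as the distance, which is one common choice but not the only one. The matrix values are hypothetical. The indices labeled -1 are then the outlier candidates.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical similarity matrix: symmetric, values in [0, 1], 1 on the diagonal
S = np.array([
    [1.00, 0.90, 0.80, 0.10, 0.00],
    [0.90, 1.00, 0.85, 0.10, 0.00],
    [0.80, 0.85, 1.00, 0.20, 0.10],
    [0.10, 0.10, 0.20, 1.00, 0.05],
    [0.00, 0.00, 0.10, 0.05, 1.00],
])

# "precomputed" expects distances, so convert similarity to dissimilarity.
# Clipping guards against tiny negative values from floating-point error.
D = np.clip(1.0 - S, 0.0, None)

# eps=0.3 here means "similarity of at least 0.7 counts as a neighbor"
labels = DBSCAN(eps=0.3, min_samples=2, metric="precomputed").fit_predict(D)
outliers = np.flatnonzero(labels == -1)  # indices of the noise points

print(labels.tolist())    # [0, 0, 0, -1, -1]
print(outliers.tolist())  # [3, 4]
```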



Source: https://stackoverflow.com/questions/45313176/what-are-noisy-samples-in-scikits-dbscan-clustering-algorithm
