Deciding input values to DBSCAN algorithm

Question

I have written Python code that implements the DBSCAN clustering algorithm. My dataset consists of 14k users, each represented by 10 features. I cannot decide what values to use for the Min_samples and epsilon inputs. How should I choose them? The similarity measure is Euclidean distance, which makes the choice even harder. Any pointers?

Answer 1:

The parameters of DBSCAN are often hard to estimate.

Have you considered the OPTICS algorithm? In that case you only need Min_samples, which would correspond to the minimal cluster size.
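As a rough illustration of the OPTICS suggestion, here is a minimal sketch that assumes scikit-learn is available (the question uses a hand-written implementation, so this is a stand-in, not the asker's code). The synthetic array X and the value min_samples=20 are placeholders chosen only for the example; in scikit-learn's OPTICS the minimum cluster size defaults to min_samples, which matches the reading above.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Synthetic stand-in for the real data (assumption): 10 features,
# two dense groups plus some uniformly scattered noise points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(150, 10)),
    rng.normal(loc=5.0, scale=0.5, size=(150, 10)),
    rng.uniform(low=-5.0, high=10.0, size=(30, 10)),
])

# Only min_samples is set; no epsilon has to be chosen up front.
# The value 20 is an arbitrary illustration, not a recommendation.
optics = OPTICS(min_samples=20, metric="euclidean")
labels = optics.fit_predict(X)

# Label -1 marks points that OPTICS treats as noise.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"clusters: {n_clusters}, noise points: {n_noise}")
```

Rerunning this with a few different min_samples values on the real 14k x 10 matrix should show how sensitive the clustering is to that single parameter.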

Otherwise, for DBSCAN I have done it in the past by trial and error: try some values and see what happens. A general rule to follow is that if your dataset is noisy, you should use a larger Min_samples value; a suitable value is also correlated with the number of dimensions (10 in this case).
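The trial-and-error approach can be sketched as a small parameter sweep. This again assumes scikit-learn's DBSCAN instead of the asker's own implementation, and the synthetic data and candidate values for eps and min_samples are arbitrary placeholders. For each combination it records the number of clusters and the fraction of points labelled as noise, which are usually the first things to look at when judging a setting.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic stand-in for the real data (assumption): 10 features,
# two dense groups plus some uniformly scattered noise points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(150, 10)),
    rng.normal(loc=5.0, scale=0.5, size=(150, 10)),
    rng.uniform(low=-5.0, high=10.0, size=(30, 10)),
])

# Candidate values are illustrative only; adapt them to the real data.
results = []
for min_samples in (11, 20):
    for eps in (1.0, 2.0, 3.0, 4.0):
        model = DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean")
        labels = model.fit_predict(X)
        # Label -1 marks noise; every other label is a cluster id.
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        noise_fraction = float(np.mean(labels == -1))
        results.append((min_samples, eps, n_clusters, noise_fraction))
        print(f"min_samples={min_samples:2d} eps={eps:.1f} "
              f"clusters={n_clusters} noise={noise_fraction:.2f}")
```

Settings where almost everything is noise, or where everything collapses into one cluster, can be discarded quickly; the interesting range lies in between.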

Source: https://stackoverflow.com/questions/10155542/deciding-input-values-to-dbscan-algorithm

Tags

python

cluster-analysis

dbscan
