Clustering Algorithm with discrete and continuous attributes?

三世轮回 submitted on 2019-12-03 06:41:30

If I remember correctly, the COBWEB algorithm can work with discrete attributes.

You can also apply various 'tricks' to the discrete attributes in order to create meaningful distance metrics.

You could search for clustering of categorical/discrete attributes; one of the first hits is "ROCK: A Robust Clustering Algorithm for Categorical Attributes".

R is a great tool for clustering. The standard approach would be to calculate a dissimilarity matrix on your mixed data using daisy, then cluster with that matrix using agnes (both functions are in the cluster package).
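For readers not using R, the same two-step idea can be sketched in plain Python. This is only an illustration, not the daisy or agnes implementation: it assumes a Gower-style dissimilarity (range-scaled absolute difference for numeric columns, simple matching for categorical ones) followed by naive average-linkage merging. The function names gower_matrix and average_linkage and the sample rows are made up for the example.

```python
def gower_matrix(rows, numeric_cols):
    """Pairwise Gower-style dissimilarity for rows of mixed values.

    numeric_cols: set of column indices treated as continuous;
    every other column is treated as categorical.
    """
    n_cols = len(rows[0])
    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = {}
    for c in numeric_cols:
        values = [r[c] for r in rows]
        ranges[c] = max(values) - min(values)

    def dissimilarity(a, b):
        total = 0.0
        for c in range(n_cols):
            if c in numeric_cols:
                # Range-scaled absolute difference; constant columns add 0.
                total += abs(a[c] - b[c]) / ranges[c] if ranges[c] else 0.0
            else:
                # Simple matching: 0 if the categories agree, 1 otherwise.
                total += 0.0 if a[c] == b[c] else 1.0
        return total / n_cols

    return [[dissimilarity(a, b) for b in rows] for a in rows]


def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering on a precomputed dissimilarity matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average dissimilarity over all cross-cluster pairs.
                d = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


# Hypothetical mixed data: (continuous size, categorical colour).
rows = [(1.0, "red"), (1.2, "red"), (9.0, "blue"), (8.5, "blue")]
dist = gower_matrix(rows, numeric_cols={0})
clusters = average_linkage(dist, n_clusters=2)
print(clusters)
```

The pairwise search is quadratic per merge, which is fine for a small illustration but not for large data sets.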

The cba package on CRAN includes a function to cluster on binary predictors based on ROCK.

You could also look at affinity propagation as a possible solution. But to overcome the continuous/discrete dilemma, you need to define a function that assigns a value to the proximity between discrete states.

I would actually present pairs of the discrete attribute values to users and ask them to rate their proximity. You would present them with a scale ranging from [synonym..very foreign] or similar. If many people do this, you will end up with a widely accepted proximity function for the non-linear attribute values.
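As a rough sketch of how such ratings could be turned into a proximity function: the snippet below averages the scores collected for each pair of values. Everything here is an assumption made for illustration, including the rating tuples, the 0.0 ("synonym") to 1.0 ("very foreign") scale, and the names build_proximity and categorical_distance.

```python
from collections import defaultdict

# Hypothetical survey results: (rater, value_a, value_b, score),
# where 0.0 means "synonym" and 1.0 means "very foreign".
ratings = [
    ("u1", "cat", "dog", 0.3),
    ("u2", "cat", "dog", 0.5),
    ("u1", "cat", "car", 0.9),
    ("u2", "cat", "car", 1.0),
    ("u1", "dog", "car", 1.0),
]


def build_proximity(ratings):
    """Average all scores given to each unordered pair of values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _rater, a, b, score in ratings:
        key = frozenset((a, b))  # unordered, so (a, b) and (b, a) share a key
        sums[key] += score
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def categorical_distance(table, a, b):
    """Look up the averaged rating; identical values are at distance 0."""
    if a == b:
        return 0.0
    return table[frozenset((a, b))]


table = build_proximity(ratings)
print(categorical_distance(table, "dog", "cat"))
```

A pair that nobody rated raises a KeyError in this sketch, so a real survey would need to cover every pair or define a fallback value.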

How about transforming each of your categorical attributes into a series of N-1 binary indicator attributes (where N is the number of categories)? You shouldn't be afraid of high dimensionality, as a sparse representation (such as Mahout's SequentialAccessSparseVector) can be employed. Once you do that, you can use classical K-means or any other standard numeric-only clustering algorithm.
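A minimal sketch of that idea in plain Python, using dense lists rather than a sparse vector to keep it short. The helper names (encode_categorical, kmeans), the sample data, and the choice of the first k points as starting centroids are all assumptions made for the example; the continuous attribute is assumed to be already scaled to roughly [0, 1] so that it does not swamp the indicators.

```python
def encode_categorical(values):
    """Map each value to N-1 binary indicators.

    The first category in sorted order is the all-zero baseline.
    """
    categories = sorted(set(values))
    indicator_cats = categories[1:]
    return [[1.0 if v == c else 0.0 for c in indicator_cats] for v in values]


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations; returns one cluster label per point."""
    # Deterministic start for the sketch: the first k points are the centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical data: one categorical and one continuous attribute.
colors = ["red", "red", "blue", "blue", "green"]
sizes = [0.1, 0.2, 0.9, 0.8, 0.85]

# Numeric feature vectors: [size, is_green, is_red]; "blue" is the baseline.
points = [[s] + ind for s, ind in zip(sizes, encode_categorical(colors))]
labels = kmeans(points, k=2)
print(labels)
```

In practice the starting centroids would usually be chosen at random or with a seeding scheme, and the result can depend on that choice.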
