Clustering Algorithm with discrete and continuous attributes?

Submitted by 帅比萌擦擦* on 2019-12-09 05:27:55

Question


Does anyone know a good algorithm for clustering on both discrete and continuous attributes? I am trying to identify groups of similar customers, and each customer has both discrete and continuous attributes (think customer type, revenue generated by the customer, geographic location, etc.).

Traditionally, algorithms like K-means or EM work on continuous attributes. What if we have a mix of continuous and discrete attributes?


Answer 1:


If I remember correctly, the COBWEB algorithm can work with discrete attributes.

You can also apply various 'tricks' to the discrete attributes in order to create meaningful distance metrics.

You could search for clustering of categorical/discrete attributes. One of the first hits is ROCK: A Robust Clustering Algorithm for Categorical Attributes.




Answer 2:


R is a great tool for clustering. The standard approach would be to calculate a dissimilarity matrix on your mixed data using daisy, then cluster on that matrix using agnes (both are in the cluster package).

The cba package on CRAN includes a function to cluster on binary predictors based on ROCK.
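For readers not using R, the same idea (a mixed-type dissimilarity matrix followed by agglomerative clustering) can be sketched in plain Python. This is only an illustration, not a port of daisy or agnes: it assumes a Gower-style dissimilarity (range-scaled absolute difference for numeric attributes, simple mismatch for categorical ones) and average linkage, and the customer records and the names `gower_distance` and `average_linkage` are made up for the example.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower-style dissimilarity in [0, 1] between two records (dicts)."""
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            # Numeric attribute: absolute difference scaled by the observed range.
            rng = numeric_ranges[key]
            total += abs(a[key] - b[key]) / rng if rng else 0.0
        else:
            # Categorical attribute: 0 if equal, 1 if different.
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


def average_linkage(records, k, dist):
    """Naive agglomerative clustering: merge the closest pair until k clusters remain."""
    n = len(records)
    d = [[dist(records[i], records[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = len(clusters[x]) * len(clusters[y])
                avg = sum(d[i][j] for i in clusters[x] for j in clusters[y]) / pairs
                if best is None or avg < best[0]:
                    best = (avg, x, y)
        _, x, y = best
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters


# Hypothetical customer data, invented for this sketch.
customers = [
    {"type": "retail", "revenue": 1200.0, "region": "north"},
    {"type": "retail", "revenue": 1500.0, "region": "north"},
    {"type": "wholesale", "revenue": 9000.0, "region": "south"},
    {"type": "wholesale", "revenue": 9500.0, "region": "south"},
]
revenues = [c["revenue"] for c in customers]
ranges = {"revenue": max(revenues) - min(revenues)}

clusters = average_linkage(
    customers, 2, lambda a, b: gower_distance(a, b, ranges)
)
print(clusters)
```

The pairwise search is cubic in the number of records, so this is only suitable for small data sets; it is meant to show the shape of the approach rather than to replace a library implementation.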




Answer 3:


You could also look at affinity propagation as a possible solution. To overcome the continuous/discrete dilemma, though, you need to define a function that assigns values to the discrete states.




Answer 4:


I would actually present pairs of the discrete attribute values to users and ask them to rate their proximity, on a scale ranging from "synonym" to "very foreign" or similar. With many people doing this, you will end up with a widely accepted proximity function for the non-linear attribute values.
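One way the collected ratings could be turned into a proximity function is sketched below. Everything here is an assumption made for illustration: the scale runs from 0 ("synonym") to `scale_max` ("very foreign"), ratings for the same pair are averaged regardless of the order in which the two values were shown, and `build_proximity` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean


def build_proximity(ratings, scale_max):
    """Average user ratings per unordered pair and return a distance function.

    ratings: iterable of (value_a, value_b, score), where score runs from
    0 ("synonym") to scale_max ("very foreign").
    """
    collected = defaultdict(list)
    for a, b, score in ratings:
        # frozenset makes (a, b) and (b, a) the same key.
        collected[frozenset((a, b))].append(score / scale_max)
    table = {pair: mean(scores) for pair, scores in collected.items()}

    def distance(a, b):
        if a == b:
            return 0.0
        # Raises KeyError for a pair that nobody rated.
        return table[frozenset((a, b))]

    return distance


# Hypothetical survey answers on a 0..4 scale.
ratings = [
    ("retail", "wholesale", 4),
    ("wholesale", "retail", 2),
    ("retail", "online", 1),
]
customer_type_distance = build_proximity(ratings, scale_max=4)
print(customer_type_distance("retail", "wholesale"))
print(customer_type_distance("online", "retail"))
```

The resulting function returns values in [0, 1], so it can stand in for the simple equal/not-equal comparison in any distance that handles categorical attributes.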




Answer 5:


How about transforming each of your categorical attributes into a series of N-1 binary indicator attributes (where N is the number of categories)? You shouldn't be afraid of high dimensionality, as a sparse representation (such as Mahout's SequentialAccessSparseVector) can be employed. Once you do that, you can use classical K-means or any other standard numeric-only clustering algorithm.
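A minimal sketch of this encoding followed by K-means, in plain Python with dense lists rather than a sparse vector type. Several choices are assumptions of the sketch rather than part of the suggestion above: the alphabetically first category is the dropped reference level, numeric attributes are min-max scaled to [0, 1] so they do not dominate the indicators, and the K-means initialisation simply takes the first k rows. The customer records are invented.

```python
def encode(records, categorical_keys, numeric_keys):
    """Turn each record into a numeric row: scaled numerics, then N-1 indicators."""
    # Sorted category levels per attribute, minus the first (reference) level.
    levels = {k: sorted({r[k] for r in records})[1:] for k in categorical_keys}
    bounds = {
        k: (min(r[k] for r in records), max(r[k] for r in records))
        for k in numeric_keys
    }
    rows = []
    for r in records:
        row = []
        for k in numeric_keys:
            lo, hi = bounds[k]
            row.append((r[k] - lo) / (hi - lo) if hi > lo else 0.0)
        for k in categorical_keys:
            row.extend(1.0 if r[k] == level else 0.0 for level in levels[k])
        rows.append(row)
    return rows


def kmeans(rows, k, iterations=20):
    """Plain Lloyd's algorithm with a naive initialisation (first k rows)."""
    centroids = [list(rows[i]) for i in range(k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])),
            )
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical customers with a three-level categorical attribute.
customers = [
    {"type": "retail", "revenue": 1200.0},
    {"type": "wholesale", "revenue": 9000.0},
    {"type": "retail", "revenue": 1500.0},
    {"type": "wholesale", "revenue": 9500.0},
    {"type": "online", "revenue": 1300.0},
]
rows = encode(customers, categorical_keys=["type"], numeric_keys=["revenue"])
labels = kmeans(rows, k=2)
print(labels)
```

One thing to keep in mind with N-1 indicators: under Euclidean distance the reference category sits closer to every other category than those categories sit to each other, so a full N-indicator encoding may be preferable when all categories should be equidistant.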



Source: https://stackoverflow.com/questions/829644/clustering-algorithm-with-discrete-and-continuous-attributes
