How to segment new data with existing K-means model?

Submitted by 大憨熊 on 2019-12-22 12:34:07

Question


I have built a segmentation model using k-means clustering.

Could anybody describe the process for assigning new data into these segments?

Currently I apply the same transformations, standardisation and outlier treatment that I used to build the model, and then calculate the Euclidean distance from each new record to every cluster centre. The record is assigned to the segment whose centre is at the minimum distance.

However, the majority of new records fall into one particular segment, and I am wondering whether I have missed something along the way.

Thanks


Answer 1:


Classifying a new observation by its Euclidean distance to the nearest cluster mean may work in some scenarios, but it ignores the shape and size of the original clusters.
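For reference, the nearest-mean assignment described in the question can be sketched roughly as follows. This is only an illustration using the Python standard library: `train_mean`, `train_std`, `centroids` and the sample records are made-up values, not taken from the original post. The one point it tries to make explicit is that new records are scaled with the statistics saved from the training data, not with statistics recomputed from the new batch.

```python
import math

# Hypothetical values saved when the model was built (not from the original post).
train_mean = [50.0, 200.0]   # per-feature mean of the training data
train_std = [10.0, 40.0]     # per-feature standard deviation of the training data
centroids = [[-1.0, -1.0], [0.0, 0.5], [1.5, 1.0]]  # cluster means in standardised space

def standardise(row):
    """Scale a raw record with the training statistics, not the new batch's own."""
    return [(x - m) / s for x, m, s in zip(row, train_mean, train_std)]

def assign(row):
    """Return (index of the nearest centroid, distance to it) for one raw record."""
    z = standardise(row)
    dists = [math.dist(z, c) for c in centroids]
    best = min(range(len(centroids)), key=dists.__getitem__)
    return best, dists[best]

new_records = [[41.0, 158.0], [52.0, 225.0], [66.0, 236.0]]
labels = [assign(r)[0] for r in new_records]
print(labels)
```

Note that the rule only looks at the centroids; how widely each original cluster was spread plays no part in the decision.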

One way around this would be to use the original cluster data to help classify each new observation (e.g., using k-nearest neighbours, KNN: http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm).
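A minimal sketch of the KNN idea, assuming the original training points and the labels k-means gave them have been kept. The data, the helper name `knn_assign` and the choice of `k=3` are all hypothetical, and the points are assumed to be already transformed the same way as during model building.

```python
import math
from collections import Counter

# Hypothetical training points (already standardised) and their k-means labels.
train_points = [
    [0.0, 0.0], [0.5, 0.2], [-0.3, 0.4], [0.1, -0.6],
    [3.0, 3.0], [3.4, 2.7], [2.6, 3.2], [3.1, 3.5],
]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]

def knn_assign(point, k=3):
    """Assign `point` to the most common label among its k nearest training points."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(point, pair[0]),
    )[:k]
    votes = Counter(label for _, label in nearest)
    # With more than two clusters or an even k, ties are possible; this sketch
    # simply takes whichever label most_common() returns first.
    return votes.most_common(1)[0][0]

print(knn_assign([0.2, 0.1]), knn_assign([2.9, 3.1]))
```

Because the vote is taken over actual cluster members rather than over the means alone, a wide cluster can claim points that lie closer to another cluster's mean.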

Alternatively, you might consider a different clustering technique, such as a mixture of Gaussians:
http://en.wikipedia.org/wiki/Mixture_model
http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/mixture.html

With this approach you get not only a mean for each cluster but also a variance. For each new observation, you can then compute the probability that it belongs to each cluster, and that probability takes the original cluster size and shape into account. This type of "soft" approach is also nicer to work with because it tells you how strongly each new observation belongs to each cluster, and you can do things like tag as outliers any observations that are more than some number of standard deviations away from all clusters.
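The scoring step could look roughly like the sketch below. It does not fit the mixture; it assumes the per-cluster weights, means and variances have already been estimated (the numbers here are invented), and it further assumes an independent variance per feature (a diagonal covariance), which is a simplification. The threshold of 3 standard deviations is likewise just an illustrative choice.

```python
import math

# Hypothetical fitted parameters, one entry per cluster (diagonal covariance assumed).
clusters = [
    {"weight": 0.5, "mean": [0.0, 0.0], "var": [1.0, 1.0]},    # wide cluster
    {"weight": 0.5, "mean": [5.0, 5.0], "var": [0.25, 0.25]},  # tight cluster
]

def log_density(x, c):
    """Log of (mixture weight * Gaussian density) for cluster c at point x."""
    total = math.log(c["weight"])
    for xi, mi, vi in zip(x, c["mean"], c["var"]):
        total += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total

def membership(x):
    """Probability that x belongs to each cluster; the values sum to 1."""
    logs = [log_density(x, c) for c in clusters]
    top = max(logs)  # subtract the maximum before exponentiating for stability
    unnorm = [math.exp(v - top) for v in logs]
    norm = sum(unnorm)
    return [u / norm for u in unnorm]

def scaled_distance(x, c):
    """Distance from x to the cluster mean in units of per-feature standard deviation."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, c["mean"], c["var"])))

def is_outlier(x, threshold=3.0):
    """Flag x if it is more than `threshold` standard deviations from every cluster."""
    return all(scaled_distance(x, c) > threshold for c in clusters)

print(membership([0.5, 0.2]), is_outlier([0.5, 0.2]), is_outlier([20.0, -20.0]))
```

With these made-up parameters, the point `[3.0, 3.0]` is closer to the second cluster's mean in plain Euclidean terms, yet `membership` favours the first, wider cluster, which is the size/shape effect described above.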



Source: https://stackoverflow.com/questions/18131173/how-to-segment-new-data-with-existing-k-means-model
