Algorithm for clustering people with similar interests

狂风中的少年 提交于 2019-12-04 12:10:35

This does not sound like a particularly difficult clustering problem, and any of the off-the-shelf clustering algorithm will probably work well. If you know how many clusters you want, then try k-means or k-medoid clustering. If you don't know how many clusters, then try agglomerative clustering.

The difficult part of the problem will be the features. You mentioned that 'interests' could be used as the features upon which to cluster, but feature engineering and selection will always involve some trial and error.

Without more context of your problem, I can't really give a definite answer. Most clustering algorithms will work though, the problem is how "good" are your results. I'm quoting the word "good" because you'll need some sort of metric to measure that (generally inter-cluster and intra-cluster distance).

Here's the advice given to me when I was taught on how to decide on an algorithm for data mining: Try the simplest algorithms first - quite often these are overlooked but perform quite well (Naive Bayes for supervised learning is a classic example).

To start you off, try something like K-means which is a simple and popular method, you can find more info here http://en.wikipedia.org/wiki/K-means_clustering (if you look at the Software section you can also find a list of implementations that you could try).

The second part of the criteria is to be able to output the other people in the group based on a target person. This is doable in all clustering algorithms since you'll have X subsets of people, you simply need to find the subset which the target person is in and then iterate that subset and print all the people within out.

I think the right approach will be Kmeans clustering. The most important part of your problem is feature selection.

Try with some features that you think are most important and simply apply kmeans in some statistical programing language like R, inspect the result and improve it by feature modification or selecting more appropriate features. Hit and trial can give you insight if you are not sure about feature selection.

If you can provide some sample data, it will help to give some specific solutions to your problem.

Its coming a bit late, but there's actually an app in the windows store that is doing exactly that : finding profiles having similar characteristics its called k-modo

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!