Unsupervised high dimension clustering

Posted by 对着背影说爱祢 on 2020-01-05 07:16:34

Question


I have a dataset of records where each record has 5 labels, and the labels differ in importance.

I know how to order the labels by importance, but I don't know how much they differ, so the difference between two records looks like: a * dist(label1) + b * dist(label2) + c * dist(label3), such that a+b+c = 1.
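The weighted sum described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `weighted_distance`, the per-label distance functions and the example weights are all assumptions made for the sketch, since the actual weights are unknown.

```python
from typing import Any, Callable, Sequence


def weighted_distance(
    rec_a: Sequence[Any],
    rec_b: Sequence[Any],
    label_dists: Sequence[Callable[[Any, Any], float]],
    weights: Sequence[float],
) -> float:
    """Combine per-label distances into one value using weights that sum to 1."""
    if not (len(rec_a) == len(rec_b) == len(label_dists) == len(weights)):
        raise ValueError("records, distance functions and weights must align")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(
        w * dist(x, y)
        for w, dist, x, y in zip(weights, label_dists, rec_a, rec_b)
    )


def abs_diff(x: float, y: float) -> float:
    """Distance for a numeric label."""
    return abs(x - y)


def mismatch(x: Any, y: Any) -> float:
    """Distance for a categorical label: 0 if equal, 1 otherwise."""
    return 0.0 if x == y else 1.0


# Hypothetical weights: only their order (most to least important) is known.
example_weights = (0.5, 0.3, 0.2)
example_dists = (abs_diff, mismatch, abs_diff)

d = weighted_distance((1.0, "a", 3.0), (2.0, "b", 3.0),
                      example_dists, example_weights)
```

Because only the order of importance is known, any decreasing weight vector that sums to 1 fits the description. Trying a few such vectors and comparing the resulting clusters is one way to see how sensitive the result is to that choice.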

The dataset contains around 3000 records, and I want to cluster it in some way (I don't know the number of clusters).

I thought about DBSCAN, but it does not work well with high-dimensional data.

Hierarchical clustering needs to know the number of clusters, and I also think the result depends on which record you compare against first, so the result may be wrong in this case.

I also looked at graph clustering, where the difference between two records would be the weight of the edge between the two nodes, but I didn't find an algorithm that does that.
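To make the graph idea concrete, here is one very simple sketch: connect two records whenever their distance is at most a threshold, then take the connected components as clusters. It does not need the number of clusters in advance. The function name `threshold_graph_clusters`, the threshold value and the toy one-dimensional data are assumptions for illustration only; this is a baseline, not a recommendation of the best graph method.

```python
from typing import Any, Callable, List, Sequence


def threshold_graph_clusters(
    records: Sequence[Any],
    dist: Callable[[Any, Any], float],
    threshold: float,
) -> List[int]:
    """Label each record with the connected component it falls in.

    An edge joins records i and j when dist(records[i], records[j]) <= threshold.
    """
    n = len(records)
    parent = list(range(n))  # union-find forest over record indices

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once; about 4.5 million pairs for 3000 records.
    for i in range(n):
        for j in range(i + 1, n):
            if dist(records[i], records[j]) <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Renumber the roots as consecutive cluster ids in order of first appearance.
    cluster_ids = {}
    labels = []
    for i in range(n):
        root = find(i)
        labels.append(cluster_ids.setdefault(root, len(cluster_ids)))
    return labels


# Toy data: 1-D points with an absolute-difference distance.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
labels = threshold_graph_clusters(points, lambda x, y: abs(x - y), threshold=0.15)
```

The components of such a threshold graph are the same clusters that single-linkage hierarchical clustering gives when its tree is cut at that distance, so the threshold plays the role that the number of clusters would otherwise play. Any pairwise distance function can be passed as `dist`, including a weighted sum over labels.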

EDIT:

The data is CDR (call detail record) data. It represents the antennas a user connected to while using his cellphone for calls, SMS and internet, so the labels are:

location (longitude, latitude), part_of_day (night, morning-noon, afternoon, evening),
workday/weekend, day_of_week, number of days of connection to this antenna
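Each of these labels needs its own notion of distance before the labels can be combined. The sketch below shows one possible set of per-label distances, each scaled to the range 0 to 1. The `Record` layout, the function names and the scaling constants `max_km` and `max_days` are hypothetical choices made for this illustration, not part of the data description.

```python
import math
from typing import NamedTuple, Tuple


class Record(NamedTuple):
    lon: float          # longitude in degrees
    lat: float          # latitude in degrees
    part_of_day: str    # "night", "morning-noon", "afternoon", "evening"
    is_weekend: bool    # workday/weekend flag
    day_of_week: int    # 0 = Monday ... 6 = Sunday (assumed encoding)
    num_days: int       # days of connection to this antenna


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def mismatch(x, y) -> float:
    """Categorical distance: 0 if equal, 1 otherwise."""
    return 0.0 if x == y else 1.0


def cyclic_day_distance(d1: int, d2: int) -> float:
    """Distance around the 7-day week, scaled so the maximum (3 days) is 1."""
    diff = abs(d1 - d2) % 7
    return min(diff, 7 - diff) / 3.0


def label_distances(a: Record, b: Record,
                    max_km: float = 1.0,
                    max_days: int = 30) -> Tuple[float, ...]:
    """Per-label distances in [0, 1]; max_km and max_days are assumed caps."""
    return (
        min(haversine_km(a.lon, a.lat, b.lon, b.lat) / max_km, 1.0),
        mismatch(a.part_of_day, b.part_of_day),
        mismatch(a.is_weekend, b.is_weekend),
        cyclic_day_distance(a.day_of_week, b.day_of_week),
        min(abs(a.num_days - b.num_days) / max_days, 1.0),
    )


# Two hypothetical records at the same antenna but at different times of day.
rec_a = Record(34.78, 32.08, "evening", False, 1, 12)
rec_b = Record(34.78, 32.08, "morning-noon", False, 1, 12)
dists = label_distances(rec_a, rec_b)
```

With distances like these, two records at the same antenna still differ on the part_of_day label, which is the kind of signal that can tell nearby activities apart once the labels are weighted.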

I want to cluster it to detect this user's points of interest, such as a gym, a mall, etc. The clustering should separate the gym from the mall even when they are close to each other, because they are different activities.

Any ideas about how to do it?

Source: https://stackoverflow.com/questions/59248764/unsupervised-high-dimension-clustering
