R - cluster analysis on binary weblog data

↘锁芯ラ 提交于 2019-12-07 22:49:39

问题


I have a web data that looks similar to the sample below. It simply has the user and binary value for whether that user cliked on a particular link within a website. I wanted to do some clustering of this data. My main goal is to find similar users based on their online behaviour. What is a good clustering alorithm for this? I have tried k-means which does not work well with binary data. I have also tried spherical k-means skmeans(). I wanted to do a sum of squared error scree plot, but I could not figure out how to get SSE from skmeans.

   User   link1 link2 link3 link4
    abc1     0     1     1     1
    abc2     1     0     1     0
    abc3     0     1     1     1
    abc4     1     0     1     0

回答1:


You could try a hierarchical clustering using a binary distance measure like jaccard, if "clicked a link" is asymmetrical:

dat <- read.table(header = TRUE, row.names = 1, text = "User   link1 link2 link3 link4
abc1     0     1     1     1
abc2     1     0     1     0
abc3     0     1     1     1
abc4     1     0     1     0")
d <- dist(dat, method = "binary")
hc <- hclust(d)
plot(hc)

(clusters <- cutree(hc, k = 2))
# abc1 abc2 abc3 abc4 
#    1    2    1    2


来源:https://stackoverflow.com/questions/30252127/r-cluster-analysis-on-binary-weblog-data

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!