Cosine distance as vector distance function for k-means

前端未结


暖寄归人 2021-02-07 19:29

I have a graph of N vertices, where each vertex represents a place. I also have vectors, one per user, each with N coefficients, where the coefficient's value is the duration i

3 answers

天命终不由人 (original poster)

2021-02-07 20:22

Cosine similarity is meant for the case where you do not want to take length into account, only the angle. If you also want to include length, choose a different distance function.

Cosine distance is closely related to squared Euclidean distance (the only distance for which k-means is really defined), which is why spherical k-means works.

The relationship is quite simple:

Squared Euclidean distance sum_i (x_i-y_i)^2 can be expanded into sum_i x_i^2 + sum_i y_i^2 - 2 * sum_i x_i*y_i. If both vectors are normalized to unit length, i.e. length does not matter, then the first two terms are each 1 and sum_i x_i*y_i is exactly cos(x,y). In this case, squared Euclidean distance is 2 - 2 * cos(x,y)!

In other words: on data normalized to unit length, cosine distance 1 - cos(x,y) is squared Euclidean distance up to a constant factor of 2, so both rank pairs of points identically.
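The identity can be checked numerically. The snippet below is only a minimal sketch: it assumes NumPy is available, and the helper names (`squared_euclidean`, `cosine_similarity`, `normalize`) are made up for illustration. It normalizes two random vectors and compares the squared Euclidean distance against 2 - 2 * cos(x,y).

```python
import numpy as np

def squared_euclidean(x, y):
    # sum_i (x_i - y_i)^2
    return float(np.sum((x - y) ** 2))

def cosine_similarity(x, y):
    # cos(x, y) = <x, y> / (|x| * |y|)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def normalize(x):
    # Scale a non-zero vector to unit length.
    return x / np.linalg.norm(x)

rng = np.random.default_rng(0)
x = rng.random(5) + 0.1  # offset keeps the vectors away from zero
y = rng.random(5) + 0.1

xn, yn = normalize(x), normalize(y)

lhs = squared_euclidean(xn, yn)            # distance on normalized data
rhs = 2.0 - 2.0 * cosine_similarity(x, y)  # cosine ignores length anyway

print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding, because normalizing does not change the angle between the vectors.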

If you don't want to normalize your data, don't use Cosine.
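For illustration, here is a rough sketch of what "normalize, then cluster" could look like in a spherical k-means style. It is not a reference implementation: the function name `spherical_kmeans`, its parameters, and the toy data are assumptions, it uses plain NumPy, and it assumes no input row is the zero vector.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Toy spherical k-means: cluster rows of X by direction only."""
    rng = np.random.default_rng(seed)
    # Project every row onto the unit sphere (assumes no all-zero rows).
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # On unit vectors, the largest dot product is the smallest
        # squared Euclidean distance (d^2 = 2 - 2 * cos).
        labels = np.argmax(X @ centers.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                total = members.sum(axis=0)
                norm = np.linalg.norm(total)
                if norm > 0:
                    # Re-normalize the mean so the center stays on the sphere.
                    centers[j] = total / norm
    labels = np.argmax(X @ centers.T, axis=1)
    return labels, centers

# Two groups that differ in direction; lengths vary widely within each group.
X = np.array([
    [1.0, 0.1], [10.0, 2.0], [5.0, 0.2],
    [0.1, 1.0], [2.0, 10.0], [0.3, 5.0],
])
labels, centers = spherical_kmeans(X, k=2)
print(labels)
```

With this toy data, the first three rows should end up in one cluster and the last three in the other, regardless of how long each vector is. If length carries information you care about, this is the wrong tool, as noted above.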
