Cosine distance as vector distance function for k-means

暖寄归人 2021-02-07 19:29

I have a graph of N vertices where each vertex represents a place. Also I have vectors, one per user, each one of N coefficients where the coefficient's value is the duration i

3 Answers
  •  天命终不由人
    2021-02-07 20:22

    Cosine similarity is meant for the case where you do not want to take vector length into account, only the angle between the vectors. If you also want length to matter, choose a different distance function.
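    A small illustration of that point, as a sketch with hand-written helper functions (the names `cosine_distance` and `euclidean_distance` are illustrative, not from any library): two vectors pointing in the same direction but with different lengths have cosine distance 0, while their Euclidean distance is clearly nonzero.

```python
import math

def cosine_distance(x, y):
    """1 - cos(x, y): depends only on the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def euclidean_distance(x, y):
    """Ordinary Euclidean distance: sensitive to vector length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = [1.0, 2.0]
y = [2.0, 4.0]  # same direction as x, twice as long

print(cosine_distance(x, y))     # approximately 0 (up to floating-point rounding)
print(euclidean_distance(x, y))  # sqrt(5), about 2.236
```

    If, as in the question, a longer duration on the same places should count as "different", this is exactly the information cosine distance throws away.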

    Cosine distance is closely related to squared Euclidean distance (the only distance for which k-means is really defined), which is why spherical k-means works.

    The relationship is quite simple:

    The squared Euclidean distance sum_i (x_i - y_i)^2 expands to sum_i x_i^2 + sum_i y_i^2 - 2 * sum_i x_i*y_i. If both vectors are normalized to unit length (i.e. length does not matter), the first two terms are each 1, and the remaining sum_i x_i*y_i is exactly cos(x, y). In this case, the squared Euclidean distance is 2 - 2 * cos(x, y)!
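    The identity can be checked numerically. This is a minimal sketch on two random 5-dimensional vectors (the helper names and the random test data are assumptions for illustration): after normalizing both vectors, the squared Euclidean distance matches 2 - 2 * cos(x, y), where the cosine is computed on the original, unnormalized vectors.

```python
import math
import random

def normalize(v):
    """Scale v to unit Euclidean length."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(5)]
y = [random.uniform(-1, 1) for _ in range(5)]

# Squared Euclidean distance between the unit-length versions of x and y ...
lhs = squared_euclidean(normalize(x), normalize(y))
# ... equals 2 - 2 * cos(x, y); the cosine itself is scale-invariant.
rhs = 2 - 2 * cosine(x, y)

print(lhs, rhs)  # the two values agree up to floating-point rounding
```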

    In other words: on data normalized to unit length, squared Euclidean distance is exactly twice the cosine distance 1 - cos(x, y), so minimizing one is the same as minimizing the other.

    If you don't want to normalize your data, don't use Cosine.
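    If you do accept normalization, k-means with cosine distance amounts to spherical k-means. The sketch below is one common variant, written from scratch under stated assumptions (Lloyd-style iterations, random initialization from the data, centroids re-normalized to unit length after each update, empty clusters keep their old centroid); `spherical_kmeans` is a hypothetical helper, not a library API.

```python
import math
import random

def normalize(v):
    """Scale v to unit length; a zero vector has no direction, so reject it."""
    n = math.sqrt(sum(a * a for a in v))
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [a / n for a in v]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def spherical_kmeans(data, k, iters=100, seed=0):
    """Cluster rows of `data` by cosine similarity (hypothetical helper).

    Rows are normalized first, so cosine similarity is just a dot product and
    maximizing it is the same as minimizing squared Euclidean distance.
    """
    points = [normalize(v) for v in data]
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(points, k)]
    labels = None
    dim = len(points[0])
    for _ in range(iters):
        # Assignment step: each point goes to the most similar centroid.
        new_labels = [max(range(k), key=lambda j: dot(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: mean of the members, projected back onto the unit sphere.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                continue  # empty cluster: keep the previous centroid
            mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
            if any(mean):
                centroids[j] = normalize(mean)
    return labels, centroids

# Usage: two "directions" with very different lengths inside each group.
data = [[1.0, 0.1], [10.0, 0.5], [0.1, 1.0], [0.3, 8.0]]
labels, centroids = spherical_kmeans(data, k=2)
print(labels)
```

    On this toy data, points 0 and 1 should end up in one cluster and points 2 and 3 in the other, even though their lengths differ by roughly a factor of 10; the actual label numbers depend on the random initialization. That length-blindness is exactly what you give up by choosing cosine.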
