Finding the best cosine similarity in a set of vectors

不思量自难忘° 2020-12-15 00:03

I have n vectors, each with m elements (real numbers). I want to find the pair whose cosine similarity is the maximum among all pairs.

The straightforward solution

2 Answers
  • 2020-12-15 00:18

    Take a look at the Simbase project (https://github.com/guokr/simbase), a NoSQL database for vector similarity.

    Simbase uses the following concepts:

    • Vector set: a set of vectors
    • Basis: the basis for vectors; all vectors in one vector set share the same basis
    • Recommendation: a one-way binary relation between two vector sets that share the same basis

    You can use redis-cli directly for administration tasks, or drive Simbase programmatically through the redis client bindings available in various languages. Here is a Python example:

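        import redis

        # Simbase speaks the redis protocol; this example connects on port 7654
        dest = redis.Redis(host='localhost', port=7654)
        schema = ['a', 'b', 'c']
        dest.execute_command('bmk', 'ba', *schema)           # make basis 'ba' with components a, b, c
        dest.execute_command('vmk', 'ba', 'va')              # make vector set 'va' on basis 'ba'
        dest.execute_command('rmk', 'va', 'va', 'cosinesq')  # make a recommendation from 'va' to itself, scored by 'cosinesq'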
    
  • 2020-12-15 00:30

    Cosine similarity sim(a,b) is related to Euclidean distance |a - b| by

    |a - b|² = 2(1 - sim(a,b))
    

    for unit vectors a and b.
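
    The identity follows from expanding the squared norm. A short derivation, writing a · b for the dot product and assuming |a| = |b| = 1 (so that sim(a,b) = a · b):

    ```latex
    \begin{aligned}
    \lVert a - b \rVert^2 &= (a - b)\cdot(a - b) \\
                          &= \lVert a \rVert^2 + \lVert b \rVert^2 - 2\, a \cdot b \\
                          &= 2 - 2\,\mathrm{sim}(a,b) \\
                          &= 2\bigl(1 - \mathrm{sim}(a,b)\bigr)
    \end{aligned}
    ```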

    That means that, once every vector is normalized to unit L2 norm, cosine similarity is greatest exactly when Euclidean distance is smallest. The problem therefore reduces to the closest pair of points problem, which can be solved in O(n lg n) time for a fixed dimension m (the constant factor grows quickly with m).
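
The reduction above can be sanity-checked with a small sketch. This is only a brute-force O(n² m) baseline in plain Python, not the O(n lg n) closest-pair algorithm; the function names and the sample data are made up for illustration, and the vectors are assumed to be non-zero. It normalizes each vector, then confirms that the pair with the largest cosine similarity is the same pair with the smallest Euclidean distance.

```python
import math
from itertools import combinations


def normalize(v):
    """Scale v to unit L2 length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity of two unit vectors, i.e. their dot product."""
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    """Squared Euclidean distance between a and b."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_pair_by_cosine(vectors):
    """Index pair (i, j), i < j, with the largest cosine similarity."""
    units = [normalize(v) for v in vectors]
    return max(combinations(range(len(units)), 2),
               key=lambda p: cosine(units[p[0]], units[p[1]]))


def best_pair_by_distance(vectors):
    """Index pair (i, j), i < j, closest together after normalization."""
    units = [normalize(v) for v in vectors]
    return min(combinations(range(len(units)), 2),
               key=lambda p: sq_dist(units[p[0]], units[p[1]]))


# Hypothetical sample data: vectors 0 and 2 point in nearly the same direction.
vectors = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 0.1, 0.0], [0.0, 0.0, 5.0]]

print(best_pair_by_cosine(vectors))    # (0, 2)
print(best_pair_by_distance(vectors))  # (0, 2)
```

Both searches agree because, for unit vectors, the squared distance is a decreasing function of the cosine similarity. To get below quadratic time, the brute-force search in best_pair_by_distance would have to be replaced by a real closest-pair algorithm.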
