Cosine similarity between two equally-sized vectors (of reals) is defined as the dot product divided by the product of the norms.
To represent vectors, I have a larg
I gather that no extension that does this, so I've found a limited workaround:
If A and B are both normalized (length 1), cos(A, B) = 1 - 0.5 * ||A - B||^2. ||A - B|| is the Euclidean distance, and cos(A, B) is the cosine similarity. So greater Euclidean distance <=> lesser cosine similarity (makes sense intuitively if you imagine a unit circle), and if you have non-normal vectors, changing their magnitudes without changing their directions doesn't affect their cosine similarities. Great, so I can normalize my vectors and compare their Euclidean distances...
There's a nice answer here about Cube, which supports n-dimensional points and GiST indexes on Euclidean distance, but it only supports 100 or fewer dimensions (can be hacked higher, but I had issues around 135 and higher, so now I'm afraid). Also requires Postgres 9.6 or later.
So:
cube points. Create a GiST index on this column.EXPLAIN SELECT * FROM mytable ORDER BY normalized <-> cube(array[1,2,3,4,5,6,7,8,9,0]) LIMIT 10;If I need more than 100 dimensions, I might be able to achieve this using multiple indexed columns. Will update the answer in that case.
Update: Pretty sure there's nothing I can do with splitting the >100-dimension vector into multiple columns. I end up having to scan the entire table.