How to compute the cosine_similarity in pytorch for all rows in a matrix with respect to all rows in another matrix

Posted by 六月ゝ 毕业季﹏ on 2020-12-01 09:40:07

Question


In PyTorch, given two matrices, how can I compute the cosine similarity of every row in one with every row in the other?

For example, given the input:

matrix_1 = [a b] 
           [c d] 
matrix_2 = [e f] 
           [g h]

I would like the output to be

output =

 [cosine_sim([a b] [e f])  cosine_sim([a b] [g h])]
 [cosine_sim([c d] [e f])  cosine_sim([c d] [g h])] 

At the moment I am using torch.nn.functional.cosine_similarity(matrix_1, matrix_2), which returns the cosine similarity of each row with only the corresponding row in the other matrix.

My example has only 2 rows, but I would like a solution that works for many rows. I would also like to handle the case where the two matrices have different numbers of rows.

I realize that I could use expand, but I want to avoid such a large memory footprint.
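
To make the two behaviours above concrete, here is a minimal sketch (the tensor shapes and the seed are arbitrary choices for illustration, not part of the original question). It shows the default row-by-row call, and a broadcasting variant that produces the full pairwise matrix but may build a large intermediate of shape (rows_1, rows_2, dim), which is the memory cost the question wants to avoid.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
matrix_1 = torch.randn(2, 4)
matrix_2 = torch.randn(2, 4)

# Default call: row i of matrix_1 is compared only with row i of matrix_2.
rowwise = F.cosine_similarity(matrix_1, matrix_2, dim=1)  # shape (2,)

# Broadcasting variant: (2, 1, 4) against (1, 2, 4) gives a (2, 2) result,
# at the price of a potentially large broadcast intermediate.
pairwise = F.cosine_similarity(
    matrix_1[:, None, :], matrix_2[None, :, :], dim=-1
)

print(rowwise.shape)   # torch.Size([2])
print(pairwise.shape)  # torch.Size([2, 2])
```

The diagonal of the pairwise result matches the row-by-row output, since entry (i, i) compares row i with row i.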


Answer 1:


By computing the similarity manually, using matrix multiplication and transposition:

import torch
from scipy import spatial
import numpy as np

a = torch.randn(2, 2)
b = torch.randn(3, 2) # different number of rows, for fun

# Given that cos_sim(u, v) = dot(u, v) / (norm(u) * norm(v))
#                          = dot(u / norm(u), v / norm(v))
# We first normalize the rows, then compute all pairwise dot products
# by multiplying a_norm with the transpose of b_norm:
a_norm = a / a.norm(dim=1)[:, None]
b_norm = b / b.norm(dim=1)[:, None]
res = torch.mm(a_norm, b_norm.transpose(0,1))
print(res)
#  0.9978 -0.9986 -0.9985
# -0.8629  0.9172  0.9172

# -------
# Let's verify with numpy/scipy if our computations are correct:
a_n = a.numpy()
b_n = b.numpy()
res_n = np.zeros((2, 3))
for i in range(2):
    for j in range(3):
        # cos_sim(u, v) = 1 - cos_dist(u, v)
        res_n[i, j] = 1 - spatial.distance.cosine(a_n[i], b_n[j])
print(res_n)
# [[ 0.9978022  -0.99855876 -0.99854881]
#  [-0.86285472  0.91716063  0.9172349 ]]
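
As an alternative to dividing by the norm by hand, the same idea can be sketched with torch.nn.functional.normalize, which scales each row to unit L2 norm. This is not part of the original answer; the shapes and seed below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
a = torch.randn(2, 5)
b = torch.randn(3, 5)  # different number of rows on purpose

# Scale every row to unit L2 norm, then take all pairwise dot products.
a_unit = F.normalize(a, p=2, dim=1)
b_unit = F.normalize(b, p=2, dim=1)
res = a_unit @ b_unit.T  # shape (2, 3)

# Same computation written out by hand, as in the answer above.
manual = (a / a.norm(dim=1, keepdim=True)) @ (b / b.norm(dim=1, keepdim=True)).T

print(res.shape)
print(torch.allclose(res, manual, atol=1e-6))
```

Only the (rows_a, rows_b) result is allocated here, so no (rows_a, rows_b, dim) intermediate is needed.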



Answer 2:


Adding eps for numerical stability, based on benjaminplanche's answer:

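import torch

def sim_matrix(a, b, eps=1e-8):
    """
    Pairwise cosine similarity between the rows of a and the rows of b.
    Row norms are clamped to at least eps to avoid division by zero.
    """
    # Row norms as column vectors so they broadcast across each row
    a_n, b_n = a.norm(dim=1)[:, None], b.norm(dim=1)[:, None]
    a_norm = a / torch.max(a_n, eps * torch.ones_like(a_n))
    b_norm = b / torch.max(b_n, eps * torch.ones_like(b_n))
    sim_mt = torch.mm(a_norm, b_norm.transpose(0, 1))
    return sim_mt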
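
To see what the eps guard changes, here is a small sketch with an all-zero row. The input values are made up for illustration, and sim_matrix is restated with clamp(min=eps), which should behave the same as the torch.max form above. Without the guard the zero row yields NaN; with it, the row yields zeros.

```python
import torch

def sim_matrix(a, b, eps=1e-8):
    # Restated so this snippet runs on its own; clamp(min=eps) plays the
    # same role as torch.max(norm, eps * ones_like(norm)).
    a_n = a.norm(dim=1, keepdim=True)
    b_n = b.norm(dim=1, keepdim=True)
    a_unit = a / a_n.clamp(min=eps)
    b_unit = b / b_n.clamp(min=eps)
    return a_unit @ b_unit.T

a = torch.tensor([[1.0, 0.0], [0.0, 0.0]])  # second row is all zeros
b = torch.tensor([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])

# Unguarded version: the zero row divides 0 by 0 and becomes NaN.
naive = (a / a.norm(dim=1, keepdim=True)) @ (b / b.norm(dim=1, keepdim=True)).T
safe = sim_matrix(a, b)

print(naive)
print(safe)
```

Reporting a similarity of 0 for a zero vector is a convention rather than a mathematically defined value, so whether it suits a given application is a judgment call.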


Source: https://stackoverflow.com/questions/50411191/how-to-compute-the-cosine-similarity-in-pytorch-for-all-rows-in-a-matrix-with-re
