Theano sqrt returning NaN values


Question


In my code I'm using Theano to calculate a Euclidean distance matrix (code from here):

import theano
import theano.tensor as T
MAT = T.fmatrix('MAT')
squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)
f_euclidean = theano.function([MAT], T.sqrt(squared_euclidean_distances))
def pdist_euclidean(mat):
    return f_euclidean(mat)

But this code causes some values of the matrix to be NaN. I've read that this happens when calculating theano.tensor.sqrt(), and here it's suggested to

Add an eps inside the sqrt (or max(x,EPs))

So I've added an eps to my code:

import theano
import theano.tensor as T

eps = 1e-9

MAT = T.fmatrix('MAT')

squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)

f_euclidean = theano.function([MAT], T.sqrt(eps+squared_euclidean_distances))

def pdist_euclidean(mat):
    return f_euclidean(mat)

And I'm adding it before performing sqrt. I'm getting fewer NaNs, but I'm still getting them. What is the proper solution to the problem? I've also noticed that there are no NaNs if MAT is T.dmatrix().


Answer 1:


There are two likely sources of NaNs when computing Euclidean distances.

  1. Floating point approximation issues causing a squared distance to come out slightly negative when it is really just zero. The square root of a negative number is undefined (assuming you're not interested in the complex solution).

    Imagine MAT has the value

    [[ 1.62434536 -0.61175641 -0.52817175 -1.07296862  0.86540763]
     [-2.3015387   1.74481176 -0.7612069   0.3190391  -0.24937038]
     [ 1.46210794 -2.06014071 -0.3224172  -0.38405435  1.13376944]
     [-1.09989127 -0.17242821 -0.87785842  0.04221375  0.58281521]]
    

    Now, if we break down the computation we see that (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) has value

    [[ 10.3838024   14.27675714  13.11072431   7.54348446]
     [ 14.27675714  18.16971188  17.00367905  11.4364392 ]
     [ 13.11072431  17.00367905  15.83764622  10.27040637]
     [  7.54348446  11.4364392   10.27040637   4.70316652]]
    

    and 2 * MAT.dot(MAT.T) has value

    [[ 10.3838024   -9.92394296  10.39763039  -1.51676099]
     [ -9.92394296  18.16971188 -14.23897281   5.53390084]
     [ 10.39763039 -14.23897281  15.83764622  -0.65066204]
     [ -1.51676099   5.53390084  -0.65066204   4.70316652]]
    

    The diagonal of these two matrices should be equal (the distance between a vector and itself is zero), and from this textual representation it looks like that is true, but in fact they are slightly different -- the differences are too small to show up when we print the floating point values like this.

    This becomes apparent when we print the value of the full expression (the second of the matrices above subtracted from the first):

    [[  0.00000000e+00   2.42007001e+01   2.71309392e+00   9.06024545e+00]
     [  2.42007001e+01  -7.10542736e-15   3.12426519e+01   5.90253836e+00]
     [  2.71309392e+00   3.12426519e+01   0.00000000e+00   1.09210684e+01]
     [  9.06024545e+00   5.90253836e+00   1.09210684e+01   0.00000000e+00]]
    

    The diagonal is composed almost entirely of zeros, but the entry in the second row, second column is now a very small negative value. When you then compute the square root of all these values you get NaN in that position, because the square root of a negative number is undefined (for real numbers).

    [[ 0.          4.91942071  1.64714721  3.01002416]
     [ 4.91942071         nan  5.58951267  2.42951402]
     [ 1.64714721  5.58951267  0.          3.30470398]
     [ 3.01002416  2.42951402  3.30470398  0.        ]]
    
  2. Computing the gradient of a Euclidean distance expression with respect to a variable inside the input to the function. This can happen not only if a negative number is generated due to floating point approximations, as above, but also if any of the inputs are zero length.

    If y = sqrt(x) then dy/dx = 1/(2 * sqrt(x)). So if x = 0 or, for your purposes, if squared_euclidean_distances = 0, then the gradient will be NaN because 2 * sqrt(0) = 0 and dividing by zero is undefined. The sketch after this list illustrates this.
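Here is a minimal sketch of that gradient failure mode (the variable names are illustrative, not from the question):

import theano
import theano.tensor as T

x = T.dvector('x')
norm = T.sqrt((x ** 2).sum())           # Euclidean length of x
grad_fn = theano.function([x], T.grad(norm, x))

print(grad_fn([3.0, 4.0]))              # [ 0.6  0.8] -- well defined away from zero
print(grad_fn([0.0, 0.0]))              # [ nan  nan] -- 1/(2 * sqrt(0)) divides by zero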

The first problem can be solved by ensuring the squared distances are never negative, forcing them to be no less than zero:

T.sqrt(T.maximum(squared_euclidean_distances, 0.))

To solve both problems (if you need gradients), make sure the squared distances are never negative or zero by bounding them with a small positive epsilon:

T.sqrt(T.maximum(squared_euclidean_distances, eps))

The first solution makes sense since the problem only arises from approximate representations. The second is a bit more questionable because the true distance is zero so, in a sense, the gradient should be undefined. Your specific use case may yield some alternative solution that maintains the semantics without an artificial bound (e.g. by ensuring that gradients are never computed/used for zero-length vectors). But NaN values can be pernicious: they can spread like weeds.
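Putting the pieces together, a minimal sketch of the questioner's function with the clamp applied (swap 0. for a small eps if you need finite gradients at zero distance):

import theano
import theano.tensor as T

MAT = T.fmatrix('MAT')

squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)

# Clamp before the square root so floating point noise cannot produce negative inputs.
f_euclidean = theano.function([MAT], T.sqrt(T.maximum(squared_euclidean_distances, 0.)))

def pdist_euclidean(mat):
    return f_euclidean(mat)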




Answer 2:


Just checking

In squared_euclidean_distances you're adding a column, a row, and a matrix. Are you sure this is what you want?

More precisely, if MAT is of shape (n, p), you're adding matrices of shapes (n, 1), (1, n) and (n, n).

Theano seems to silently repeat the rows (resp. the columns) of each one-dimensional member to match the number of rows and columns of the dot product.
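As a quick illustration of that broadcasting behaviour (plain NumPy here, but Theano follows the same rules for these shapes):

import numpy as np

col = np.arange(4.0).reshape(4, 1)   # shape (4, 1)
row = np.arange(4.0).reshape(1, 4)   # shape (1, 4)
print((col + row).shape)             # (4, 4): each operand is repeated to fill the other dimension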

If this is what you want

In reshape, you should probably specify ndim=2, according to basic tensor functionality: reshape.

If the shape is a Variable argument, then you might need to use the optional ndim parameter to declare how many elements the shape has, and therefore how many dimensions the reshaped Variable will have.
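Applied to the expression in the question, that would look roughly like this (whether the keyword is strictly required here depends on how Theano infers the shape argument):

(MAT ** 2).sum(1).reshape((MAT.shape[0], 1), ndim=2)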

Also, it seems that squared_euclidean_distances should always be non-negative, unless imprecision errors in the difference turn zero values into small negative values. If this is true, and if negative values are responsible for the NaNs you're seeing, you could indeed get rid of them without corrupting your result by surrounding squared_euclidean_distances with abs(...).
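With the variable names from the question, that suggestion would read:

f_euclidean = theano.function([MAT], T.sqrt(abs(squared_euclidean_distances)))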



Source: https://stackoverflow.com/questions/31919818/theano-sqrt-returning-nan-values
