pdist for theano tensor

花落未央 2020-12-17 02:08

I have a Theano symbolic matrix:

x = T.fmatrix('input')

x will later be populated with n vectors of dimension d.

2 answers
  •  夕颜 (original poster)
    2020-12-17 02:30

    I haven't worked with Theano before, but here is a solution based on pure NumPy functions; perhaps you can convert it to the equivalent Theano functions. Note that the expression below relies on automatic broadcasting, so you might have to rewrite that explicitly if Theano doesn't support it:

    # X is an m-by-n matrix (rows are examples, columns are dimensions)
    # D is an m-by-m symmetric matrix of pairwise Euclidean distances
    a = np.sum(X**2, axis=1)
    # Squared distances; round-off can push near-zero entries slightly negative
    sq = (a + a[np.newaxis].T) - 2*np.dot(X, X.T)
    D = np.sqrt(np.maximum(sq, 0))
    

    It is based on the identity ||u-v||^2 = ||u||^2 + ||v||^2 - 2*u.v (I showed this in previous answers of mine using MATLAB).
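
Since the broadcasting may have to be spelled out when porting, here is a minimal NumPy sketch of the same identity with the column and row shapes made explicit. The function name `pdist_square` and the small test matrix are hypothetical, chosen only for illustration; the clamp before the square root is a precaution against round-off, not something the identity itself requires.

```python
import numpy as np

def pdist_square(X):
    """Full m-by-m Euclidean distance matrix (illustrative helper, not a library function)."""
    a = np.sum(X**2, axis=1)                 # squared norm of each row, shape (m,)
    col = a.reshape(-1, 1)                   # shape (m, 1): ||u||^2 down the rows
    row = a.reshape(1, -1)                   # shape (1, m): ||v||^2 across the columns
    sq = col + row - 2.0 * np.dot(X, X.T)    # ||u||^2 + ||v||^2 - 2*u.v
    # Round-off can leave tiny negative values where the true distance is zero.
    return np.sqrt(np.maximum(sq, 0.0))

# Three collinear points spaced 5 apart (3-4-5 triangles).
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = pdist_square(X)
```

For this input the off-diagonal distances are 5 between neighbouring points and 10 between the two end points, which is easy to check by hand. The two `reshape` calls are the part that would need an explicit counterpart in a framework without automatic broadcasting.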

    Here is a comparison against the existing SciPy functions:

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    
    def my_pdist(X):
        a = np.sum(X**2, axis=1)
        sq = (a + a[np.newaxis].T) - 2*np.dot(X, X.T)
        # Clamp at zero so round-off never feeds a negative value to sqrt
        D = np.sqrt(np.maximum(sq, 0))
        return D
    
    def scipy_pdist(X):
        D = squareform(pdist(X, metric='euclidean'))
        return D    
    
    X = np.random.rand(5, 3)
    D1 = my_pdist(X)
    D2 = scipy_pdist(X)
    

    The difference should be negligible, close to machine epsilon (np.spacing(1)):

    >>> np.linalg.norm(D1-D2)
    8.5368137554718277e-16
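
The exact residual depends on the random input, so a tolerance check may be more robust than reading the norm by eye. The sketch below is one way to do that using NumPy only; `brute_force_pdist` is a hypothetical reference written for this example (it stands in for the SciPy call), and the tolerance is an assumption. Note that on the diagonal the expanded formula can be off by roughly 1e-8 rather than 1e-16, because the square root amplifies round-off near zero.

```python
import numpy as np

def my_pdist(X):
    a = np.sum(X**2, axis=1)
    sq = (a + a[np.newaxis].T) - 2 * np.dot(X, X.T)
    return np.sqrt(np.maximum(sq, 0.0))  # clamp guards against round-off below zero

def brute_force_pdist(X):
    # Hypothetical reference: subtract every pair of rows directly, no algebraic expansion.
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]   # shape (m, m, n)
    return np.sqrt(np.sum(diff**2, axis=-1))

rng = np.random.default_rng(0)
X = rng.random((5, 3))
D1 = my_pdist(X)
D2 = brute_force_pdist(X)

# Absolute tolerance chosen loosely to absorb the sqrt-amplified error on the diagonal.
ok = np.allclose(D1, D2, atol=1e-6)
```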
    

    HTH


    EDIT:

    Here is another implementation with a single loop:

    def my_pdist_compact(X):
        # Collect the distances from row i to all later rows, then join them once
        parts = [np.empty(0, dtype=X.dtype)]
        for i in range(X.shape[0] - 1):
            parts.append(np.sqrt(np.sum((X[i] - X[i+1:])**2, axis=1)))
        return np.concatenate(parts)
    

    Somewhat equivalent MATLAB code:

    function D = my_pdist_compact(X)
        n = size(X,1);
        D = cell(n-1,1);
        for i=1:n-1
            D{i} = sqrt(sum(bsxfun(@minus, X(i,:), X(i+1:end,:)).^2, 2));
        end
        D = vertcat(D{:});
    end
    

    This returns the pairwise distances in compact form (the upper triangular part of the symmetric matrix), which is the same output as pdist. Use squareform to convert it to a full matrix.
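
To make the compact layout concrete, here is a small NumPy-only sketch of the compact-to-full conversion. The helper `condensed_to_square` is hypothetical and only mimics what squareform does for this case; it assumes the compact vector lists the pairs in row-major order of the strict upper triangle, i.e. (0,1), (0,2), ..., (1,2), ..., which is the order the loop above produces.

```python
import numpy as np

def condensed_to_square(d, m):
    """Expand a compact vector of length m*(m-1)/2 into an m-by-m symmetric matrix."""
    D = np.zeros((m, m), dtype=d.dtype)
    i, j = np.triu_indices(m, k=1)   # indices of the strict upper triangle, row-major
    D[i, j] = d                      # fill the upper triangle
    D[j, i] = d                      # mirror into the lower triangle
    return D

# Compact distances for three points: pairs (0,1), (0,2), (1,2).
d = np.array([5.0, 10.0, 5.0])
D = condensed_to_square(d, 3)
```

The diagonal stays zero because the compact form never stores the distance of a point to itself.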

    >>> d1 = my_pdist_compact(X)
    >>> d2 = pdist(X)    # from scipy.spatial.distance
    >>> (d1 == d2).all()
    True
    

    I will leave it to you to see if it's possible to write the equivalent loop using Theano (see theano.scan)!
