Question
I'm trying to fit a Theano model that is parametrized in part by a symmetric matrix A. In order to enforce the symmetry of A, I want to be able to construct A by passing in just the values in the upper triangle.
The equivalent numpy code might look something like this:
import numpy as np
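def make_symmetric(p, n):
    A = np.empty((n, n), p.dtype)
    # fill the upper triangle, then mirror it via the transposed view
    A[np.triu_indices(n)] = p
    A.T[np.triu_indices(n)] = p
    return A

# output matrix will be (n, n)
n = 4
# parameter vector (float, so the output matches the listing below)
P = np.arange(n * (n + 1) // 2, dtype=float)
print(make_symmetric(P, n))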
# [[ 0. 1. 2. 3.]
# [ 1. 4. 5. 6.]
# [ 2. 5. 7. 8.]
# [ 3. 6. 8. 9.]]
However, since symbolic tensor variables don't support item assignment, I'm struggling to find a way to do this in Theano.
The closest thing I could find is theano.tensor.diag, which allows me to construct a symbolic matrix from its diagonal:
import theano
from theano import tensor as te
P = te.dvector('P')
D = te.diag(P)
get_D = theano.function([P], D)
print(get_D(np.arange(1, 5)))
# [[ 1. 0. 0. 0.]
# [ 0. 2. 0. 0.]
# [ 0. 0. 3. 0.]
# [ 0. 0. 0. 4.]]
Whilst there is also a theano.tensor.triu function, this cannot be used to construct a matrix from the upper triangle, but rather returns a copy of an array with the lower triangular elements zeroed.
Is there any way to construct a Theano symbolic matrix from its upper triangle?
Answer 1:
You could apply theano.tensor.triu to the matrix, add the result to its transpose, then subtract the diagonal (which would otherwise be counted twice).
Copy+Pasteable code:
import numpy as np
import theano
import theano.tensor as T
theano.config.floatX = 'float32'
mat = T.fmatrix()
sym1 = T.triu(mat) + T.triu(mat).T
diag = T.diag(T.diagonal(mat))
sym2 = sym1 - diag
f_sym1 = theano.function([mat], sym1)
f_sym2 = theano.function([mat], sym2)
m = np.arange(9).reshape(3, 3).astype(np.float32)
print(m)
# [[ 0. 1. 2.]
# [ 3. 4. 5.]
# [ 6. 7. 8.]]
print(f_sym1(m))
# [[ 0. 1. 2.]
# [ 1. 8. 5.]
# [ 2. 5. 16.]]
print(f_sym2(m))
# [[ 0. 1. 2.]
# [ 1. 4. 5.]
# [ 2. 5. 8.]]
Does this help? This approach would require a full matrix to be passed, but would ignore everything below the diagonal and symmetrize using the upper triangle.
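For comparison, the same triu-plus-transpose construction can be sketched in plain NumPy. This is only an illustrative analogue of the Theano expression above, not part of the original answer; the helper name symmetrize_from_upper is made up for this example.

```python
import numpy as np

def symmetrize_from_upper(mat):
    """Mirror the upper triangle of a square matrix onto its lower triangle."""
    upper = np.triu(mat)                   # zero everything below the diagonal
    diag = np.diag(np.diagonal(mat))       # the diagonal appears in both upper and upper.T
    return upper + upper.T - diag

m = np.arange(9, dtype=np.float32).reshape(3, 3)
s = symmetrize_from_upper(m)
print(s)
```

The result is symmetric and agrees with the input on and above the diagonal, whatever values the input holds below it.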
We can also look at the derivative of this symmetrization. To avoid dealing with a multidimensional output, we can look at the gradient of the sum of the matrix entries:
sum_grad = T.grad(cost=sym2.sum(), wrt=mat)
f_sum_grad = theano.function([mat], sum_grad)
print(f_sum_grad(m))
# [[ 1. 2. 2.]
# [ 0. 1. 2.]
# [ 0. 0. 1.]]
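This reflects the fact that each entry strictly above the diagonal appears twice in the sum (once in the upper triangle and once mirrored below it), each diagonal entry appears once, and entries below the diagonal are ignored.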
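As a sanity check on that gradient pattern, here is a small finite-difference sketch in NumPy. It is an assumption-laden illustration rather than anything Theano does internally; sym_sum and finite_difference_grad are hypothetical helper names.

```python
import numpy as np

def sym_sum(mat):
    """Sum of all entries of triu(mat) + triu(mat).T - diag(mat)."""
    upper = np.triu(mat)
    return (upper + upper.T - np.diag(np.diagonal(mat))).sum()

def finite_difference_grad(f, mat, eps=1e-3):
    """Forward-difference estimate of df/dmat, one entry at a time."""
    grad = np.zeros_like(mat)
    base = f(mat)
    for idx in np.ndindex(*mat.shape):
        bumped = mat.copy()
        bumped[idx] += eps
        grad[idx] = (f(bumped) - base) / eps
    return grad

m = np.arange(9, dtype=np.float64).reshape(3, 3)
fd_grad = finite_difference_grad(sym_sum, m)

# 2 strictly above the diagonal, 1 on it, 0 below it
expected = 2 * np.triu(np.ones((3, 3)), k=1) + np.eye(3)
print(np.round(fd_grad))
```

Because the symmetrized sum is linear in the input entries, the forward difference should reproduce the pattern up to floating-point rounding.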
Update: You can also build the matrix directly from the parameter vector by ordinary integer indexing with a precomputed index matrix:
n = 4
num_triu_entries = n * (n + 1) // 2
triu_index_matrix = np.zeros([n, n], dtype=int)
triu_index_matrix[np.triu_indices(n)] = np.arange(num_triu_entries)
triu_index_matrix[np.triu_indices(n)[::-1]] = np.arange(num_triu_entries)
triu_vec = T.fvector()
triu_mat = triu_vec[triu_index_matrix]
f_triu_mat = theano.function([triu_vec], triu_mat)
print(f_triu_mat(np.arange(1, num_triu_entries + 1).astype(np.float32)))
# [[ 1. 2. 3. 4.]
# [ 2. 5. 6. 7.]
# [ 3. 6. 8. 9.]
# [ 4. 7. 9. 10.]]
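The index-matrix idea can be wrapped in a small helper and checked in NumPy alone. The sketch below is illustrative; the function name triu_index_matrix_for is hypothetical. It also counts how often each parameter appears in the matrix, which is the gradient of the sum of all matrix entries with respect to that parameter.

```python
import numpy as np

def triu_index_matrix_for(n):
    """(n, n) integer matrix mapping each entry to its position in the upper-triangle vector."""
    idx = np.zeros((n, n), dtype=int)
    rows, cols = np.triu_indices(n)
    k = np.arange(n * (n + 1) // 2)
    idx[rows, cols] = k   # upper triangle, row-major order
    idx[cols, rows] = k   # mirrored positions reuse the same parameter
    return idx

n = 4
idx = triu_index_matrix_for(n)
p = np.arange(1, n * (n + 1) // 2 + 1, dtype=np.float32)

A = p[idx]                          # symmetric matrix built by integer indexing
p_back = A[np.triu_indices(n)]      # reading the upper triangle recovers p
counts = np.bincount(idx.ravel(), minlength=p.size)  # 1 for diagonal, 2 for off-diagonal
print(A)
print(counts)
```

Since the index matrix depends only on n, it can be computed once outside the model and reused for every parameter vector of that size.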
Update: To do all of this dynamically, one way is to write a symbolic version of triu_index_matrix. This can be done with some shuffling of aranges. But probably I am overcomplicating.
n = T.iscalar()
n_triu_entries = (n * (n + 1)) // 2
r = T.arange(n)
tmp_mat = r[np.newaxis, :] + (n_triu_entries - n - (r * (r + 1)) // 2)[::-1, np.newaxis]
triu_index_matrix = T.triu(tmp_mat) + T.triu(tmp_mat).T - T.diag(T.diagonal(tmp_mat))
triu_vec = T.fvector()
sym_matrix = triu_vec[triu_index_matrix]
f_triu_index_matrix = theano.function([n], triu_index_matrix)
f_dynamic_sym_matrix = theano.function([triu_vec, n], sym_matrix)
print(f_triu_index_matrix(5))
# [[ 0 1 2 3 4]
# [ 1 5 6 7 8]
# [ 2 6 9 10 11]
# [ 3 7 10 12 13]
# [ 4 8 11 13 14]]
print(f_dynamic_sym_matrix(np.arange(1., 16.).astype(np.float32), 5))
# [[ 1. 2. 3. 4. 5.]
# [ 2. 6. 7. 8. 9.]
# [ 3. 7. 10. 11. 12.]
# [ 4. 8. 11. 13. 14.]
# [ 5. 9. 12. 14. 15.]]
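The arange arithmetic is easier to trust after comparing it against the triu_indices construction. Here is a NumPy-only sketch of that comparison, assuming the same formula as the symbolic version above; both helper names are hypothetical. Row i of the temporary matrix is offset by i*n - i*(i+1)/2, so that entry (i, j) with j >= i lands on its row-major position in the upper triangle.

```python
import numpy as np

def closed_form_index_matrix(n):
    """Index matrix from arange arithmetic, mirroring the symbolic expression."""
    r = np.arange(n)
    n_triu_entries = n * (n + 1) // 2
    offsets = (n_triu_entries - n - r * (r + 1) // 2)[::-1]   # per-row offset
    tmp = r[np.newaxis, :] + offsets[:, np.newaxis]
    return np.triu(tmp) + np.triu(tmp).T - np.diag(np.diagonal(tmp))

def reference_index_matrix(n):
    """Index matrix built by item assignment with np.triu_indices."""
    idx = np.zeros((n, n), dtype=int)
    rows, cols = np.triu_indices(n)
    k = np.arange(n * (n + 1) // 2)
    idx[rows, cols] = k
    idx[cols, rows] = k
    return idx

# Compare the two constructions for a range of sizes
all_match = all(
    np.array_equal(closed_form_index_matrix(n), reference_index_matrix(n))
    for n in range(1, 9)
)
print(all_match)
print(closed_form_index_matrix(5))
```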
Source: https://stackoverflow.com/questions/25326462/initializing-a-symmetric-theano-dmatrix-from-its-upper-triangle