I would like to compute the following using numpy or scipy:
Y = A^T Q A
where A is an m x n matrix.
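For reference, here is a literal translation of the formula. It assumes Q is diagonal and stored as a 1-D array `q` of its m diagonal entries, which is what the einsum subscripts in the answer imply. The sizes and random data are hypothetical. `np.diag` materializes the full m x m matrix, and that is the memory cost the question wants to avoid:

```python
import numpy as np

# Hypothetical sizes and random data, for illustration only
m, n = 5, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
q = rng.standard_normal(m)  # assumed: the diagonal entries of Q

# Direct translation of Y = A^T Q A; np.diag builds a dense m x m array
Y_naive = A.T @ np.diag(q) @ A
print(Y_naive.shape)  # (3, 3)
```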
numpy.einsum is what you're looking for. Pass Q as a 1-D array holding its m diagonal entries:
numpy.einsum('ij, i, ik -> jk', A, Q, A)
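This should not need any additional memory, because neither the m x m matrix Q nor an intermediate product is formed. However, einsum is usually slower than BLAS-backed operations such as numpy.dot or @.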
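If speed matters more than memory, one BLAS-friendly alternative is to scale the rows of A by the diagonal of Q through broadcasting and then do a single matrix product. This allocates one m x n temporary but never an m x m matrix. The sketch below uses hypothetical sizes and random data and checks that both routes agree:

```python
import numpy as np

# Hypothetical sizes and random data, for illustration only
m, n = 1000, 50
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
q = rng.standard_normal(m)  # diagonal of Q as a 1-D array

# einsum: Y[j, k] = sum_i A[i, j] * q[i] * A[i, k]; no temporaries with default optimize=False
Y_einsum = np.einsum('ij,i,ik->jk', A, q, A)

# BLAS route: q[:, None] broadcasts q over the columns, scaling row i of A by q[i]
# (one m x n temporary); the product A_scaled.T @ A is then dispatched to BLAS
Y_blas = (A * q[:, None]).T @ A

print(np.allclose(Y_einsum, Y_blas))  # True
```

Which route is faster depends on m, n and the BLAS library, so it is worth timing both on real data. Passing optimize=True to einsum may also let it dispatch to BLAS-backed routines, at the cost of intermediate arrays.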