问题 I have a piece of code with a bottleneck calculation involving tensor contractions. Lets say I want to calculate a tensor A_{i,j,k,l}( X ) whose non-zero entries for a single x\in X are N ~ 10^5, and X represents a grid with M total points, with M~1000 approximately. For a single element of the tensor A, the rhs of the equation looks something like: A_{ijkl}(M) = Sum_{m,n,p,q} S_{i,j, m,n }(M) B_{m,n,p,q}(M) T_{ p,q, k,l }(M) In addition, the middle tensor B_{m,n,p,q}(M) is obtained by