问题
This is a typical use case for FEM/FVM equation systems, so is perhaps of broader interest. From a triangular mesh à la
I would like to create a scipy.sparse.csr_matrix
. The matrix rows/columns represent values at the nodes of the mesh. The matrix has entries on the main diagonal and wherever two nodes are connected by an edge.
Here's an MWE that first builds a node->edge->cells relationship and then builds the matrix:
import numpy
import meshzoo
from scipy import sparse
nx = 1600
ny = 1000
verts, cells = meshzoo.rectangle(0.0, 1.61, 0.0, 1.0, nx, ny)
n = len(verts)
nds = cells.T
nodes_edge_cells = numpy.stack([nds[[1, 2]], nds[[2, 0]],nds[[0, 1]]], axis=1)
# assign values to each edge (per cell)
alpha = numpy.random.rand(3, len(cells))
vals = numpy.array([
[alpha**2, -alpha],
[-alpha, alpha**2],
])
# Build I, J, V entries for COO matrix
I = []
J = []
V = []
#
V.append(vals[0][0])
V.append(vals[0][1])
V.append(vals[1][0])
V.append(vals[1][1])
#
I.append(nodes_edge_cells[0])
I.append(nodes_edge_cells[0])
I.append(nodes_edge_cells[1])
I.append(nodes_edge_cells[1])
#
J.append(nodes_edge_cells[0])
J.append(nodes_edge_cells[1])
J.append(nodes_edge_cells[0])
J.append(nodes_edge_cells[1])
# Create suitable data for coo_matrix
I = numpy.concatenate(I).flat
J = numpy.concatenate(J).flat
V = numpy.concatenate(V).flat
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
matrix = matrix.tocsr()
With
python -m cProfile -o profile.prof main.py
snakeviz profile.prof
one can create and view a profile of the above:
The method tocsr()
takes the lion share of the runtime here, but this is also true when building alpha
is more complex. Consequently, I'm looking for ways to speed this up.
What I've already found:
Due to the structure of the data, the values on the diagonal of the matrix can be summed up in advance, i.e.,
V.append(vals[0, 0, 0] + vals[1, 1, 2]) I.append(nodes_edge_cells[0, 0]) # == nodes_edge_cells[1, 2] J.append(nodes_edge_cells[0, 0]) # == nodes_edge_cells[1, 2]
This makes
I
,J
,V
shorter and thus speeds uptocsr
.Right now, edges are "per cell". I could identify equal edges with each other using
numpy.unique
, effectively saving about half ofI
,J
,V
. However, I found that this too takes some time. (Not surprising.)
One other thought that I had was that that I could replace the diagonal V
, I
, J
by a simple numpy.add.at
if there was a csr_matrix
-like data structure where the main diagonal is kept separately. I know that this exists in some other software packages, but couldn't find it in scipy. Correct?
Perhaps there's a sensible way to construct CSR directly?
回答1:
I would try creating the csr structure directly, especially if you are resorting to np.unique
since this gives you sorted keys, which is half the job done.
I'm assuming you are at the point where you have i, j
sorted lexicographically and overlapping v
summed using np.add.at
on the optional inverse
output of np.unique
.
Then v
and j
are already in csr format. All that's left to do is creating the indptr
which you simply get by np.searchsorted(i, np.arange(M+1))
where M
is the column length. You can pass these directly to the sparse.csr_matrix
constructor.
Ok, let code speak:
import numpy as np
from scipy import sparse
from timeit import timeit
def tocsr(I, J, E, N):
n = len(I)
K = np.empty((n,), dtype=np.int64)
K.view(np.int32).reshape(n, 2).T[...] = J, I
S = np.argsort(K)
KS = K[S]
steps = np.flatnonzero(np.r_[1, np.diff(KS)])
ED = np.add.reduceat(E[S], steps)
JD, ID = KS[steps].view(np.int32).reshape(-1, 2).T
ID = np.searchsorted(ID, np.arange(N+1))
return sparse.csr_matrix((ED, np.array(JD, dtype=int), ID), (N, N))
def viacoo(I, J, E, N):
return sparse.coo_matrix((E, (I, J)), (N, N)).tocsr()
#testing and timing
# correctness
N = 1000
A = np.random.random((N, N)) < 0.001
I, J = np.where(A)
E = np.random.random((2, len(I)))
D = np.zeros((2,) + A.shape)
D[:, I, J] = E
D2 = tocsr(np.r_[I, I], np.r_[J, J], E.ravel(), N).A
print('correct:', np.allclose(D.sum(axis=0), D2))
# speed
N = 100000
K = 10
I, J = np.random.randint(0, N, (2, K*N))
E = np.random.random((2 * len(I),))
I, J, E = np.r_[I, I, J, J], np.r_[J, J, I, I], np.r_[E, E]
print('N:', N, ' -- nnz (with duplicates):', len(E))
print('direct: ', timeit('f(a,b,c,d)', number=10, globals={'f': tocsr, 'a': I, 'b': J, 'c': E, 'd': N}), 'secs for 10 iterations')
print('via coo:', timeit('f(a,b,c,d)', number=10, globals={'f': viacoo, 'a': I, 'b': J, 'c': E, 'd': N}), 'secs for 10 iterations')
Prints:
correct: True
N: 100000 -- nnz (with duplicates): 4000000
direct: 7.702431229001377 secs for 10 iterations
via coo: 41.813509466010146 secs for 10 iterations
Speedup: 5x
回答2:
So, in the end this turned out to be the difference between COO's and CSR's sum_duplicates
(just like @hpaulj suspected). Thanks to the efforts of everyone involved here (particularly @paul-panzer), a PR is underway to give tocsr
a tremendous speedup.
SciPy's tocsr
does a lexsort
on (I, J)
, so it helps organizing the indices in such a way that (I, J)
will come out fairly sorted already.
For for nx=4
, ny=2
in the above example, I
and J
are
[1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7]
[1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7 1 6 3 5 2 7 5 5 7 4 5 6 0 2 2 0 1 2 5 5 7 4 5 6 0 2 2 0 1 2 1 6 3 5 2 7]
First sorting each row of cells
, then the rows by the first column like
cells = numpy.sort(cells, axis=1)
cells = cells[cells[:, 0].argsort()]
produces
[1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6]
[1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6 1 4 2 5 3 6 5 5 5 6 7 7 0 0 1 2 2 2 5 5 5 6 7 7 0 0 1 2 2 2 1 4 2 5 3 6]
For the number in the original post, sorting cuts down the runtime from about 40 seconds to 8 seconds.
Perhaps an even better ordering can be achieved if the nodes are numbered more appropriately in the first place. I'm thinking of Cuthill-McKee and friends.
来源:https://stackoverflow.com/questions/42349191/efficiently-construct-fem-fvm-matrix