sparse-matrix

Scipy sparse memory explosion with simple matrix multiplication

家住魔仙堡 · submitted on 2021-01-28 06:24:18
Question: I noted that SciPy must be storing some intermediate arrays when doing matrix multiplication. I assume this can be helpful in some cases, but it is sometimes a pain. Consider the following example:

```python
import numpy as np
from scipy.sparse import coo_matrix

n = 100000000000
row = np.array([0, 0])
col = np.array([0, n-1])
data = np.array([1, 1])
A = coo_matrix((data, (row, col)), shape=(2, n))
```

Yes, this is a very large matrix. However, it has only two nonzero values. The result of B = A.dot(A.T) can be evaluated by
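A minimal sketch of the same setup at a size that fits in memory. The explanation in the comment (that the CSR conversion of A.T allocates an index array proportional to n) is my reading of SciPy's sparse-matmul path, not something stated in the excerpt:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Same structure as the question, but with a small n so it fits in memory.
n = 1000
A = coo_matrix((np.array([1, 1]), (np.array([0, 0]), np.array([0, n - 1]))),
               shape=(2, n))

# .dot() converts both operands to CSR first; for A.T (shape n x 2) the CSR
# indptr array alone has n + 1 entries, which is plausibly what explodes
# for huge n even though there are only two nonzeros.
B = A.dot(A.T)
print(B.toarray())  # [[2 0], [0 0]]
```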

How to split sparse matrix into train and test sets?

[亡魂溺海] · submitted on 2021-01-28 04:09:22
Question: I want to understand how to work with sparse matrices. I have this code to generate a multi-label classification data set as a sparse matrix:

```python
from sklearn.datasets import make_multilabel_classification

X, y = make_multilabel_classification(sparse=True, n_labels=20,
                                      return_indicator='sparse',
                                      allow_unlabeled=False)
```

This code gives me X in the following format:

<100x20 sparse matrix of type '<class 'numpy.float64'>'
    with 1797 stored elements in Compressed Sparse Row format>

y: <100x5
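As far as I know, sklearn's train_test_split accepts scipy sparse matrices directly, but the split can also be sketched with nothing beyond numpy: CSR matrices support fancy row indexing, so shuffling row indices and slicing is enough. The matrix below is a random stand-in for the question's X:

```python
import numpy as np
from scipy import sparse

# Stand-in for the question's 100 x 20 sparse feature matrix.
X = sparse.random(100, 20, density=0.3, format='csr', random_state=0)

# Shuffle row indices, then slice: CSR supports fancy row indexing,
# and the slices stay sparse.
rng = np.random.default_rng(0)
idx = rng.permutation(X.shape[0])
cut = int(0.8 * X.shape[0])           # 80/20 split
X_train, X_test = X[idx[:cut]], X[idx[cut:]]
```

The same index arrays can be reused to split a sparse label matrix y, which keeps rows of X and y aligned.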

difference between 2 scipy sparse csr matrices

白昼怎懂夜的黑 · submitted on 2021-01-27 20:17:32
Question: I have 2 scipy.sparse.csr_matrix like these:

A = [ 1 0 1 0 0 1
      1 0 0 1 0 0
      0 1 0 0 0 0 ]

B = [ 1 0 1 0 1 1
      1 1 0 1 0 0
      1 1 1 0 0 0 ]

I want to get the "new ones" that appear in B but were not in A:

C = [ 0 0 0 0 1 0
      0 1 0 0 0 0
      1 0 1 0 0 0 ]

Answer 1: IIUC it should be pretty straightforward:

```python
In [98]: C = B - A

In [99]: C
Out[99]:
<3x6 sparse matrix of type '<class 'numpy.int32'>'
    with 4 stored elements in Compressed Sparse Row format>

In [100]: C.A
Out[100]:
array([[0, 0, 0, 0, 1, 0],
       [0, 1
```
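A runnable version of the answer, built from the matrices in the question. The `.maximum(0)` clip is an addition of mine: plain B - A also emits -1 wherever an entry was in A but not in B, so clipping keeps only the additions:

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix(np.array([[1, 0, 1, 0, 0, 1],
                         [1, 0, 0, 1, 0, 0],
                         [0, 1, 0, 0, 0, 0]]))
B = csr_matrix(np.array([[1, 0, 1, 0, 1, 1],
                         [1, 1, 0, 1, 0, 0],
                         [1, 1, 1, 0, 0, 0]]))

# B - A marks new ones with 1; entries that *disappeared* (1 in A, 0 in B)
# would show up as -1, so clip to keep only the additions.
C = (B - A).maximum(0)
print(C.toarray())
```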

Initialize high dimensional sparse matrix

|▌冷眼眸甩不掉的悲伤 · submitted on 2021-01-27 13:22:59
Question: I want to initialize a 300,000 x 300,000 sparse matrix using sklearn, but it requires memory as if it were not sparse:

```python
>>> from scipy import sparse
>>> sparse.rand(300000, 300000, .1)
```

It gives the error:

MemoryError: Unable to allocate 671. GiB for an array with shape (300000, 300000) and data type float64

which is the same error as if I initialized it using numpy:

```python
np.random.normal(size=[300000, 300000])
```

Even when I go to a very low density, it reproduces the error:

```python
>>> from scipy import sparse
>>>
```
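One thing worth noting: density 0.1 of a 300,000 x 300,000 matrix is 9e9 stored values, hundreds of GiB for data plus indices, so at that density the MemoryError is expected even for a "sparse" matrix. When the number of nonzeros is genuinely small, building the matrix from explicit coordinates keeps memory proportional to nnz, never to the full shape. A sketch with made-up random coordinates:

```python
import numpy as np
from scipy.sparse import coo_matrix

n = 300000
nnz = 9000                      # memory scales with this, not with n * n
rng = np.random.default_rng(0)
rows = rng.integers(0, n, size=nnz)
cols = rng.integers(0, n, size=nnz)
vals = rng.standard_normal(nnz)

# COO construction only stores the nnz coordinates and values;
# tocsr() sums any duplicate coordinates.
M = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
print(M.shape, M.nnz)
```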

What is the fastest way to compute a sparse Gram matrix in Python?

人盡茶涼 · submitted on 2021-01-27 07:23:30
Question: A Gram matrix is a matrix of the form X @ X.T, which is of course symmetric. When dealing with dense matrices, the numpy.dot product implementation is intelligent enough to recognize the self-multiplication, exploit the symmetry, and thus speed up the computation (see this). However, no such effect can be observed when using scipy.sparse matrices:

```python
from numpy import random
import scipy.sparse

random.seed(0)
X = random.randn(5, 50)
X[X < 1.5] = 0
X = scipy.sparse.csr_matrix(X)
print(f'sparsity of X: {100 * (1 - X.count_nonzero() /
```
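A self-contained baseline for the computation the question describes. As the question notes, scipy's sparse matmul has no symmetry shortcut, so this computes all entries of the (symmetric) result; the sketch just verifies that the straightforward sparse product matches the dense Gram matrix:

```python
import numpy as np
import scipy.sparse

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 50))
X[X < 1.5] = 0                      # zero out ~93% of the entries
Xs = scipy.sparse.csr_matrix(X)

# Baseline: sparse @ sparse. scipy computes every entry of the n x n
# result even though G is symmetric, i.e. roughly twice the needed work.
G = (Xs @ Xs.T).toarray()
assert np.allclose(G, G.T)          # a Gram matrix is symmetric
assert np.allclose(G, X @ X.T)      # matches the dense computation
```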

indices[201] = [0,8] is out of order. Many sparse ops require sorted indices. Use `tf.sparse.reorder` to create a correctly ordered copy

十年热恋 · submitted on 2021-01-27 05:06:56
Question: I'm building a neural network, encoding every variable, and when I go to fit the model an error is raised:

indices[201] = [0,8] is out of order. Many sparse ops require sorted indices. Use `tf.sparse.reorder` to create a correctly ordered copy. [Op:SerializeManySparse]

I don't know how to solve it. I can post some code here, and if you want more I can keep posting:

```python
def process_atributes(df, train, test):
    continuas = ['Trip_Duration']
    cs = MinMaxScaler()
    trainCont = cs.fit_transform(train[continuas])
```
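The fix the error message suggests can be shown in isolation. This is a minimal sketch with a hand-made SparseTensor (unrelated to the question's data pipeline) whose indices are deliberately out of row-major order, which is exactly the condition that triggers the error:

```python
import tensorflow as tf

# Indices are NOT in row-major order: [0, 3] comes before [0, 1].
st = tf.sparse.SparseTensor(indices=[[0, 3], [0, 1], [2, 0]],
                            values=[10.0, 20.0, 30.0],
                            dense_shape=[3, 4])

# tf.sparse.reorder returns a copy with indices sorted into canonical
# row-major order; the values are permuted to match.
st = tf.sparse.reorder(st)
print(st.indices.numpy())   # rows now sorted: [0,1], [0,3], [2,0]
```

In the question's setting, applying tf.sparse.reorder to the sparse tensor fed into model.fit (or converting the sklearn sparse matrices to dense with .toarray() before fitting) would avoid the error.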

How to use Numba to speed up sparse linear system solvers in Python that are provided in scipy.sparse.linalg?

别来无恙 · submitted on 2021-01-05 07:33:43
Question: I wish to speed up the sparse-system-solver part of my code using Numba. Here is what I have so far:

```python
# Both the numba and numba-scipy packages are installed. I am using the PyCharm IDE.
import numba
import numba_scipy
# import other required stuff

@numba.jit(nopython=True)
def solve_using_numba(A, b):
    return sp.linalg.gmres(A, b)

# total = the number of points in the system
A = sp.lil_matrix((total, total), dtype=float)
# populate A with appropriate data
A = A.tocsc()
b = np.zeros((total, 1),
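One observation on the approach: as far as I know, Numba's nopython mode cannot compile scipy.sparse calls (numba-scipy covers scipy.special, not the sparse solvers), and gmres itself is already compiled native code, so wrapping it in @numba.jit gains nothing. A sketch of calling the solver directly, on a made-up upper-bidiagonal test system standing in for the question's matrix:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Hypothetical well-conditioned test system: 2 on the diagonal,
# -1 on the first superdiagonal.
n = 100
A = (sp.diags([2.0] * n) - sp.eye(n, k=1)).tocsr()
b = np.ones(n)

# scipy.sparse.linalg.gmres is already compiled code; call it directly.
x, info = spla.gmres(A, b, atol=1e-10)
# info == 0 means the solver converged
```

If the solve is still the bottleneck, a direct factorization (scipy.sparse.linalg.splu / spsolve) or a preconditioner for gmres is usually a better lever than JIT-wrapping the call.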