sparse-matrix

creating a scipy.lil_matrix using a python generator efficiently

大憨熊 提交于 2019-12-06 14:50:48
I have a generator that yields one-dimensional `numpy.array` objects, all of the same length. I would like to have a sparse matrix containing that data. Rows are generated in the same order I'd like to have them in the final matrix. A `csr_matrix` is preferable to a `lil_matrix`, but I assume the latter will be easier to build in the scenario I'm describing. Assuming `row_gen` is a generator yielding `numpy.array` rows, the following code works as expected:

```python
def row_gen():
    yield numpy.array([1, 2, 3])
    yield numpy.array([1, 0, 1])
    yield numpy.array([1, 0, 0])

matrix = scipy.sparse.lil_matrix(list(row_gen()))
```
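If the dense intermediate list is a concern, the rows can instead be consumed as COO triplets and assembled straight into a `csr_matrix`. This is a sketch, not the asker's code, and it assumes the row length (3 here) is known:

```python
import numpy as np
import scipy.sparse

def row_gen():
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

# Collect only the non-zero entries of each generated row as COO triplets,
# so no dense 2-D intermediate is ever materialised.
rows, cols, vals = [], [], []
n_rows = 0
for i, r in enumerate(row_gen()):
    nz = np.nonzero(r)[0]
    rows.extend([i] * len(nz))
    cols.extend(nz)
    vals.extend(r[nz])
    n_rows = i + 1

matrix = scipy.sparse.csr_matrix((vals, (rows, cols)), shape=(n_rows, 3))
print(matrix.toarray())
```

This lands in CSR directly, skipping the LIL detour entirely.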

Matlab allocates a sparse matrix more memory than is required

痞子三分冷 提交于 2019-12-06 14:00:48
Question: Suppose I create this sparse matrix, where the non-zero elements consist of the boolean `true`:

```matlab
s = sparse([3 2 3 3 3 3 2 34 3 6 3 2 3 3 3 3 2 3 3 6], ...
           [10235 11470 21211 33322 49297 88361 91470 127422 152383 158751 ...
            166485 171471 181211 193321 205548 244609 251470 283673 312384 318752], ...
           true);
```

which contains 20 elements. Matlab ought to allocate no more than (4+4+1)*20 = 180 bytes of memory (it looks like the indices are 4 bytes long). Yet `whos s` says that the matrix takes up 1275112 bytes in
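A likely contributor (an assumption here, since the answer is truncated away): compressed-column storage keeps a column-pointer array with one entry per column plus one, so memory grows with the number of columns, not just the number of non-zeros. The same effect can be reproduced with SciPy's CSC format:

```python
import numpy as np
from scipy.sparse import csc_matrix

# Same 20 boolean non-zeros as in the question; the largest column
# index is 318751 (0-based), so the matrix spans ~318752 columns.
rows = np.array([3, 2, 3, 3, 3, 3, 2, 34, 3, 6,
                 3, 2, 3, 3, 3, 3, 2, 3, 3, 6]) - 1
cols = np.array([10235, 11470, 21211, 33322, 49297, 88361, 91470, 127422,
                 152383, 158751, 166485, 171471, 181211, 193321, 205548,
                 244609, 251470, 283673, 312384, 318752]) - 1

s = csc_matrix((np.ones(20, dtype=bool), (rows, cols)))

# data and indices scale with nnz, but indptr has one entry per column + 1,
# so it dominates the footprint (~1.27 MB here with 4-byte indices).
total_bytes = s.data.nbytes + s.indices.nbytes + s.indptr.nbytes
print(len(s.indptr), total_bytes)
```

The ~1.27 MB indptr array matches the order of magnitude reported by `whos`.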

SQL Server: how to populate sparse data with the rest of zero values?

馋奶兔 提交于 2019-12-06 12:07:24
I have data reporting sales by month and by customer. When I count the values, the zero values are not reported because of the sparse data format. Suppose there are customers 1-4, but only customers 1-2 have recordings. The straight table has customer IDs on the rows and months on the columns, such that

```
|CustomerID|MonthID|Value|
|----------|-------|-----|
|    1     |201101 |  10 |
|    2     |201101 | 100 |
```

and then they are reported in crosstab format such that

```
|CustomerID|201101|201102|201103|...|201501|
|----------|------|------|------|---|------|
|    1     |  10  |   0  |   0  |...|   0  |
|    2     |  100 |   0  |   0  |...|   0  |
|    3
```
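The question is about SQL Server, but the fill-with-zeros idea can be sketched quickly in pandas (the table values and ID ranges below are hypothetical): pivot to a crosstab, then reindex against the full customer and month lists so absent combinations become explicit zeros.

```python
import pandas as pd

# Hypothetical miniature of the sales table described above.
sales = pd.DataFrame({"CustomerID": [1, 2],
                      "MonthID":    [201101, 201101],
                      "Value":      [10, 100]})

all_customers = [1, 2, 3, 4]
all_months = [201101, 201102, 201103]

# Pivot to a crosstab, filling every missing (customer, month) cell with 0;
# reindex adds rows for customers that never appear in the data at all.
crosstab = (sales.pivot_table(index="CustomerID", columns="MonthID",
                              values="Value", fill_value=0)
                 .reindex(index=all_customers, columns=all_months,
                          fill_value=0))
print(crosstab)
```

The SQL equivalent is typically a cross join of the customer and month dimension tables, left-joined against the fact table with `ISNULL(Value, 0)`.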

Incorrect eigenvalues SciPy sparse linalg.eigs, eigsh for non-diagonal M matrix

北战南征 提交于 2019-12-06 07:18:06
Question: Why do `eigh` and `eigsh` from `scipy.sparse.linalg` as used below give incorrect results when solving the generalized eigenvalue problem A * x = lambda * M * x, if M is non-diagonal?

```python
import mkl
import numpy as np
from scipy import linalg as LA
from scipy.sparse import linalg as LAsp
from scipy.sparse import csr_matrix

A = np.diag(np.arange(1.0, 7.0))
M = np.array([[ 25.1,  0. ,  0. , 17.3,  0. ,  0. ],
              [  0. , 33.6, 16.8,  8.4,  4.2,  2.1],
              [  0. , 16.8,  3.6,  0. , 11. ,  0. ],
              [ 17.3,  8.4,  0. ,  4.2,  0. ,  9
```
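A likely culprit (hedged, since the excerpt is cut off before the answer): both `scipy.linalg.eigh(A, M)` and the generalized mode of `eigsh` assume M is symmetric *positive definite*, and the M above is indefinite (its [[33.6, 16.8], [16.8, 3.6]] block has negative determinant). A minimal sketch with an SPD stand-in for M, where the sparse and dense solvers do agree:

```python
import numpy as np
from scipy import linalg as LA
from scipy.sparse import linalg as LAsp
from scipy.sparse import csr_matrix

A = np.diag(np.arange(1.0, 7.0))

# A hypothetical M that is symmetric *and* positive definite
# (Cholesky raises LinAlgError if it were not).
M = np.eye(6) + 0.1 * np.ones((6, 6))
np.linalg.cholesky(M)

# Dense reference solution (ascending eigenvalues).
w_dense = LA.eigh(A, M, eigvals_only=True)

# Sparse iterative solution for the 3 largest-magnitude eigenvalues.
w_sparse = LAsp.eigsh(csr_matrix(A), k=3, M=csr_matrix(M),
                      which="LM", return_eigenvectors=False)
```

With an indefinite M, ARPACK's assumptions are violated and the results are unreliable rather than merely inaccurate.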

Numpy re-index to first N natural numbers

懵懂的女人 提交于 2019-12-06 07:10:03
Question: I have a matrix with a quite sparse index (the largest values in both rows and columns are beyond 130000), but only a few of those rows/columns actually have non-zero values. Thus, I want the row and column indices shifted so that only the non-zero ones are represented, by the first N natural numbers. Visually, I want an example matrix like this

```
1 0 1
0 0 0
0 0 1
```

to look like this

```
1 1
0 1
```

but only if all values in the row/column are zero. Since I do have the matrix in a sparse format, I could
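One compact way to do this (a sketch, assuming the matrix is available in COO format with no explicit stored zeros) is `np.unique` with `return_inverse=True`, whose inverse map relabels each occupied row/column with 0..N-1:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical matrix with large, mostly-unused row/column indices.
row = np.array([7, 7, 130001])
col = np.array([2, 99999, 99999])
data = np.array([1, 1, 1])
A = coo_matrix((data, (row, col)))

# np.unique returns the sorted distinct indices plus an inverse map
# that re-labels every occupied row/column as 0..N-1.
new_row_ids, row_inv = np.unique(A.row, return_inverse=True)
new_col_ids, col_inv = np.unique(A.col, return_inverse=True)

B = coo_matrix((A.data, (row_inv, col_inv)),
               shape=(len(new_row_ids), len(new_col_ids)))
print(B.toarray())  # → [[1 1]
                    #    [0 1]]
```

Only rows/columns with no stored entries at all disappear, matching the "only if all values in the row/column are zero" requirement; `new_row_ids`/`new_col_ids` retain the mapping back to the original indices.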

python (scipy): Resizing a sparse matrix

荒凉一梦 提交于 2019-12-06 06:31:05
I'm having trouble resizing a matrix: the `set_shape` function seems to have no effect:

```python
>>> M
<14x3562 sparse matrix of type '<type 'numpy.float32'>'
        with 6136 stored elements in LInked List format>
>>> new_shape = (15, 3562)
>>> M.set_shape(new_shape)
>>> M
<14x3562 sparse matrix of type '<type 'numpy.float32'>'
        with 6136 stored elements in LInked List format>
```

Has anyone else come across this? I also tried doing this by hand, i.e.

```python
>>> M._shape = new_shape
>>> M.data = np.concatenate(M.data, np.empty((0,0), dtype=np.float32))
```

but that throws an error: *** TypeError: only length-1 arrays can be
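A robust workaround that avoids the matrix internals altogether is to stack an all-zero block with `scipy.sparse.vstack` (newer SciPy versions also offer an in-place `resize` method on sparse matrices, which is worth checking). A sketch with a small stand-in for M:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for M: a small float32 LIL matrix.
M = sparse.lil_matrix(np.arange(12, dtype=np.float32).reshape(3, 4))

# Stacking an all-zero row block grows the matrix without touching
# private attributes like _shape or the LIL data lists.
extra = sparse.lil_matrix((1, M.shape[1]), dtype=M.dtype)
M = sparse.vstack([M, extra]).tolil()
print(M.shape)  # → (4, 4)
```

The hand-rolled attempt in the question also has a separate bug: `np.concatenate` takes a *sequence* of arrays as its first argument, so the second positional argument was being interpreted as the `axis`, hence the `TypeError`.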

Find all-zero columns in pandas sparse matrix

妖精的绣舞 提交于 2019-12-06 06:00:44
For example I have a `coo_matrix` A:

```python
import scipy.sparse as sp
A = sp.coo_matrix([[3, 0, 3, 0],
                   [0, 0, 2, 0],
                   [2, 5, 1, 0],
                   [0, 0, 0, 0]])
```

(note the rows must be wrapped in a single nested list for the constructor to read them as one dense argument). How can I get the result [0,0,0,1], which indicates that the first 3 columns contain non-zero values and only the 4th column is all zeros? PS: I cannot convert A to another type. PS2: I tried using `np.nonzero` but my implementation was not very elegant.

Approach #1

We could do something like this:

```python
# Get the column indices of the input sparse matrix
C = sp.find(A)[1]

# Use np.in1d to create a mask of non-zero columns.
# So, we invert it and convert to int dtype for
```
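The excerpt cuts off mid-approach; a plausible completion of the `np.in1d` idea (a sketch, since the original answer is truncated) is:

```python
import numpy as np
import scipy.sparse as sp

A = sp.coo_matrix([[3, 0, 3, 0],
                   [0, 0, 2, 0],
                   [2, 5, 1, 0],
                   [0, 0, 0, 0]])

# Column indices that hold at least one stored non-zero value.
C = sp.find(A)[1]

# Mark every column index *not* present in C, then cast the
# boolean mask to integers.
out = np.in1d(np.arange(A.shape[1]), C, invert=True).astype(int)
print(out)  # → [0 0 0 1]
```

This never converts A to another format; `sp.find` reads the COO triplets directly.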

R constructing sparse Matrix

半腔热情 提交于 2019-12-06 05:23:20
Question: I'm reading through the documentation of the Matrix package in R, but I can't understand the `p` argument of this function:

```r
sparseMatrix(i = ep, j = ep, p, x, dims, dimnames,
             symmetric = FALSE, index1 = TRUE,
             giveCsparse = TRUE, check = TRUE)
```

According to http://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/sparseMatrix.html

p: numeric (integer valued) vector of pointers, one for each column (or row), to the initial (zero-based) index of elements in the column (or row). Exactly one of i, j or p must
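R's `p` vector is the same "column pointer" array as `indptr` in SciPy's compressed sparse column format: entry k points at where column k's values start, so column k occupies `data[p[k]:p[k+1]]`. A cross-language illustration in SciPy (hypothetical values):

```python
import numpy as np
from scipy.sparse import csc_matrix

# Dense matrix we want to store column-compressed.
dense = np.array([[1, 0, 4],
                  [0, 0, 5],
                  [2, 3, 6]])

data = np.array([1, 2, 3, 4, 5, 6])      # non-zeros, column by column
indices = np.array([0, 2, 2, 0, 1, 2])   # row index of each non-zero
indptr = np.array([0, 2, 3, 6])          # column k = data[indptr[k]:indptr[k+1]]

A = csc_matrix((data, indices, indptr), shape=(3, 3))
assert (A.toarray() == dense).all()
```

This is also why exactly one of `i`, `j`, `p` may be given in R: `p` already encodes one of the two coordinate dimensions implicitly.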

How to wrap Eigen::SparseMatrix over pre-existing standard 3-array compressed row/column storage

一世执手 提交于 2019-12-06 05:06:17
NOTE: I already asked this question, but it was closed as "too broad" without much explanation. I can't see how this question could be more specific (it deals with a specific class of a specific library for a specific usage...), so I assume it was something like a moderator's mistake and I am asking it again. I would like to perform sparse matrix/matrix multiplication using Eigen on sparse matrices. These matrices are already defined in the code I am working on in standard 3-array compressed row/column storage. Then I would like to use the Eigen::SparseMatrix class as a wrapper on
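For reference, Eigen does offer `Eigen::Map<SparseMatrix<...>>` for viewing existing compressed storage without copying (worth verifying against the Eigen version in use). The same three-array wrapping idea can be sketched with SciPy, whose `csr_matrix` accepts the (values, column indices, row pointers) triplet directly:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Pre-existing three-array CSR storage (values, column indices,
# row pointers), as might be produced by some other library.
values = np.array([10.0, 20.0, 30.0])
col_idx = np.array([0, 2, 1])
row_ptr = np.array([0, 2, 2, 3])  # 3 rows; row 1 is empty

# Constructing from the triplet interprets the arrays in place
# (with matching dtypes SciPy can reuse the buffers rather than copy).
A = csr_matrix((values, col_idx, row_ptr), shape=(3, 3))
print(A.toarray())
```

The key point in both libraries is that the wrapper interprets the three arrays as-is, so sparse products can run against storage owned by other code.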

Parallel Cosine similarity of two large files with each other

耗尽温柔 提交于 2019-12-06 04:29:39
I have two files, A and B. A has 400,000 lines, each with 50 float values; B has 40,000 lines, each with 50 float values. For every line in B, I need to find the corresponding lines in A that have >90% (cosine) similarity. With a linear search and pairwise computation, the code takes a ginormous amount of computing time (40-50 hours). I'm reaching out to the community for suggestions on how to speed up the process (links to blogs/resources such as AWS/cloud setups that could be used to achieve it). I've been stuck with this for quite a while! [There were mentions of rpud/rpudplus to do it, but I can't seem to run them on cloud resources.] N.B.
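Before reaching for GPUs or cloud machines, one standard speed-up is to L2-normalise both matrices once, after which cosine similarity reduces to a single chunked matrix product handled by BLAS. A sketch at scaled-down sizes (the real shapes would be 400,000 x 50 and 40,000 x 50; data here is synthetic, with two rows of A copied into B so matches exist):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((400, 50))
B = np.vstack([rng.standard_normal((40, 50)), A[:2]])  # rows 40, 41 match A

# Normalise rows once; cosine(a, b) then equals the plain dot product.
A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
B_n = B / np.linalg.norm(B, axis=1, keepdims=True)

# Process B in chunks so each (chunk x 400k) similarity block fits in RAM.
matches = []
chunk = 16
for start in range(0, B_n.shape[0], chunk):
    sims = B_n[start:start + chunk] @ A_n.T   # one BLAS call per chunk
    b_idx, a_idx = np.nonzero(sims > 0.9)
    matches.extend(zip(b_idx + start, a_idx))
```

At the full 40k x 400k scale this is embarrassingly parallel over the B chunks, so it distributes trivially across cores or cloud workers; approximate methods (e.g. LSH or other nearest-neighbour indexes) can cut the cost further if exactness is negotiable.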