sparse-matrix

Fastest way to sum over rows of sparse matrix

做~自己de王妃 posted on 2019-12-05 16:15:50
I have a big csr_matrix (1M x 1K) and I want to sum over rows to obtain a new csr_matrix with the same number of columns but fewer rows. My problem is exactly the same as "Sum over rows in scipy.sparse.csr_matrix". The only difference is that I find the accepted solution too slow for my purpose. Let me state what I have: map_fn = np.random.randint(0, 10000, 1000000). Here map_fn tells me how my input rows (1M) are mapped to my output rows (10K). For example, the ith input row is added into output row map_fn[i]. I tried the two approaches mentioned in the above question, namely forming
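
One common way to express this kind of grouped row sum is a single sparse matrix product with an indicator matrix. The sketch below is not taken from the question or its answers; the helper name sum_rows_by_group and the tiny sizes are assumptions made for illustration, and whether it is fast enough at 1M x 1K would need to be measured.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sum_rows_by_group(X, map_fn, n_out):
    """Add row i of X into row map_fn[i] of the result (hypothetical helper)."""
    n_in = X.shape[0]
    # Indicator matrix S with S[map_fn[i], i] = 1, so (S @ X)[k] sums the rows mapped to k.
    ones = np.ones(n_in, dtype=X.dtype)
    S = csr_matrix((ones, (map_fn, np.arange(n_in))), shape=(n_out, n_in))
    return S @ X


# Small stand-in for the 1M x 1K input described in the question.
X_small = csr_matrix(np.array([[1, 0], [0, 2], [3, 0]]))
grouped = sum_rows_by_group(X_small, np.array([0, 1, 0]), n_out=2)
print(grouped.toarray())
```

The result stays sparse, and the work is proportional to the number of stored entries in X rather than to its dense size.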

java sparse matrix problem

余生颓废 posted on 2019-12-05 15:19:17
I have a two-dimensional matrix, and it is sparse. I am facing a performance problem. Can anybody tell me which API or class I can use in Java to handle sparse matrices and improve my program's performance? For example, I want it to take a 100x100 matrix, handle the sparsity, do the multiplication, and return a 100x100 matrix with 0 entries (meaning a sparse matrix). Jama is awful for large sparse matrices. Have a look at the Colt linear algebra library. Another possibility for sparse linear algebra is the Apache Commons library. It might be a little lighter-weight than Colt, but the difference from the

Best way of solving sparse linear systems in C++ - GPU Possible?

╄→гoц情女王★ posted on 2019-12-05 15:06:11
I am currently working on a project where we need to minimize |Ax - b|^2. In this case, A is a very sparse matrix and A'A has at most 5 nonzero elements in each row. We are working with images, and the dimension of A'A is NxN, where N is the number of pixels. In this case N = 76800. We plan to go to RGB, and then the dimension will be 3Nx3N. In MATLAB, solving (A'A)\(A'b) takes about 0.15 s using doubles. I have now done some experimenting with Eigen's sparse solvers. I have tried SimplicialLLT, SimplicialLDLT, SparseQR, and ConjugateGradient, along with some different orderings. By far the best so far is
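
For readers who want to see the normal-equations step (A'A)x = A'b written out, here is a minimal sketch. It uses Python with SciPy rather than the Eigen solvers the question is about, so it illustrates the linear algebra only and says nothing about C++ or GPU performance. The function name solve_normal_equations and the small random test matrix are assumptions.

```python
import numpy as np
from scipy.sparse import identity, random as sparse_random, vstack
from scipy.sparse.linalg import spsolve


def solve_normal_equations(A, b):
    """Minimize |Ax - b|^2 by solving (A'A) x = A'b with a sparse direct solver."""
    AtA = (A.T @ A).tocsc()  # spsolve works best with CSC or CSR input
    Atb = A.T @ b
    return spsolve(AtA, Atb)


# Tiny stand-in problem; stacking an identity block keeps A'A nonsingular.
rng = np.random.default_rng(0)
A_demo = vstack([identity(6), sparse_random(10, 6, density=0.3, random_state=0)]).tocsr()
b_demo = rng.standard_normal(16)
x_demo = solve_normal_equations(A_demo, b_demo)
print(x_demo)
```

Since A'A is symmetric positive definite whenever A has full column rank, a Cholesky-type or conjugate gradient solver (as tried in the question) is a natural fit; the generic spsolve call above is only the simplest thing to write.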

How to build pysparse on Ubuntu

偶尔善良 posted on 2019-12-05 13:34:34
When I try to install pysparse via pip install pysparse==1.3-dev, the build fails with the error: pysparse/sparse/src/spmatrixmodule.c:4:22: fatal error: spmatrix.h: No such file or directory. These kinds of errors are usually the result of a missing system dev package, but googling doesn't show anything for "spmatrix". I tried installing the python-sparse package, which does provide this file, but I still get the same error. How do I fix this? If you go through the source of the 1.3-dev package, it contains no ".h" or ".c" files. Use pip install pysparse==1.2-dev213 or lower versions or

Converting python sparse matrix dict to scipy sparse matrix

为君一笑 posted on 2019-12-05 13:17:20
I am using Python scikit-learn for document clustering, and I have a sparse matrix stored in a dict object. For example: doc_term_dict = { ('d1','t1'): 12, ('d2','t3'): 10, ('d3','t2'): 5 } # from mysql data table <type 'dict'>. I want to use scikit-learn to do the clustering, where the input matrix type is scipy.sparse.csr.csr_matrix. Example: (0, 2164) 0.245793088885 (0, 2076) 0.205702177467 (0, 2037) 0.193810934784 (0, 2005) 0.14547028437 (0, 1953) 0.153720023365 ... <class 'scipy.sparse.csr.csr_matrix'>. I can't find a way to convert the dict to this csr_matrix (I have never used scipy.)
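
One way to do this conversion is to assign each document and term an integer index and build the matrix from (value, (row, col)) triplets. The sketch below assumes the dict layout shown in the question; the helper name dict_to_csr and the choice to sort labels alphabetically are illustrative assumptions.

```python
from scipy.sparse import coo_matrix


def dict_to_csr(doc_term_dict):
    """Convert {(doc, term): count} into a CSR matrix plus its row/column labels."""
    docs = sorted({doc for doc, _ in doc_term_dict})
    terms = sorted({term for _, term in doc_term_dict})
    doc_index = {doc: i for i, doc in enumerate(docs)}
    term_index = {term: j for j, term in enumerate(terms)}

    # Keys and values of a dict iterate in the same order, so the triplets line up.
    rows = [doc_index[doc] for doc, _ in doc_term_dict]
    cols = [term_index[term] for _, term in doc_term_dict]
    vals = list(doc_term_dict.values())

    matrix = coo_matrix((vals, (rows, cols)), shape=(len(docs), len(terms))).tocsr()
    return matrix, docs, terms


doc_term_dict = {('d1', 't1'): 12, ('d2', 't3'): 10, ('d3', 't2'): 5}
matrix, docs, terms = dict_to_csr(doc_term_dict)
print(matrix.toarray())
```

Returning the label lists alongside the matrix makes it possible to map cluster results back to the original document and term names.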

Multiplying Numpy/Scipy Sparse and Dense Matrices Efficiently

核能气质少年 posted on 2019-12-05 11:25:54
Question: I'm working to implement the following equation: X = (Y.T * Y + Y.T * C * Y) ^ -1. Y is an (n x f) matrix and C is an (n x n) diagonal one; n is about 300k and f will vary between 100 and 200. As part of an optimization process this equation will be used almost 100 million times, so it has to be processed really fast. Y is initialized randomly, and C is a very sparse matrix in which only a few of the 300k diagonal entries are different from 0. Since Numpy's diagonal functions create dense
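
Because C is diagonal with only a few nonzero entries, Y.T * C * Y reduces to a weighted sum over the few rows of Y whose diagonal entry is nonzero, so C never has to be materialized. The sketch below assumes the diagonal is available as a 1-D array (here called c_diag) and uses a hypothetical function name; it is one possible formulation, not the approach from the original thread.

```python
import numpy as np


def compute_x(Y, c_diag):
    """Return (Y.T @ Y + Y.T @ C @ Y)^-1 where C = diag(c_diag), without building C."""
    nz = np.flatnonzero(c_diag)  # indices of the few nonzero diagonal entries
    Y_nz = Y[nz]                 # only these rows contribute to Y.T @ C @ Y
    weighted = Y_nz.T @ (c_diag[nz, None] * Y_nz)
    return np.linalg.inv(Y.T @ Y + weighted)


# Small stand-in for n ~ 300k, f ~ 100-200.
rng = np.random.default_rng(0)
Y_demo = rng.standard_normal((50, 4))
c_demo = np.zeros(50)
c_demo[[3, 17]] = [2.0, 5.0]
X_demo = compute_x(Y_demo, c_demo)
print(X_demo.shape)
```

If Y stays fixed across many evaluations, Y.T @ Y can be computed once and reused, leaving only the small weighted term to recompute each time.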

Convert CountVectorizer and TfidfTransformer Sparse Matrices into Separate Pandas Dataframe Rows

岁酱吖の posted on 2019-12-05 09:33:39
Question: What is the best way to convert sparse matrices resulting from sklearn's CountVectorizer and TfidfTransformer into Pandas DataFrame columns with a separate row for each bigram and its corresponding frequency and tf-idf score? Pipeline: Bring in text data from a SQL DB, split text into bigrams and calculate the frequency per document and the tf-idf per bigram per document, load the results back into the SQL DB. Current State: Two columns of data are brought in ( number , text ). text is cleaned to produce a third column cleanText : number text cleanText 0 123 The farmer plants grain
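
The reshaping step, from a document-by-bigram sparse matrix to one row per (document, bigram) pair, can be sketched from the COO coordinates of the count matrix. To keep the example self-contained, the two small matrices below stand in for the CountVectorizer and TfidfTransformer outputs; the function name sparse_to_long, the column names bigram, frequency and tfidf, and the sample values are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def sparse_to_long(counts, tfidf, doc_ids, terms):
    """One DataFrame row per stored (document, bigram) entry of the count matrix."""
    coo = counts.tocoo()
    tfidf = tfidf.tocsr()
    return pd.DataFrame({
        "number": np.asarray(doc_ids)[coo.row],
        "bigram": np.asarray(terms)[coo.col],
        "frequency": coo.data,
        # Look up the tf-idf score at the same (row, col) positions.
        "tfidf": np.asarray(tfidf[coo.row, coo.col]).ravel(),
    })


# Stand-ins for the vectorizer outputs (two documents, two bigrams).
counts_demo = csr_matrix(np.array([[2, 0], [0, 1]]))
tfidf_demo = csr_matrix(np.array([[0.5, 0.0], [0.0, 1.0]]))
long_df = sparse_to_long(counts_demo, tfidf_demo, [123, 456], ["the farmer", "plants grain"])
print(long_df)
```

A long-format frame like this can be written back to a SQL table row by row or in bulk, which matches the pipeline described in the question.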

scipy.sparse.hstack(([1], [2])) -> “ValueError: blocks must be 2-D”. Why?

僤鯓⒐⒋嵵緔 posted on 2019-12-05 09:04:17
scipy.sparse.hstack((1, [2])) and scipy.sparse.hstack((1, 2)) work well, but not scipy.sparse.hstack(([1], [2])). Why is this the case? Here is a trace of what's happening on my system: C:\Anaconda>python Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 16:44:52) [MSC v. 1500 64 bit (AMD64)] on win32 >>> import scipy.sparse >>> scipy.sparse.hstack((1, [2])) <1x2 sparse matrix of type '<type 'numpy.int32'>' with 2 stored elements in COOrdinate format> >>> scipy.sparse.hstack((1, 2)) <1x2 sparse matrix of type '<type 'numpy.int32'>' with 2 stored elements in COOrdinate format> >
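
A plausible explanation, which the truncated excerpt does not confirm, is that hstack wraps its blocks in a list and converts the result to a NumPy object array, which must come out 2-D. With ([1], [2]) the inner lists add a third dimension. The sketch below shows that shape effect with plain NumPy and then a way to sidestep the ambiguity by passing explicit 2-D sparse blocks; treat the description of SciPy's internals as an assumption.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Wrapping the blocks in a list and converting to an object array, as a
# block-stacking routine might do, yields a 3-D array for ([1], [2]).
nested = np.asarray([([1], [2])], dtype=object)
print(nested.shape)  # three dimensions instead of the two a block grid needs

# Passing explicit 2-D sparse blocks leaves no room for misinterpretation.
stacked = hstack([csr_matrix([[1]]), csr_matrix([[2]])])
print(stacked.toarray())
```

Building each block as a sparse matrix first also makes the intended shape of every block explicit, which helps when blocks are larger than 1x1.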

Using scipy sparse matrices to solve system of equations

江枫思渺然 posted on 2019-12-05 08:19:54
This is a follow-up to "How to set up and solve simultaneous equations in python", but I feel it deserves its own reputation points for any answer. For a fixed integer n, I have a set of 2(n-1) simultaneous equations as follows. M(p) = 1+((n-p-1)/n)*M(n-1) + (2/n)*N(p-1) + ((p-1)/n)*M(p-1); N(p) = 1+((n-p-1)/n)*M(n-1) + (p/n)*N(p-1); M(1) = 1+((n-2)/n)*M(n-1) + (2/n)*N(0); N(0) = 1+((n-1)/n)*M(n-1). M(p) is defined for 1 <= p <= n-1. N(p) is defined for 0 <= p <= n-2. Notice also that p is just a constant integer in every equation, so the whole system is linear. Some very nice answers were given for
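
The system above can be assembled as a sparse matrix by moving every unknown to the left-hand side, which leaves 1 on the right of each equation. The sketch below is one possible setup, not the answer from the original thread; the function name solve_system and the ordering of unknowns (all M values first, then all N values) are assumptions. The boundary equations for M(1) and N(0) are the general formulas with their zero-coefficient terms dropped.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import spsolve


def solve_system(n):
    """Solve for M(1..n-1) and N(0..n-2); requires n >= 2."""
    size = 2 * (n - 1)
    rows, cols, vals = [], [], []

    def m_idx(p):  # column of unknown M(p), 1 <= p <= n-1
        return p - 1

    def n_idx(p):  # column of unknown N(p), 0 <= p <= n-2
        return (n - 1) + p

    def add(r, c, v):
        if v != 0:
            rows.append(r)
            cols.append(c)
            vals.append(v)

    for p in range(1, n):  # M(p) - ... = 1
        r = m_idx(p)
        add(r, r, 1.0)
        add(r, m_idx(n - 1), -(n - p - 1) / n)
        add(r, n_idx(p - 1), -2 / n)
        if p >= 2:
            add(r, m_idx(p - 1), -(p - 1) / n)

    for p in range(0, n - 1):  # N(p) - ... = 1
        r = n_idx(p)
        add(r, r, 1.0)
        add(r, m_idx(n - 1), -(n - p - 1) / n)
        if p >= 1:
            add(r, n_idx(p - 1), -p / n)

    # Duplicate (row, col) entries are summed when converting from COO.
    A = coo_matrix((vals, (rows, cols)), shape=(size, size)).tocsc()
    x = spsolve(A, np.ones(size))
    return x[: n - 1], x[n - 1:]


M, N = solve_system(5)
print(M, N)
```

Each row has at most four nonzero entries, so the matrix stays very sparse as n grows.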

How to get a big sparse matrix in R? (> 2^31-1)

依然范特西╮ posted on 2019-12-05 08:04:40
I use some C++ code to take a text file from a database and create a dgCMatrix-type sparse matrix from the Matrix package. For the first time, I'm trying to build a matrix that has more than 2^31-1 nonzero entries, which means that the index vector in the sparse matrix object must also be longer than that limit. Unfortunately, vectors seem to use 32-bit integer indices, as do NumericVectors in Rcpp. Short of writing an entirely new data type from the ground up, does R provide any facility for this? I don't think I can use too exotic a solution, as I need glmnet to recognize the resultant