sparse-matrix

Scipy sparse matrices - purpose and usage of different implementations

Submitted by 99封情书 on 2019-12-02 15:50:32
Scipy has many different types of sparse matrices available. What are the most important differences between these types, and what is their intended usage? I'm developing code in Python based on a sample code 1 in Matlab. One section of the code uses sparse matrices, which seem to have a single (annoying) type in Matlab, and I'm trying to figure out which type I should use 2 in Python. 1: This is for a class. Most people are doing the project in Matlab, but apparently I like to create unnecessary work and confusion. 2: This is an academic question: I have the code
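A minimal sketch (assuming SciPy is installed) contrasting the main formats the question asks about: LIL for incremental element assignment, COO for bulk construction from triplets, and CSR for fast arithmetic and row slicing.

```python
import numpy as np
from scipy.sparse import lil_matrix, coo_matrix

# Incremental construction: LIL supports efficient element assignment.
m = lil_matrix((3, 3))
m[0, 0] = 1.0
m[2, 1] = 2.0

# Bulk construction from (data, (row, col)) triplets: COO.
c = coo_matrix(([1.0, 2.0], ([0, 2], [0, 1])), shape=(3, 3))

# Convert to CSR for fast matrix-vector products and row slicing.
a = m.tocsr()
b = c.tocsr()
print((a != b).nnz)  # 0: both routes build the same matrix
```

The usual pattern is to build in LIL/DOK/COO and convert to CSR or CSC before doing heavy arithmetic.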

Optimising Python dictionary access code

Submitted by 最后都变了- on 2019-12-02 14:26:51
Question: I've profiled my Python program to death, and there is one function that is slowing everything down. It uses Python dictionaries heavily, so I may not have used them in the best way. If I can't get it running faster, I will have to rewrite it in C++, so can anyone help me optimise it in Python? I hope I've given the right sort of explanation, and that you can make some sense of my code! Thanks in advance for any help. My code: This is the offending function, profiled using line_profiler and kernprof. I'm running Python 2.7. I'm particularly puzzled by things like lines

MappedSparseMatrix in RcppEigen

Submitted by 不羁的心 on 2019-12-02 14:19:49
Question: I want to use the conjugate gradient algorithm implemented in the RcppEigen package for solving large sparse matrices. Since I am new to Rcpp and C++, I just started with dense matrices. // [[Rcpp::depends(RcppEigen)]] #include <Rcpp.h> #include <RcppEigen.h> #include <Eigen/IterativeLinearSolvers> using Eigen::SparseMatrix; using Eigen::MappedSparseMatrix; using Eigen::Map; using Eigen::MatrixXd; using Eigen::VectorXd; using Rcpp::as; using Eigen::ConjugateGradient; typedef Eigen:
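For comparison, the same task the RcppEigen snippet is building toward, solving A x = b for a sparse symmetric positive-definite A with conjugate gradients, looks like this in SciPy. A minimal sketch; the tridiagonal system is an illustrative stand-in for a large sparse matrix.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# A small SPD tridiagonal system as a stand-in for a large sparse matrix.
n = 100
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = cg(A, b, atol=1e-10)
print(info)  # 0 means the solver converged
```

`cg` never factorizes A; it only needs matrix-vector products, which is why it scales to large sparse systems.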

Implications of manually setting scipy sparse matrix shape

Submitted by 让人想犯罪 __ on 2019-12-02 13:56:40
Question: I need to perform online training of a TF-IDF model. I found that scikit-learn's TfidfVectorizer does not support training in an online fashion, so I'm implementing my own CountVectorizer to support online training and then using scikit-learn's TfidfTransformer to update the tf-idf values after a pre-defined number of documents have entered the corpus. I found here that you shouldn't add rows or columns to numpy arrays, since all the data would need to be copied so that it is stored in contiguous blocks of
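A sketch of the two usual options for growing a count matrix without the dense-copy problem (the variable names are illustrative, not from the question): declare extra columns up front via the shape argument, and append new document rows with sparse vstack.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

# Counts for 2 documents over a vocabulary allowed to grow to 10 terms:
# shape= reserves the extra (all-zero) columns at construction time.
counts = csr_matrix(([1, 2, 1], ([0, 0, 1], [0, 3, 2])), shape=(2, 10))

# A new document arrives: append its row without densifying anything.
new_doc = csr_matrix(([3], ([0], [5])), shape=(1, 10))
counts = vstack([counts, new_doc], format="csr")
print(counts.shape)  # (3, 10)
```

Only the stored non-zeros are copied when stacking, which stays cheap as long as the matrix is genuinely sparse.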

create a sparse matrix; given the indices of non-zero elements for creation of dummy variables of a categorical column of a large dataset

Submitted by 懵懂的女人 on 2019-12-02 13:41:51
Question: I'm trying to use a sparse matrix to generate dummy variables for a set of data with 5.8 million rows and two categorical columns. The structure of the data is:
mydata: a data.table of 5,800,000 rows and two categorical (integer-coded) variables, Var1 and Var2
nlevels(Var1): 210,000 (levels include all numbers between 1 and 210,000)
nlevels(Var2): 500 (levels include all numbers between 1 and 500)
Here's an example of mydata:
Var_1 Var_2
1     4
1     2
2     7
5     9
5     500
.     .
200   6
200   2
200   80
.     .
I
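The question is posed in R, but the underlying idea is language-independent: each data row contributes exactly one 1, in the column of its level, so the dummy matrix can be built directly as a COO matrix from (row, column) index pairs. A sketch in Python/SciPy terms with toy data, not the asker's 5.8-million-row table.

```python
import numpy as np
from scipy.sparse import coo_matrix

var1 = np.array([1, 1, 2, 5, 5, 200])  # 1-based category levels
n_levels = 210000
n_rows = var1.size

rows = np.arange(n_rows)
cols = var1 - 1                         # shift 1-based levels to 0-based columns
data = np.ones(n_rows, dtype=np.int8)
dummies = coo_matrix((data, (rows, cols)), shape=(n_rows, n_levels)).tocsr()
print(dummies.nnz)  # 6: exactly one non-zero per data row
```

This never materializes the 210,000 dummy columns densely; storage is proportional to the number of rows.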

Efficiently test matrix rows and columns with numpy

Submitted by ℡╲_俬逩灬. on 2019-12-02 10:57:51
Question: I am trying to remove row i and column i when both row i and column i contain all zeros. For example, in this case row 0 is all zeros and column 0 is all zeros, so row and column 0 are removed. The same goes for row/column pairs 2 and 4. Row 1 is all zeros but column 1 is not, so neither is removed.
[0,0,0,0,0]
[0,1,0,1,0]
[0,0,0,0,0]
[0,0,0,0,0]
[0,0,0,0,0]
would become
[1,1]
[0,0]
Another example:
[0,0,1,0,0,1]
[0,0,0,0,0,0]
[0,0,0,0,0,0]
[0,0,0,0,0,0]
[0,0,0,0,0,0]
[0
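One way to do this with numpy, sketched on the question's first example: find the indices i where row i and column i are both entirely zero, then delete those indices along both axes at once.

```python
import numpy as np

a = np.array([[0, 0, 0, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0]])

zero_rows = ~a.any(axis=1)      # True where row i is all zeros
zero_cols = ~a.any(axis=0)      # True where column i is all zeros
drop = zero_rows & zero_cols    # drop index i only when both hold
keep = np.where(~drop)[0]

result = a[np.ix_(keep, keep)]  # only index pair (1, 3) survives here
```

`np.ix_` builds an open mesh so the same index list selects rows and columns in a single vectorized step, with no Python loop over indices.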

reshape scipy csr matrix

Submitted by 百般思念 on 2019-12-02 09:30:25
How can I efficiently reshape a scipy.sparse csr_matrix? I need to add zero rows at the end. Using:
from scipy.sparse import csr_matrix
data = [1,2,3,4,5,6]
col = [0,0,0,1,1,1]
row = [0,1,2,0,1,2]
a = csr_matrix((data, (row, col)))
a.reshape(3,5)
I get this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/scipy/sparse/base.py", line 129, in reshape
    self.__class__.__name__)
NotImplementedError: Reshaping not implemented for csr_matrix.
If you can catch the problem early enough, just include a shape parameter: In [48]: a
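Following the answer's suggestion: if the extra zero rows are known up front, pass a shape argument at construction instead of reshaping afterwards. (Newer SciPy releases do implement reshape for sparse matrices, but the construction-time shape is still the cheap route for padding with zero rows.)

```python
from scipy.sparse import csr_matrix

data = [1, 2, 3, 4, 5, 6]
col = [0, 0, 0, 1, 1, 1]
row = [0, 1, 2, 0, 1, 2]

# Without shape= this would infer 3x2; asking for 5 rows appends zero rows.
a = csr_matrix((data, (row, col)), shape=(5, 2))
print(a.shape)          # (5, 2)
print(a.getrow(4).nnz)  # 0: the appended rows hold no stored entries
```

The padding costs nothing: zero rows in CSR add only one entry to the row-pointer array.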

scipy.sparse.coo_matrix how to fast find all zeros column, fill with 1 and normalize

Submitted by 六月ゝ 毕业季﹏ on 2019-12-02 08:57:19
Question: For a matrix, I want to find columns with all zeros, fill them with 1s, and then normalize the matrix by column. I know how to do that with np.arrays:
[[0 0 0 0 0]
 [0 0 1 0 0]
 [1 0 0 1 0]
 [0 0 0 0 1]
 [1 0 0 0 0]]
|
V
[[0 1 0 0 0]
 [0 1 1 0 0]
 [1 1 0 1 0]
 [0 1 0 0 1]
 [1 1 0 0 0]]
|
V
[[0 0.2 0 0 0]
 [0 0.2 1 0 0]
 [0.5 0.2 0 1 0]
 [0 0.2 0 0 1]
 [0.5 0.2 0 0 0]]
But how can I do the same thing when the matrix is in scipy.sparse.coo.coo_matrix form, without converting it back to np.arrays? How can I
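A sketch of one sparse-only route, using the question's own example matrix: count stored entries per column to find the empty ones, fill those columns through LIL (which supports cheap column assignment), then divide each column by its sum via broadcasting.

```python
import numpy as np
from scipy.sparse import coo_matrix

# The example matrix from the question, in COO form.
rows = [1, 2, 2, 3, 4]
cols = [2, 0, 3, 4, 0]
m = coo_matrix((np.ones(5), (rows, cols)), shape=(5, 5))

# Columns with no stored entries are the all-zero columns.
empty_cols = np.where(m.getnnz(axis=0) == 0)[0]

m = m.tolil()
for j in empty_cols:
    m[:, j] = 1.0            # fill the all-zero column with ones

# Normalize each column by its sum (broadcasting a dense 1x5 row).
m = m.tocsc()
norm = m.multiply(1.0 / m.sum(axis=0)).tocsc()
```

Note the filled columns do become dense in storage; that is unavoidable, since a column of ones has n stored entries.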

Handling a very big and sparse matrix in Matlab

Submitted by 半世苍凉 on 2019-12-02 07:40:45
Question: I have a very big and sparse matrix, represented as a CSV file (67 GB). Is it possible to load and work with this matrix in Matlab? I can use a 64-bit version on a Mac OS computer with 8 GB of RAM. I have read a few posts about this topic, but I am still not sure whether 64-bit Matlab on Mac OS can use disk space for allocating the matrix or needs everything in RAM, and, in any case, whether using such a big portion of disk space would make things almost unusable. Answer 1: It sounds like memory mapping is the
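The answer points at Matlab's memory mapping; for comparison, a hedged Python sketch of the streaming alternative. It assumes, purely illustratively, that the file stores the matrix as "row,col,value" triplets, and reads it in chunks so the 67 GB never has to sit in RAM at once; only the non-zeros are kept.

```python
import csv
import io
from scipy.sparse import coo_matrix

def load_triplet_csv(f, shape, chunk=100000):
    """Stream 'row,col,value' lines, summing sparse chunks so only the
    non-zeros (plus one chunk of triplets) are ever held in memory."""
    total = coo_matrix(shape)
    rows, cols, vals = [], [], []
    for r, c, v in csv.reader(f):
        rows.append(int(r)); cols.append(int(c)); vals.append(float(v))
        if len(rows) >= chunk:
            total = total + coo_matrix((vals, (rows, cols)), shape=shape)
            rows, cols, vals = [], [], []
    if rows:
        total = total + coo_matrix((vals, (rows, cols)), shape=shape)
    return total.tocsr()

# Tiny in-memory stand-in for the 67 GB file.
demo = io.StringIO("0,0,1.0\n2,1,3.0\n1,2,2.0\n")
m = load_triplet_csv(demo, shape=(3, 3), chunk=2)
print(m.nnz)  # 3
```

If the CSV is instead a dense grid of mostly-zero values, the same chunked loop applies, keeping only the non-zero entries of each chunk.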