sparse-matrix | 易学教程

Improving a badly conditioned matrix

I have a badly conditioned matrix whose rcond() is close to zero, so the inverse of that matrix does not come out correct. I have tried using pinv(), but that does not solve the problem. This is how I am taking the inverse: X = (A)\(b); I looked for a solution to this problem and found this link (last solution) for improving the matrix. The solution there suggests using: A_new = A_old + c*eye(size(A_old)); where c > 0. So far, this technique makes the matrix A better conditioned and the resulting solution looks better. However, I investigated
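The diagonal shift described in the excerpt can be illustrated in Python with NumPy. This is a minimal sketch, not the asker's code: the 2x2 matrix, the right-hand side, and the value of c are all made up for illustration. Note that the shifted system is a different problem from the original one, so x_new solves the perturbed system and only approximately satisfies the original.

```python
import numpy as np

# Hypothetical nearly singular symmetric matrix (not taken from the question).
A_old = np.array([[1.0, 1.0],
                  [1.0, 1.0 + 1e-12]])
b = np.array([2.0, 2.0])

c = 1e-6  # assumed shift; the excerpt only requires c > 0
A_new = A_old + c * np.eye(A_old.shape[0])

# The shift raises every eigenvalue of this symmetric matrix by c,
# which shrinks the ratio of the largest to the smallest one.
cond_old = np.linalg.cond(A_old)
cond_new = np.linalg.cond(A_new)

# Solve the shifted system, then check how well it fits the original one.
x_new = np.linalg.solve(A_new, b)
residual = np.linalg.norm(A_old @ x_new - b)

print(f"cond before: {cond_old:.3e}, after: {cond_new:.3e}")
print(f"residual against the original system: {residual:.3e}")
```

A larger c improves conditioning further but moves the solution further from that of the original system, so the choice of c is a trade-off.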

Performing PCA on large sparse matrix by using sklearn

I am trying to apply PCA to a huge sparse matrix. The following link ("Apply PCA on very large sparse matrix") says that sklearn's randomizedPCA can handle sparse matrices in scipy sparse format. However, I always get an error. Can someone point out what I am doing wrong? The input matrix 'X_train' contains float64 numbers: >>>type(X_train) <class 'scipy.sparse.csr.csr_matrix'> >>>X_train.shape (2365436, 1617899) >>>X_train.ndim 2 >>>X_train[0] <1x1617899 sparse matrix of type '<type 'numpy.float64'>' with 81 stored elements in Compressed Sparse Row format> I am trying to do: >>>from sklearn
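One way to reduce the dimensionality of a sparse matrix without densifying it is a truncated SVD. The sketch below uses scipy.sparse.linalg.svds on a small random stand-in for X_train; the data, the shape, and the number of components k are assumptions for illustration. This is not the same as PCA, because the columns are not mean-centred (centring would destroy sparsity). In scikit-learn, TruncatedSVD is the estimator that accepts sparse input in this way.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds

# Small random stand-in for X_train (hypothetical data, CSR format, float64).
X = sparse.random(200, 50, density=0.05, format="csr",
                  dtype=np.float64, random_state=0)

k = 5  # assumed number of components to keep
U, s, Vt = svds(X, k=k)

# The order of the returned singular values is not guaranteed,
# so sort the triplets from largest to smallest singular value.
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order, :]

# Project the rows of X onto the top-k right singular vectors.
X_reduced = U * s  # equivalent to X @ Vt.T, shape (200, k)
print(X_reduced.shape)
```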

scipy.sparse default value

Question: The sparse matrix format (dok) assumes that values of keys not in the dictionary are equal to zero. Is there any way to make it use a default value other than zero? Also, is there a way to calculate the log of a sparse matrix (akin to np.log on a regular numpy matrix)? Answer 1: That feature is not built in, but if you really need it, you should be able to write your own dok_matrix class, or subclass SciPy's. The SciPy implementation is here: https://github.com/scipy/scipy/blob/master/scipy
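For the logarithm part of the question, here is a minimal sketch of two common workarounds, using a tiny made-up matrix. Taking np.log of every element would turn each implicit zero into -inf and make the result dense, so the first option applies the log to the stored entries only, and the second uses log1p, which maps 0 to 0 and so preserves sparsity. Neither is the same as an element-wise log of the full matrix.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x4 matrix with two stored entries.
M = sparse.dok_matrix((3, 4), dtype=np.float64)
M[0, 1] = 1.0
M[2, 3] = np.e

C = M.tocsr()

# Option 1: log of the stored entries only; implicit zeros are left untouched.
log_stored = C.copy()
log_stored.data = np.log(log_stored.data)

# Option 2: log(1 + x) on every element; zeros stay zero, so the result is sparse.
log1p_all = C.log1p()

print(log_stored[2, 3])  # log(e)
print(log1p_all[0, 1])   # log(1 + 1)
```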

Mongodb unique sparse index

Question: I have created a sparse and unique index on my MongoDB collection. var Account = new Schema({ email: { type: String, index: {unique: true, sparse: true} }, .... It has been created correctly: { "ns" : "MyDB.accounts", "key" : { "email" : 1 }, "name" : "email_1", "unique" : true, "sparse" : true, "background" : true, "safe" : null } But if I insert a second document with the key not set, I receive this error: { [MongoError: E11000 duplicate key error index: MyDB.accounts.$email_1 dup key: { :

Computing sparse pairwise distance matrix in R

I have an NxM matrix and I want to compute the NxN matrix of Euclidean distances between the M points. In my problem, N is about 100,000. As I plan to use this matrix for a k-nearest neighbor algorithm, I only need to keep the k smallest distances, so the resulting NxN matrix is very sparse. This is in contrast to what comes out of dist(), for example, which would result in a dense matrix (and probably storage problems for my size N). The packages for kNN that I've found so far (knnflex, kknn, etc.) all appear to use dense matrices. Also, the Matrix package does not offer a pairwise
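The idea of keeping only the k smallest distances per point can be sketched as follows. The question is about R; this illustration uses Python with SciPy instead, on made-up random data, and treats each row of the NxM matrix as one point. It assumes there are no duplicate points, so that the nearest neighbour returned for each point is the point itself. KD-trees tend to lose their advantage when M is large, so this is only one possible approach.

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
N, M, k = 1000, 3, 5           # hypothetical sizes; the question has N of about 100,000
points = rng.random((N, M))    # one point per row

# Query k + 1 neighbours because each point's nearest neighbour is itself.
tree = cKDTree(points)
dist, idx = tree.query(points, k=k + 1)

# Drop the self-match in column 0 and store the rest in an N x N sparse matrix.
rows = np.repeat(np.arange(N), k)
D = sparse.csr_matrix((dist[:, 1:].ravel(), (rows, idx[:, 1:].ravel())),
                      shape=(N, N))

print(D.shape, D.nnz)  # N*k stored distances instead of N*N
```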

When should I be using `sparse`?

Question: I've been looking through MATLAB's sparse documentation, trying to find whether there are any guidelines for when it makes sense to use a sparse representation rather than a full representation. For example, I have a matrix data with around 30% nonzero entries. I can check the memory used. whos data Name Size Bytes Class Attributes data 84143929x11 4394073488 double sparse data = full(data); whos data Name Size Bytes Class Attributes data 84143929x11 7404665752 double Here, I'm clearly saving
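The two byte counts in the excerpt can be checked with a little arithmetic, shown here in Python. The sketch assumes a compressed-column layout with an 8-byte value and an 8-byte row index per stored entry, plus one 8-byte column pointer per column and one extra; that layout is an assumption, but it reproduces both reported figures exactly.

```python
rows, cols = 84143929, 11
bytes_sparse = 4394073488  # reported by whos for the sparse matrix
bytes_full = 7404665752    # reported by whos after full(data)

# Full storage: 8 bytes per double, for every element.
assert rows * cols * 8 == bytes_full

# Assumed sparse storage: 16 bytes per stored entry + 8 * (cols + 1) bytes of column pointers.
nnz = (bytes_sparse - 8 * (cols + 1)) // 16
assert 16 * nnz + 8 * (cols + 1) == bytes_sparse

density = nnz / (rows * cols)
print(nnz, f"{density:.1%}")  # number of stored entries and the fraction of nonzeros
```

Under this layout each stored entry costs twice as much as a dense element, so the sparse form only saves memory when fewer than about half of the entries are nonzero. That is consistent with the roughly 30% density in the question.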

R: sparse matrix conversion

I have a matrix of factors in R and want to convert it to a matrix of 0-1 dummy variables for all possible levels of each factor. However, this "dummy" matrix is very large (91690x16593) and very sparse. I need to store it in a sparse matrix, otherwise it does not fit in my 12GB of RAM. Currently, I am using the following code, which works fine and takes seconds: library(Matrix) X_factors <- data.frame(lapply(my_matrix, as.factor)) #encode factor data in a sparse matrix X <- sparse.model.matrix(~.-1, data = X_factors) However, I want to use the e1071 package in R, and eventually save this

Creating sparse matrix from a list of sparse vectors

Question: I have a list of sparse vectors (in R). I need to convert this list to a sparse matrix. Doing it via a for-loop takes a long time. sm<-spMatrix(length(tc2),n.col) for(i in 1:length(tc2)){ sm[i,]<-(tc2[i])[[1]]; } Is there a better way? Answer 1: Here is a two-step solution: Use lapply() and as(..., "sparseMatrix") to convert the list of sparseVectors to a list of one-column sparseMatrices. Use do.call() and cBind() to combine the sparseMatrices into a single sparseMatrix. require(Matrix) # Create

Generating a dense matrix from a sparse matrix in numpy python

I have an SQLite database with the following type of schema: termcount(doc_num, term, count) This table contains terms with their respective counts in each document, like (doc1, term1, 12) (doc1, term 22, 2) . . (docn, term1, 10) This matrix can be considered a sparse matrix, as each document contains very few terms with a non-zero value. How would I create a dense matrix from this sparse matrix using numpy, as I have to calculate the similarity among documents using cosine similarity? This dense matrix will look like a table that has docid as the first column and all the terms
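A minimal sketch of one way to go from (doc_num, term, count) rows to a document-term matrix and then to cosine similarities. The three rows are invented stand-ins for what a query such as SELECT doc_num, term, count FROM termcount might return, and the variable names are hypothetical. The sketch builds a SciPy sparse matrix first and only then converts it with toarray(); the similarities themselves are computed from the sparse form, so the dense copy is optional.

```python
import numpy as np
from scipy import sparse

# Hypothetical query result: (doc_num, term, count) triples.
rows = [("doc1", "term1", 12), ("doc1", "term22", 2), ("doc2", "term1", 10)]

# Map document ids and terms to row and column positions.
doc_ids = sorted({d for d, _, _ in rows})
terms = sorted({t for _, t, _ in rows})
doc_index = {d: i for i, d in enumerate(doc_ids)}
term_index = {t: j for j, t in enumerate(terms)}

r = [doc_index[d] for d, _, _ in rows]
c = [term_index[t] for _, t, _ in rows]
v = [float(n) for _, _, n in rows]

# Sparse document-term matrix built from the triples.
X = sparse.coo_matrix((v, (r, c)), shape=(len(doc_ids), len(terms))).tocsr()

# Dense version, if the full table is really needed.
dense = X.toarray()

# Cosine similarity: dot products divided by the product of the row norms.
gram = (X @ X.T).toarray()
norms = np.sqrt(np.diag(gram))
sim = gram / np.outer(norms, norms)

print(dense)
print(sim)
```

This assumes every document has at least one term; a document with no terms would have a zero norm and produce a division by zero.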

Looping over the non-zero elements of a uBlas sparse matrix

Question: I have the following sparse matrix that contains O(N) elements: boost::numeric::ublas::compressed_matrix<int> adjacency (N, N); I could write a brute-force double loop to go over all the entries in O(N^2) time like below, but this is going to be too slow. for(int i=0; i<N; ++i) for(int j=0; j<N; ++j) std::cout << adjacency(i,j) << std::endl; How can I loop over only the non-zero entries in O(N) time? For each non-zero element I would like to have access to its value and the indexes i,j. Answer 1: