sparse-matrix

RcppEigen sparse matrix insert operation gives invalid class “dgCMatrix” error

Submitted by 只谈情不闲聊 on 2019-12-11 01:18:40

Question: I'm trying to get up to speed on using C++ to quickly build sparse matrices for use in R. However, I cannot seem to use the insert method to change single elements of an Eigen sparse matrix and get back a valid R object of class dgCMatrix. A simple example is below. The C++ code is:

    #include <RcppEigen.h>
    // [[Rcpp::depends(RcppEigen)]]
    using Eigen::SparseMatrix;  // sparse matrix

    // [[Rcpp::export]]
    SparseMatrix<double> SimpleSparseMatrix(int n) {
      SparseMatrix<double> new_mat(n, n);
      new…

Create Sparse Matrix from a data frame

Submitted by 元气小坏坏 on 2019-12-10 23:19:09

Question: I'm doing an assignment where I'm trying to build a collaborative filtering model for the Netflix Prize data. The data I'm using is in a CSV file, which I easily imported into a data frame. Now I need to create a sparse matrix with the users as rows and the movies as columns, where each cell is filled with the corresponding rating value. When I try to map out the values in the data frame, I need to run a loop over each row of the data frame, which is taking a lot of…
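The question is in R, but the loop-free construction it needs is the classic triplet build: map users and movies to integer codes, then hand (ratings, (row, col)) to the sparse constructor in a single call instead of looping over rows. A minimal SciPy sketch of the same idea, with a made-up table standing in for the Netflix CSV:

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical ratings table; the real Netflix data has the same triplet shape.
df = pd.DataFrame({
    "user":   ["u1", "u2", "u1", "u3"],
    "movie":  ["m1", "m1", "m2", "m2"],
    "rating": [5, 3, 4, 1],
})

# Map users and movies to integer indices, then build the matrix in one shot
# from (data, (row, col)) triplets -- no per-row loop needed.
users = pd.Categorical(df["user"])
movies = pd.Categorical(df["movie"])
mat = csr_matrix((df["rating"], (users.codes, movies.codes)),
                 shape=(len(users.categories), len(movies.categories)))

print(mat.toarray())
```

The same pattern exists in R via `Matrix::sparseMatrix(i, j, x)` with integer-coded factors.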

Efficient way to do one-hot encoding in R on large data

Submitted by 前提是你 on 2019-12-10 22:28:12

Question: I'm trying to create a one-hot representation of my data. This is my approach:

    data(iris)
    iris = as.data.frame(apply(iris, 2, function(x) as.factor(x)))
    head(iris)
    iris_ohe <- data.frame(model.matrix(~.-1, iris))
    head(iris_ohe)
    dim(iris_ohe)

The thing is, the data I'm working on has over 1 million rows, and the encoding produces a matrix with over 100 columns. This is too much for R and I run out of memory: Error: cannot allocate vector of size 10204.5 Gb. Is there a better approach I…
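The blow-up comes from model.matrix materializing a dense matrix (in R, Matrix::sparse.model.matrix is the direct fix). The underlying observation is that a one-hot column has exactly one nonzero per row, so it can be built directly in sparse form. A hedged SciPy sketch of that idea, with a toy label array:

```python
import numpy as np
from scipy.sparse import csr_matrix

def one_hot_sparse(column):
    """One-hot encode a 1-D array of labels into a sparse CSR matrix."""
    categories, codes = np.unique(column, return_inverse=True)
    n = len(column)
    # Exactly one nonzero per row: a 1 in the column of that row's category.
    mat = csr_matrix((np.ones(n, dtype=np.int8), (np.arange(n), codes)),
                     shape=(n, len(categories)))
    return mat, categories

mat, cats = one_hot_sparse(np.array(["setosa", "virginica", "setosa"]))
print(cats)
print(mat.toarray())
```

Memory then scales with the number of rows, not rows times categories, which is what makes a million-row encoding feasible.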

What is csr_matrix.A? [duplicate]

Submitted by 蓝咒 on 2019-12-10 22:25:01

Question: This question already has answers here: What is the built-in function A of numpy array.A? (4 answers). Closed 3 months ago. I've recently seen something like this:

    import numpy as np
    row = np.array([0, 0, 1, 2, 2, 2])
    col = np.array([0, 2, 2, 0, 1, 2])
    data = np.array([1, 2, 3, 4, 5, 6])
    from scipy.sparse import csr_matrix
    csr_matrix((data, (row, col)), shape=(3, 3)).A

In this case, it returns a numpy array:

    array([[1, 0, 2],
           [0, 0, 3],
           [4, 5, 6]], dtype=int64)

This seems to be simply a non…
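On scipy's sparse matrix classes, .A is simply a property alias for .toarray(): it densifies the matrix into an ndarray (newer code tends to prefer the explicit .toarray() call). A small sketch reproducing the snippet above:

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
m = csr_matrix((data, (row, col)), shape=(3, 3))

# .A is a property that returns .toarray(): the dense ndarray form.
assert (m.A == m.toarray()).all()
print(m.A)
# [[1 0 2]
#  [0 0 3]
#  [4 5 6]]
```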

Is sparse matrix-vector multiplication faster in Matlab than in Python?

Submitted by 时光毁灭记忆、已成空白 on 2019-12-10 22:07:52

Question: Edit: See this question, where I learned how to parallelize sparse matrix-vector multiplication in Python using Numba and was able to tie with Matlab. Original question: I'm observing that sparse matrix-vector multiplication is about 4 or 5 times faster in Matlab than in Python (using scipy sparse matrices). Here are some details from the Matlab command line:

    >> whos A
      Name    Size            Bytes        Class    Attributes
      A       47166x113954    610732376    double   sparse
    >> whos ATrans
      Name    Size            Bytes        Class    Attributes…
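A fair Python-side baseline starts by checking the storage format, since A @ x is only fast for CSR/CSC; scipy's matvec is also single-threaded, which accounts for much of the gap against Matlab's multithreaded kernel. A measurement sketch with scaled-down, assumed dimensions (the question's matrix is 47166x113954):

```python
import time
import numpy as np
from scipy import sparse

# Smaller stand-in for the matrix in the question.
rng = np.random.default_rng(0)
A = sparse.random(5000, 8000, density=0.001, format="csr", random_state=0)
x = rng.standard_normal(8000)

# CSR is the right format for A @ x; COO/LIL matvec is much slower.
t0 = time.perf_counter()
y = A @ x
elapsed = time.perf_counter() - t0
print(y.shape, elapsed)
```

Timing in a loop (and warming up once) gives a more stable number than a single call.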

'matrix too large' exception using the Colt Java library

Submitted by 。_饼干妹妹 on 2019-12-10 21:27:27

Question: I was using the cern.colt.matrix.* library for sparse matrix calculations, but I keep running into this error:

    Exception in thread "main" java.lang.IllegalArgumentException: matrix too large

I think this is because the constructor throws an exception when nrows*ncols > Integer.MAX_VALUE. API: http://acs.lbl.gov/software/colt/api/cern/colt/matrix/impl/SparseDoubleMatrix2D.html. Exception: IllegalArgumentException - if rows<0 || columns<0 || (double)columns*rows > Integer.MAX_VALUE. My rows are:…

Setting elements in the .data attribute to zero: unpleasant behavior in scipy.sparse

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-10 18:34:52

Question: I get unpleasant behavior when I set values in the .data attribute of a csr_matrix to zero. Here is an example:

    from scipy import sparse
    a = sparse.csr_matrix([[0,0,2,0], [1,1,0,0], [0,3,0,0]])

Output:

    >>> a.A
    array([[0, 0, 2, 0],
           [1, 1, 0, 0],
           [0, 3, 0, 0]])
    >>> a.data
    array([2, 1, 1, 3])
    >>> a.data[3] = 0  # setting one element to zero
    >>> a.A
    array([[0, 0, 2, 0],
           [1, 1, 0, 0],
           [0, 0, 0, 0]])
    >>> a.data
    array([2, 1, 1, 0])  # however, this zero is still considered part of data

What I would like to see…
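Zeroing entries through .data leaves them explicitly stored, which is why nnz and .data still show them; csr_matrix.eliminate_zeros() prunes the explicit zeros in place. A short sketch continuing the example:

```python
from scipy import sparse

a = sparse.csr_matrix([[0, 0, 2, 0], [1, 1, 0, 0], [0, 3, 0, 0]])
a.data[3] = 0          # the entry is now zero but still explicitly stored
print(a.nnz)           # 4 -- nnz counts stored values, including explicit zeros
a.eliminate_zeros()    # drop explicitly stored zeros in place
print(a.nnz)           # 3
print(a.data)          # [2 1 1]
```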

ValueError taking dot product of two sparse matrices in SciPy

Submitted by 不羁岁月 on 2019-12-10 18:09:17

Question: I'm trying to take the dot product of two lil_matrix sparse matrices that are approx. 100,000 x 50,000 and 50,000 x 100,000 respectively:

    from scipy import sparse
    a = sparse.lil_matrix((100000, 50000))
    b = sparse.lil_matrix((50000, 100000))
    c = a.dot(b)

and getting this error:

    File "/usr/lib64/python2.6/site-packages/scipy/sparse/base.py", line 211, in dot
      return self * other
    File "/usr/lib64/python2.6/site-packages/scipy/sparse/base.py", line 247, in __mul__
      return self._mul_sparse_matrix…
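The traceback is cut off, but the product dispatches through _mul_sparse_matrix, and LIL is meant for incremental construction, not arithmetic; converting both operands to CSR before multiplying is the usual first step (old SciPy versions could also overflow int32 index arrays on results of this size, fixed by upgrading). A sketch with smaller, assumed shapes:

```python
from scipy import sparse

# Build in LIL (cheap incremental assignment), then convert for math.
a = sparse.lil_matrix((1000, 500))
b = sparse.lil_matrix((500, 1000))
a[0, 0] = 2.0
b[0, 0] = 3.0

# CSR x CSR multiplication is the supported fast path.
c = a.tocsr() @ b.tocsr()
print(c.shape, c[0, 0])
```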

Cluster Analysis in R on large sparse matrix

Submitted by 和自甴很熟 on 2019-12-10 17:56:03

Question: I have a transaction dataset with 250,000 transactions (rows) and 2,183 items (columns). I want to transform it into a sparse matrix and then run hierarchical clustering on it. I tried the 'sparcl' package, but it doesn't seem to work on sparse matrices. Any suggestion on how to solve this problem, or any other package I can use for cluster analysis on a sparse matrix? Thanks!

Answer 1: Affinity propagation, as implemented in the apcluster package, has supported sparse matrices since version 1.4.0. So please give…

Why can't I assign data to part of a sparse matrix in the first “try:”?

Submitted by 江枫思渺然 on 2019-12-10 17:44:01

Question: I want to assign a value to part of a CSR sparse matrix (I know it's expensive, but that doesn't matter in my project). I tried to assign a float variable to part of the sparse matrix, but it doesn't work the first time. However, if I do the exact same thing in the "except" block, it works flawlessly. I then checked the dtype of the sparse matrix and of the part of it, and they are different for some reason. The dtype of the whole matrix is float16, as I assigned, but part of the matrix has a…
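The question is truncated before the failing code, but element and slice assignment is the one operation CSR handles poorly: scipy emits a SparseEfficiencyWarning when the assignment changes the sparsity structure, because the index arrays must be rebuilt. The standard workaround is to assign through LIL and convert back; a hedged sketch (using float64, since the question's float16 behavior is part of what's unexplained):

```python
import numpy as np
from scipy import sparse

# LIL is the format designed for element/slice assignment.
m = sparse.csr_matrix((4, 4), dtype=np.float64)
lil = m.tolil()
lil[0, :2] = 0.5       # slice assignment is cheap in LIL
m = lil.tocsr()        # convert back for fast arithmetic
print(m.dtype, m[0, 0], m[0, 1])
```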