sparse-matrix

RcppEigen sparse matrix insert operation gives invalid class “dgCMatrix” error

Submitted by 只谈情不闲聊 on 2019-12-11 01:18:40

Question: I'm trying to get up to speed on using C++ to quickly build sparse matrices for use in R. However, I cannot seem to use the insert method to change single elements of an Eigen sparse matrix and get back a valid R object of class dgCMatrix. A simple example is below. The C++ code is:

    #include <RcppEigen.h>
    // [[Rcpp::depends(RcppEigen)]]
    using Eigen::SparseMatrix;  // sparse matrix

    // [[Rcpp::export]]
    SparseMatrix<double> SimpleSparseMatrix(int n) {
      SparseMatrix<double> new_mat(n, n);
      new…

Create Sparse Matrix from a data frame

Submitted by 元气小坏坏 on 2019-12-10 23:19:09

Question: I'm doing an assignment where I'm trying to build a collaborative filtering model for the Netflix Prize data. The data I'm using is in a CSV file, which I easily imported into a data frame. Now I need to create a sparse matrix with the users as rows and the movies as columns, where each cell is filled with the corresponding rating value. When I try to map out the values in the data frame, I need to run a loop over each row of the data frame, which is taking a lot of…
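The question is in R, but the loop-free construction it needs is the classic triplet build: map users and movies to integer codes, then hand (ratings, (row, col)) to the sparse constructor in a single call instead of looping over rows. A minimal SciPy sketch of the same idea, with a made-up table standing in for the Netflix CSV:

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical ratings table; the real Netflix data has the same triplet shape.
df = pd.DataFrame({
    "user":   ["u1", "u2", "u1", "u3"],
    "movie":  ["m1", "m1", "m2", "m2"],
    "rating": [5, 3, 4, 1],
})

# Map users and movies to integer indices, then build the matrix in one shot
# from (data, (row, col)) triplets -- no per-row loop needed.
users = pd.Categorical(df["user"])
movies = pd.Categorical(df["movie"])
mat = csr_matrix((df["rating"], (users.codes, movies.codes)),
                 shape=(len(users.categories), len(movies.categories)))

print(mat.toarray())
```

The same pattern exists in R via `Matrix::sparseMatrix(i, j, x)` with integer-coded factors.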

Efficient way to do one-hot encoding in R on large data

Submitted by 前提是你 on 2019-12-10 22:28:12

Question: I'm trying to create a one-hot representation of my data. This is my approach:

    data(iris)
    iris = as.data.frame(apply(iris, 2, function(x) as.factor(x)))
    head(iris)
    iris_ohe <- data.frame(model.matrix(~.-1, iris))
    head(iris_ohe)
    dim(iris_ohe)

The thing is, the data I'm working on has over 1 million rows, and the encoding produces a matrix with over 100 columns. This is too much for R and I run out of memory: Error: cannot allocate vector of size 10204.5 Gb. Is there a better approach I…
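The blow-up comes from model.matrix materializing a dense matrix (in R, Matrix::sparse.model.matrix is the direct fix). The underlying observation is that a one-hot column has exactly one nonzero per row, so it can be built directly in sparse form. A hedged SciPy sketch of that idea, with a toy label array:

```python
import numpy as np
from scipy.sparse import csr_matrix

def one_hot_sparse(column):
    """One-hot encode a 1-D array of labels into a sparse CSR matrix."""
    categories, codes = np.unique(column, return_inverse=True)
    n = len(column)
    # Exactly one nonzero per row: a 1 in the column of that row's category.
    mat = csr_matrix((np.ones(n, dtype=np.int8), (np.arange(n), codes)),
                     shape=(n, len(categories)))
    return mat, categories

mat, cats = one_hot_sparse(np.array(["setosa", "virginica", "setosa"]))
print(cats)
print(mat.toarray())
```

Memory then scales with the number of rows, not rows times categories, which is what makes a million-row encoding feasible.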

What is csr_matrix.A? [duplicate]

Submitted by 蓝咒 on 2019-12-10 22:25:01

Question: This question already has answers here: What is the built-in function A of numpy array.A? (4 answers). Closed 3 months ago. I've recently seen something like this:

    import numpy as np
    row = np.array([0, 0, 1, 2, 2, 2])
    col = np.array([0, 2, 2, 0, 1, 2])
    data = np.array([1, 2, 3, 4, 5, 6])
    from scipy.sparse import csr_matrix
    csr_matrix((data, (row, col)), shape=(3, 3)).A

In this case, it returns a numpy array:

    array([[1, 0, 2],
           [0, 0, 3],
           [4, 5, 6]], dtype=int64)

This seems to be simply a non…
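On scipy's sparse matrix classes, .A is simply a property alias for .toarray(): it densifies the matrix into an ndarray (newer code tends to prefer the explicit .toarray() call). A small sketch reproducing the snippet above:

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
m = csr_matrix((data, (row, col)), shape=(3, 3))

# .A is a property that returns .toarray(): the dense ndarray form.
assert (m.A == m.toarray()).all()
print(m.A)
# [[1 0 2]
#  [0 0 3]
#  [4 5 6]]
```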

Is sparse matrix-vector multiplication faster in Matlab than in Python?

Submitted by 时光毁灭记忆、已成空白 on 2019-12-10 22:07:52

Question: Edit: See this question, where I learned how to parallelize sparse matrix-vector multiplication in Python using Numba and was able to tie with Matlab. Original question: I'm observing that sparse matrix-vector multiplication is about 4 or 5 times faster in Matlab than in Python (using scipy sparse matrices). Here are some details from the Matlab command line:

    >> whos A
      Name    Size            Bytes        Class    Attributes
      A       47166x113954    610732376    double   sparse
    >> whos ATrans
      Name    Size            Bytes        Class    Attributes…
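A fair Python-side baseline starts by checking the storage format, since A @ x is only fast for CSR/CSC; scipy's matvec is also single-threaded, which accounts for much of the gap against Matlab's multithreaded kernel. A measurement sketch with scaled-down, assumed dimensions (the question's matrix is 47166x113954):

```python
import time
import numpy as np
from scipy import sparse

# Smaller stand-in for the matrix in the question.
rng = np.random.default_rng(0)
A = sparse.random(5000, 8000, density=0.001, format="csr", random_state=0)
x = rng.standard_normal(8000)

# CSR is the right format for A @ x; COO/LIL matvec is much slower.
t0 = time.perf_counter()
y = A @ x
elapsed = time.perf_counter() - t0
print(y.shape, elapsed)
```

Timing in a loop (and warming up once) gives a more stable number than a single call.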

'matrix too large' exception using the Colt Java library

Submitted by 。_饼干妹妹 on 2019-12-10 21:27:27

Question: I was using the cern.colt.matrix.* library for sparse matrix calculations, but I keep running into this error:

    Exception in thread "main" java.lang.IllegalArgumentException: matrix too large

I think this is because the constructor throws an exception when nrows*ncols > Integer.MAX_VALUE. API: http://acs.lbl.gov/software/colt/api/cern/colt/matrix/impl/SparseDoubleMatrix2D.html. Exception: IllegalArgumentException - if rows<0 || columns<0 || (double)columns*rows > Integer.MAX_VALUE. My rows are:…

Setting elements in the .data attribute to zero: unpleasant behavior in scipy.sparse

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-10 18:34:52

Question: I get unpleasant behavior when I set values in the .data attribute of a csr_matrix to zero. Here is an example:

    from scipy import sparse
    a = sparse.csr_matrix([[0,0,2,0], [1,1,0,0], [0,3,0,0]])

Output:

    >>> a.A
    array([[0, 0, 2, 0],
           [1, 1, 0, 0],
           [0, 3, 0, 0]])
    >>> a.data
    array([2, 1, 1, 3])
    >>> a.data[3] = 0  # setting one element to zero
    >>> a.A
    array([[0, 0, 2, 0],
           [1, 1, 0, 0],
           [0, 0, 0, 0]])
    >>> a.data
    array([2, 1, 1, 0])  # however, this zero is still considered part of data

What I would like to see…
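Zeroing entries through .data leaves them explicitly stored, which is why nnz and .data still show them; csr_matrix.eliminate_zeros() prunes the explicit zeros in place. A short sketch continuing the example:

```python
from scipy import sparse

a = sparse.csr_matrix([[0, 0, 2, 0], [1, 1, 0, 0], [0, 3, 0, 0]])
a.data[3] = 0          # the entry is now zero but still explicitly stored
print(a.nnz)           # 4 -- nnz counts stored values, including explicit zeros
a.eliminate_zeros()    # drop explicitly stored zeros in place
print(a.nnz)           # 3
print(a.data)          # [2 1 1]
```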

ValueError taking dot product of two sparse matrices in SciPy

Submitted by 不羁岁月 on 2019-12-10 18:09:17

Question: I'm trying to take the dot product of two lil_matrix sparse matrices that are approx. 100,000 x 50,000 and 50,000 x 100,000 respectively:

    from scipy import sparse
    a = sparse.lil_matrix((100000, 50000))
    b = sparse.lil_matrix((50000, 100000))
    c = a.dot(b)

and getting this error:

    File "/usr/lib64/python2.6/site-packages/scipy/sparse/base.py", line 211, in dot
      return self * other
    File "/usr/lib64/python2.6/site-packages/scipy/sparse/base.py", line 247, in __mul__
      return self._mul_sparse_matrix…
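The traceback is cut off, but the product dispatches through _mul_sparse_matrix, and LIL is meant for incremental construction, not arithmetic; converting both operands to CSR before multiplying is the usual first step (old SciPy versions could also overflow int32 index arrays on results of this size, fixed by upgrading). A sketch with smaller, assumed shapes:

```python
from scipy import sparse

# Build in LIL (cheap incremental assignment), then convert for math.
a = sparse.lil_matrix((1000, 500))
b = sparse.lil_matrix((500, 1000))
a[0, 0] = 2.0
b[0, 0] = 3.0

# CSR x CSR multiplication is the supported fast path.
c = a.tocsr() @ b.tocsr()
print(c.shape, c[0, 0])
```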

Cluster Analysis in R on large sparse matrix

Submitted by 和自甴很熟 on 2019-12-10 17:56:03

Question: I have a transaction dataset with 250,000 transactions (rows) and 2,183 items (columns). I want to transform it into a sparse matrix and then run hierarchical clustering on it. I tried the 'sparcl' package, but it doesn't seem to work on sparse matrices. Any suggestion on how to solve this problem, or any other package I can use for cluster analysis on a sparse matrix? Thanks!

Answer 1: Affinity propagation, as implemented in the apcluster package, has supported sparse matrices since version 1.4.0. So please give…

Why can't I assign data to part of a sparse matrix in the first “try:”?

Submitted by 江枫思渺然 on 2019-12-10 17:44:01

Question: I want to assign a value to part of a CSR sparse matrix (I know it's expensive, but that doesn't matter in my project). I tried to assign a float variable to part of the sparse matrix, but it doesn't work the first time. However, if I do the exact same thing in the "except" block, it works flawlessly. I then checked the dtype of the sparse matrix and of the part of it, and they are different for some reason. The dtype of the whole matrix is float16, as I assigned, but part of the matrix has a…
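The question is truncated before the failing code, but element and slice assignment is the one operation CSR handles poorly: scipy emits a SparseEfficiencyWarning when the assignment changes the sparsity structure, because the index arrays must be rebuilt. The standard workaround is to assign through LIL and convert back; a hedged sketch (using float64, since the question's float16 behavior is part of what's unexplained):

```python
import numpy as np
from scipy import sparse

# LIL is the format designed for element/slice assignment.
m = sparse.csr_matrix((4, 4), dtype=np.float64)
lil = m.tolil()
lil[0, :2] = 0.5       # slice assignment is cheap in LIL
m = lil.tocsr()        # convert back for fast arithmetic
print(m.dtype, m[0, 0], m[0, 1])
```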