sparse-matrix

Substitute for numpy broadcasting using scipy.sparse.csc_matrix

别等时光非礼了梦想 · submitted on 2019-12-05 23:21:40
Question: I have the following expression in my code: a = (b / x[:, np.newaxis]).sum(axis=1), where b is an ndarray of shape (M, N) and x is an ndarray of shape (M,). Now, b is actually sparse, so for memory efficiency I would like to substitute in a scipy.sparse.csc_matrix or csr_matrix. However, broadcasting in this way is not implemented (even though division or multiplication is guaranteed to maintain sparsity, since the entries of x are non-zero) and raises a NotImplementedError. Is there a sparse …
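One common workaround (a sketch, not from the question itself) is to fold the division into a sparse diagonal matrix, so every intermediate stays sparse:

```python
import numpy as np
from scipy import sparse

# Hypothetical small example: b is (M, N) sparse, x is (M,) with no zero entries.
b = sparse.csc_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 4.0, 0.0]]))
x = np.array([2.0, 4.0])

# Left-multiplying by diag(1/x) scales row i of b by 1/x[i],
# which is exactly what b / x[:, np.newaxis] does densely.
scaled = sparse.diags(1.0 / x) @ b

# .sum(axis=1) on a sparse matrix returns an (M, 1) np.matrix;
# ravel it back to a plain 1-D ndarray.
a = np.asarray(scaled.sum(axis=1)).ravel()
print(a)  # -> [1.5 1. ]
```

The diagonal matrix costs only O(M) extra storage, so this scales to large sparse b.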

How can I find indices of each row of a matrix which has a duplicate in matlab?

我的未来我决定 · submitted on 2019-12-05 23:18:49
Question: I want to find the indices of all the rows of a matrix which have duplicates. For example, given A = [1 2 3 4; 1 2 3 4; 2 3 4 5; 1 2 3 4; 6 5 4 3], the vector to be returned would be [1 2 4]. A lot of similar questions suggest using the unique function, which I've tried, but the closest I can get to what I want is: [C, ia, ic] = unique(A, 'rows') gives ia = [1 3 5], and with m = 5, setdiff(1:m, ia) gives [2 4]. But using unique I can only extract the 2nd, 3rd, 4th, ... instance of a row, and I need to also obtain the first. Is …
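The same idea can be sketched in NumPy (not MATLAB): np.unique with return_inverse and return_counts flags every row, first instances included, whose contents occur more than once:

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [2, 3, 4, 5],
              [1, 2, 3, 4],
              [6, 5, 4, 3]])

# Map each row to its unique-row id and count how often each id occurs.
_, inverse, counts = np.unique(A, axis=0, return_inverse=True, return_counts=True)
inverse = inverse.reshape(-1)  # inverse shape differs across NumPy versions; flatten defensively

# A row is a duplicate (first instance included) if its id occurs more than once.
dup_rows = np.where(counts[inverse] > 1)[0]
print(dup_rows + 1)  # 1-based as in MATLAB -> [1 2 4]
```

In MATLAB itself the analogous trick is ismember(A, A(setdiff(1:end, ia), :), 'rows'), but the counting approach above is the more direct translation.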

Remote linux server to remote linux server large sparse files copy - How To?

我是研究僧i · submitted on 2019-12-05 22:53:50
I have two twin CentOS 5.4 servers with VMware Server installed on each. What is the most reliable and fast method for copying virtual machine files from one server to the other, assuming that I always use sparse files for my VMware virtual machines? The VM files are a pain to copy since they are very large (50 GB), but since they are sparse files I think something can be done to improve the speed of the copy. If you want to copy large data quickly, rsync over SSH is not for you. Since running an rsync daemon for a quick one-shot copy is also overkill, yer olde tar and nc do the trick, as …
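To see why sparse-awareness matters here, this small Python sketch (an illustration of what a sparse file is, not the copy command itself) creates a file whose apparent size far exceeds what it uses on disk:

```python
import os
import tempfile

# Create a 1 MiB sparse file: seek past a hole, write a single byte at the end.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek((1 << 20) - 1)
    f.write(b"\0")
    path = f.name

st = os.stat(path)
print("apparent size:", st.st_size)          # 1 MiB
print("on-disk size :", st.st_blocks * 512)  # far smaller on a sparse-capable filesystem
os.remove(path)
```

Tools that detect such holes (tar --sparse, rsync --sparse, cp --sparse=always) skip transferring them, which is where the speedup for 50 GB VM images comes from.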

How to convert co-occurrence matrix to sparse matrix

爷,独闯天下 · submitted on 2019-12-05 22:18:21
I am starting to deal with sparse matrices, so I'm not really proficient on this topic. My problem is, I have a simple co-occurrence matrix from a word list: just a 2-dimensional word-by-word matrix counting how many times a word occurs in the same context. The matrix is quite sparse since the corpus is not that big. I want to convert it to a sparse matrix to be able to deal with it better, and eventually do some matrix multiplication afterwards. Here is what I have done until now (only the first part; the rest is just output formatting and data cleaning): def matrix(from_corpus): d = …
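A minimal sketch of the conversion (assuming the counts live in a dict keyed by word pairs, since the question's code is cut off):

```python
from scipy.sparse import coo_matrix

# Hypothetical co-occurrence counts: (word_i, word_j) -> count
cooc = {("cat", "dog"): 2, ("cat", "mouse"): 1, ("dog", "mouse"): 3}

# Assign each word a row/column index.
vocab = sorted({w for pair in cooc for w in pair})
index = {w: i for i, w in enumerate(vocab)}

rows = [index[a] for a, b in cooc]
cols = [index[b] for a, b in cooc]
vals = list(cooc.values())

M = coo_matrix((vals, (rows, cols)), shape=(len(vocab), len(vocab)))
# Convert to CSR for fast arithmetic, e.g. the matrix multiplication mentioned above.
M = M.tocsr()
```

COO is the natural construction format (just three parallel lists); CSR/CSC are the formats you want for the subsequent multiplications.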

how to implement a sparse_vector class

╄→尐↘猪︶ㄣ · submitted on 2019-12-05 22:18:08
I am implementing a templated sparse_vector class. It's like a vector, but it only stores elements that differ from their default-constructed value. So sparse_vector would store lazily-sorted index-value pairs for all indices whose value is not T(). I am basing my implementation on existing sparse vectors in numeric libraries, though mine will handle non-numeric types T as well. I looked at boost::numeric::ublas::coordinate_vector and Eigen::SparseVector. Both store: size_t* indices_; T* values_; (two dynamic arrays) plus int size_; int capacity_;. Why don't they …
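The question is about C++, but the storage contract can be sketched in Python (a hypothetical toy, not the questioner's class): keep entries only when they differ from the default value, the analogue of T():

```python
class SparseVector:
    """Toy sparse vector: stores only entries != default (the analogue of T())."""

    def __init__(self, size, default=0):
        self.size = size
        self.default = default
        self._data = {}  # index -> value, only non-default entries kept

    def __getitem__(self, i):
        return self._data.get(i, self.default)

    def __setitem__(self, i, value):
        if value == self.default:
            self._data.pop(i, None)  # writing the default removes the entry
        else:
            self._data[i] = value

    def nnz(self):
        return len(self._data)

v = SparseVector(1_000_000)
v[42] = 3.5
v[7] = 0  # storing the default is a no-op
```

As for the parallel-array layout in the question: keeping indices_ and values_ as two separate arrays avoids the per-element padding a vector<pair<size_t, T>> can incur and keeps the indices contiguous, which matters for cache-friendly binary search over them.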

Subset of a matrix multiplication, fast, and sparse

不想你离开。 · submitted on 2019-12-05 19:29:44
Converting a collaborative filtering code to use sparse matrices, I'm puzzling over the following problem: given two full matrices X (m by l) and Theta (n by l), and a sparse matrix R (m by n), is there a fast way to calculate the sparse inner product? The large dimensions are m and n (order 100,000), while l is small (order 10). This is probably a fairly common operation for big data, since it shows up in the cost function of most linear regression problems, so I'd expect a solution built into scipy.sparse, but I haven't found anything obvious yet. The naive way to do this in Python is R.multiply(X …
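One common approach (a sketch, not a scipy built-in) is to evaluate X @ Theta.T only at R's nonzero positions, which costs O(nnz(R) * l) instead of O(m * n * l):

```python
import numpy as np
from scipy.sparse import coo_matrix, random as sparse_random

m, n, l = 6, 5, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((m, l))
Theta = rng.standard_normal((n, l))
R = sparse_random(m, n, density=0.3, format="coo", random_state=0)

# Compute the inner products only where R is nonzero.
rows, cols = R.row, R.col
vals = np.einsum("ij,ij->i", X[rows], Theta[cols])  # row-wise dot products
P = coo_matrix((vals, (rows, cols)), shape=(m, n))

# Sanity check against the dense computation (only feasible at toy sizes):
dense = (X @ Theta.T) * (R.toarray() != 0)
assert np.allclose(P.toarray(), dense)
```

The gather X[rows] materializes an (nnz, l) array, so for very large nnz you would process the nonzeros in chunks; the arithmetic is unchanged.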

How to handle huge sparse matrices construction using Scipy?

老子叫甜甜 · submitted on 2019-12-05 18:31:29
So, I am working on a Wikipedia dump to compute the pageranks of around 5,700,000 pages, give or take. The files are preprocessed and hence are not in XML. They are taken from http://haselgrove.id.au/wikipedia.htm and the format is: from_page(1): to(12) to(13) to(14) ... from_page(2): to(21) to(22) ... from_page(5,700,000): to(xy) to(xz), and so on. So basically it's the construction of a 5,700,000 x 5,700,000 matrix, which would just break my 4 GB of RAM. Since it is very, very sparse, that makes it easier to store using scipy.sparse.lil_matrix or scipy.sparse.dok_matrix. Now my issue is: how on earth do I …
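A memory-friendly construction sketch (the line format is assumed from the question): accumulate flat row/column index lists while streaming the file, then build a single COO matrix at the end, rather than filling a LIL/DOK matrix entry by entry:

```python
import numpy as np
from scipy.sparse import coo_matrix

def adjacency_from_lines(lines, n_pages):
    """Each line looks like 'from_page: to to to ...' (1-based ids assumed)."""
    rows, cols = [], []
    for line in lines:
        src, _, targets = line.partition(":")
        src = int(src) - 1
        for t in targets.split():
            rows.append(src)
            cols.append(int(t) - 1)
    data = np.ones(len(rows), dtype=np.float32)  # float32: 4 bytes per link
    return coo_matrix((data, (rows, cols)), shape=(n_pages, n_pages)).tocsr()

# Toy stand-in for the 5,700,000-page file:
A = adjacency_from_lines(["1: 2 3", "2: 1", "3: 1 2"], 3)
```

Memory then scales with the number of links, not with n², which is what makes a 5.7M x 5.7M adjacency matrix fit in a few GB.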

Scipy sparse matrices element wise multiplication

梦想的初衷 · submitted on 2019-12-05 18:15:10
I am trying to do an element-wise multiplication of two large sparse matrices. Both are of size around (400K x 500K), with around 100M elements. However, they might not have non-zero elements in the same positions, and they might not have the same number of non-zero elements. In either situation, I'm okay with the product of a non-zero value in one matrix and a zero in the other being zero. I keep running out of memory (8 GB) with every approach, which doesn't make much sense; I shouldn't be. Here is what I've tried. A and B are sparse matrices (I've tried COO and CSC formats). # I …
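For reference, sparse element-wise multiplication itself is cheap (output nnz is at most the smaller input's nnz); the memory blow-ups in situations like this usually come from an accidental dense intermediate. A small sketch of the intended operation:

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0]]))
B = csr_matrix(np.array([[0.0, 5.0, 4.0],
                         [6.0, 0.0, 0.0]]))

# multiply() is element-wise; a position nonzero in only one matrix becomes zero.
C = A.multiply(B)
print(C.toarray())
```

By contrast, anything that touches A * np.asarray(B) or A.toarray() densifies a 400K x 500K array (~1.6 TB at float64), which would explain the out-of-memory behavior.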

Are there any storage optimized Sparse Matrix implementations in C#?

◇◆丶佛笑我妖孽 · submitted on 2019-12-05 17:22:23
Question: Are there any storage-optimized Sparse Matrix implementations in C#? Answer 1: There is Math.NET. It has some Sparse Matrix implementations. (The link is to the old Math.NET site; there is no longer an online version of the documentation.) Answer 2: If you are looking for a high-performance sparse matrix implementation, check out NMath from CenterSpace Software. Here's a partial list of functionality, cut from CenterSpace's website: full-featured structured sparse matrix classes, including triangular, …

Given a matrix of type `scipy.sparse.coo_matrix`, how do I determine the index and value of the maximum of each row?

随声附和 · submitted on 2019-12-05 17:18:57
Given a sparse matrix R of type scipy.sparse.coo_matrix and shape 1,000,000 x 70,000, I figured out that row_maximum = max(R.getrow(i).data) will give me the maximum value of the i-th row. What I need now is the index corresponding to the value row_maximum. Any ideas how to achieve that? Thanks for any advice in advance! getrow(i) returns a 1 x n CSR matrix, which has an indices attribute giving the column indices of the corresponding values in the data attribute. (We know the shape is 1 x n, so we don't have to deal with the indptr attribute.) So this will work: row = R.getrow(i); max_index = row.indices[np.argmax(row.data)].
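Put together as a runnable sketch (on a toy matrix in place of the 1,000,000 x 70,000 one):

```python
import numpy as np
from scipy.sparse import coo_matrix

R = coo_matrix(np.array([[0, 7, 0, 2],
                         [5, 0, 9, 0]]))

i = 1
row = R.getrow(i)                             # 1 x n CSR matrix
row_maximum = row.data.max()                  # max over the stored (nonzero) values
max_index = row.indices[np.argmax(row.data)]  # its column index
print(row_maximum, max_index)                 # -> 9 2
```

Note this is the maximum over the stored values only; if a row's entries are all negative, the true maximum of the full row is the implicit zero, which this approach will miss.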