sparse-matrix

MongoDB unique sparse index

孤街醉人 submitted on 2019-11-30 19:09:05
I have created a sparse, unique index on my MongoDB collection:

var Account = new Schema({ email: { type: String, index: {unique: true, sparse: true} }, ....

It has been created correctly:

{ "ns" : "MyDB.accounts", "key" : { "email" : 1 }, "name" : "email_1", "unique" : true, "sparse" : true, "background" : true, "safe" : null }

But if I insert a second document without the key set, I receive this error:

{ [MongoError: E11000 duplicate key error index: MyDB.accounts.$email_1 dup key: { : null }] name: 'MongoError', err: 'E11000 duplicate key error index: MyDB.accounts.$email_1 dup key: { :

clustering on very large sparse matrix?

社会主义新天地 submitted on 2019-11-30 18:49:58
Question: I am trying to do some (k-means) clustering on a very large matrix. The matrix is approximately 500000 rows x 4000 cols, yet very sparse (only a couple of "1" values per row). I want to get around 2000 clusters. I have two questions: - Can someone recommend an open source platform or tool for doing that (maybe using k-means, maybe with something better)? - How can I best estimate the time the algorithm will need to finish? I tried weka once, but aborted the job after a couple of days because I

When should I be using `sparse`?

独自空忆成欢 submitted on 2019-11-30 18:31:40
I've been looking through MATLAB's sparse documentation trying to find whether there are any guidelines for when it makes sense to use a sparse representation rather than a full representation. For example, I have a matrix data with around 30% nonzero entries. I can check the memory used:

whos data
  Name    Size           Bytes        Class    Attributes
  data    84143929x11    4394073488   double   sparse

data = full(data);
whos data
  Name    Size           Bytes        Class    Attributes
  data    84143929x11    7404665752   double

Here, I'm clearly saving memory, but would this be true of any matrix with 30% nonzero entries? What about 50% nonzero entries?
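The byte counts in the question are consistent with a simple storage model, sketched here in Python. It assumes (this is not stated in the question) that a MATLAB sparse double matrix on a 64-bit system stores 8 bytes per nonzero value, 8 bytes per row index, and 8 bytes per column pointer (one more pointer than there are columns), while a full double matrix stores 8 bytes per entry. The function names are hypothetical.

```python
def full_bytes(n_rows, n_cols):
    """Bytes for a dense double matrix: 8 bytes per entry."""
    return 8 * n_rows * n_cols


def sparse_bytes(nnz, n_cols):
    """Assumed model: 16 bytes per stored nonzero (value + row index)
    plus one 8-byte column pointer per column, plus one extra."""
    return 16 * nnz + 8 * (n_cols + 1)


def nnz_from_sparse_bytes(total_bytes, n_cols):
    """Invert the assumed model to recover the number of stored nonzeros."""
    return (total_bytes - 8 * (n_cols + 1)) // 16


n_rows, n_cols = 84143929, 11

# Number of nonzeros implied by the reported sparse size.
nnz = nnz_from_sparse_bytes(4394073488, n_cols)
density = nnz / (n_rows * n_cols)

# Under this model, sparse storage is smaller only while
# 16 * nnz + 8 * (n_cols + 1) < 8 * n_rows * n_cols.
break_even_nnz = (full_bytes(n_rows, n_cols) - 8 * (n_cols + 1)) // 16
break_even_density = break_even_nnz / (n_rows * n_cols)

print(f"implied density:    {density:.4f}")
print(f"break-even density: {break_even_density:.4f}")
```

Under this model the sparse form saves memory only while the density stays just under one half, so a matrix with 50% nonzero entries would gain essentially nothing. The model covers memory only and says nothing about computation speed.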

Weka printing sparse arff file

霸气de小男生 submitted on 2019-11-30 18:17:25
Question: I was trying out the sparse representation of the ARFF file as shown here. In my program I am able to print the class label "B", but for some reason it does not print "A".

attVals = new FastVector();
attVals.addElement("A");
attVals.addElement("B");
atts.addElement(new Attribute("class", attVals));
vals[index] = attVals.indexOf("A");

The program outputs {0 6,2 8}, whereas I expect {0 6,2 8,3 A}. But when I use vals[index] = attVals.indexOf("B"); I get the proper output: {0 6,2

Reshape sparse matrix efficiently, Python, SciPy 0.12

六月ゝ 毕业季﹏ submitted on 2019-11-30 17:37:42
Question: In another post regarding resizing of a sparse matrix in SciPy, the accepted answer works when more rows or columns are to be added, using scipy.sparse.vstack or hstack, respectively. In SciPy 0.12 the reshape and set_shape methods are still not implemented. Are there any established good practices for reshaping a sparse matrix in SciPy 0.12? It would be nice to have some timing comparisons.

Answer 1: As of SciPy 1.1.0, the reshape and set_shape methods have been implemented for all sparse matrix
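For SciPy versions that predate the built-in reshape, one possible workaround is to reshape through the COO representation by recomputing each entry's coordinates from its row-major linear index. The sketch below is an illustration only, not the method from the truncated answer; reshape_coo is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse


def reshape_coo(mat, new_shape):
    """Reshape a sparse matrix in row-major (C) order via COO coordinates."""
    coo = mat.tocoo()
    n_rows, n_cols = coo.shape
    new_rows, new_cols = new_shape
    if n_rows * n_cols != new_rows * new_cols:
        raise ValueError("total number of elements must not change")
    # Row-major linear index of every stored entry (int64 avoids overflow).
    flat = coo.row.astype(np.int64) * n_cols + coo.col
    # Split the linear index into coordinates in the new shape.
    new_row, new_col = np.divmod(flat, new_cols)
    return sparse.coo_matrix((coo.data, (new_row, new_col)), shape=new_shape)


demo = sparse.csr_matrix(np.arange(12).reshape(3, 4) % 3)
reshaped = reshape_coo(demo, (2, 6))
print(reshaped.toarray())
```

Only the stored entries are touched, so the cost grows with the number of nonzeros rather than with the full matrix size. On SciPy 1.1.0 or later, calling the matrix's own reshape method is the simpler choice.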

Efficient slicing of matrices using matrix multiplication, with Python, NumPy, SciPy

家住魔仙堡 submitted on 2019-11-30 17:23:42
Question: I want to reshape a 2d scipy.sparse.csr.csr_matrix (let us call it A) to a 2d numpy.ndarray (let us call this B). For example, if shape(A) is (90, 10), then shape(B) should be (9, 10), where each window of 10 rows of A is reduced to a single new value per column, namely the maximum over that window and column. The column operator is not working on this unhashable type of a sparse matrix. How can I get this B by using matrix multiplications?

Answer 1: Using matrix multiplication you can do an efficient slicing creating a
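One way to combine matrix multiplication with a windowed maximum is sketched below. It is an assumption about how such slicing could be done, not a reconstruction of the truncated answer. For each offset k inside a window, a sparse selector matrix picks row k of every window via a product, and the picked slices are merged with an element-wise maximum. The name window_max is hypothetical.

```python
import numpy as np
from scipy import sparse


def window_max(A, window):
    """Column-wise maximum over consecutive row windows of a sparse matrix."""
    n_rows, _ = A.shape
    if n_rows % window != 0:
        raise ValueError("number of rows must be a multiple of the window size")
    n_windows = n_rows // window
    out_rows = np.arange(n_windows)
    result = None
    for k in range(window):
        # Selector S has a single 1 per row: S[w, w * window + k] = 1,
        # so S @ A extracts row k of every window.
        src_rows = out_rows * window + k
        S = sparse.csr_matrix(
            (np.ones(n_windows, dtype=A.dtype), (out_rows, src_rows)),
            shape=(n_windows, n_rows),
        )
        picked = S @ A
        result = picked if result is None else result.maximum(picked)
    return result.toarray()


rng = np.random.default_rng(0)
dense = rng.integers(-3, 4, size=(90, 10))
A = sparse.csr_matrix(dense)
B = window_max(A, 10)
print(B.shape)
```

The loop runs once per row inside a window (10 times here), not once per output row, and every intermediate stays sparse until the final conversion to an ndarray.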

Efficient way to set elements to zero where mask is True on scipy sparse matrix

南笙酒味 submitted on 2019-11-30 13:44:01
I have two scipy.sparse CSR matrices, 'a' and a boolean matrix 'mask', and I want to set the elements of 'a' to zero where the corresponding element of 'mask' is True. For example:

>>> a
<3x3 sparse matrix of type '<type 'numpy.int32'>'
    with 4 stored elements in Compressed Sparse Row format>
>>> a.todense()
matrix([[0, 0, 3],
        [0, 1, 5],
        [7, 0, 0]])
>>> mask
<3x3 sparse matrix of type '<type 'numpy.bool_'>'
    with 4 stored elements in Compressed Sparse Row format>
>>> mask.todense()
matrix([[ True, False,  True],
        [False, False,  True],
        [False,  True, False]], dtype=bool)

Then I want to obtain the following result.
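A minimal sketch of one possible approach, using the example matrices from the question: the element-wise product of a and mask isolates exactly the entries to remove, and subtracting it from a zeroes them while staying sparse. The helper name zero_where_mask is hypothetical, and this is not necessarily the fastest option.

```python
import numpy as np
from scipy import sparse


def zero_where_mask(a, mask):
    """Return a copy of `a` with entries zeroed wherever `mask` is True."""
    # Entries of `a` that sit under a True in `mask`.
    masked_part = a.multiply(mask)
    result = (a - masked_part).tocsr()
    # Drop any explicitly stored zeros left by the subtraction.
    result.eliminate_zeros()
    return result


a = sparse.csr_matrix(np.array([[0, 0, 3],
                                [0, 1, 5],
                                [7, 0, 0]], dtype=np.int32))
mask = sparse.csr_matrix(np.array([[True, False, True],
                                   [False, False, True],
                                   [False, True, False]]))

cleared = zero_where_mask(a, mask)
print(cleared.toarray())
```

Neither matrix is converted to a dense array, so the cost depends on the number of stored elements in a and mask.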

scipy.sparse : Set row to zeros

可紊 submitted on 2019-11-30 13:42:23
Suppose I have a matrix in the CSR format. What is the most efficient way to set a row (or rows) to zeros? The following code runs quite slowly:

A = A.tolil()
A[indices, :] = 0
A = A.tocsr()

I had to convert to scipy.sparse.lil_matrix because the CSR format seems to support neither fancy indexing nor setting values to slices.

I guess scipy just does not implement it, but the CSR format would support this quite well. Please read the Wikipedia article on "Sparse matrix" for what indptr, etc. are:

# A.indptr is an array, one for each row (+1 for the nnz):
def csr_row_set_nz_to_val(csr, row,
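Because the function in the snippet is cut off, the sketch below uses a separate, hypothetical helper (csr_zero_rows) to illustrate the idea it points to. In CSR, the stored values of row r occupy data[indptr[r]:indptr[r + 1]], so a row can be cleared by overwriting that slice and then dropping the explicit zeros.

```python
import numpy as np
from scipy import sparse


def csr_zero_rows(csr, rows):
    """Set the given rows of a CSR matrix to zero, in place."""
    if csr.format != "csr":
        raise ValueError("matrix must be in CSR format")
    for r in rows:
        # Stored values of row r live in data[indptr[r]:indptr[r + 1]].
        csr.data[csr.indptr[r]:csr.indptr[r + 1]] = 0
    # Remove the now-zero entries from the sparsity structure.
    csr.eliminate_zeros()
    return csr


A = sparse.csr_matrix(np.array([[1, 0, 2],
                                [0, 3, 0],
                                [4, 5, 6]]))
csr_zero_rows(A, [0, 2])
print(A.toarray())
```

This avoids the round trip through lil_matrix. Note that it modifies the matrix it is given.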

What is the fastest way to slice a scipy.sparse matrix?

别说谁变了你拦得住时间么 submitted on 2019-11-30 12:39:53
Question: I normally use matrix[:, i:], but it does not seem to work as fast as I expected.

Answer 1: If you want a sparse matrix as output, the fastest way to do row slicing is to use a csr type, and for column slicing a csc type, as detailed here. In both cases you just have to do what you are currently doing: matrix[l1:l2,c1:c2] If you want another type as output, there may be faster ways. This other answer explains many methods for slicing a matrix and compares their timings. For example, if
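As a small illustration of the advice in the answer, the sketch below keeps one copy of the same data in CSR for row slices and another in CSC for column slices. The variable names are made up for the example, and no timings are claimed.

```python
import numpy as np
from scipy import sparse

dense = np.arange(20).reshape(4, 5)

# CSR stores rows contiguously, so it suits row slicing.
by_rows = sparse.csr_matrix(dense)
# CSC stores columns contiguously, so it suits column slicing.
by_cols = by_rows.tocsc()

row_block = by_rows[1:3, :]   # rows 1 and 2
col_block = by_cols[:, 2:]    # columns 2 onward, as in matrix[:, i:]

print(row_block.shape, col_block.shape)
```

Converting between formats has its own cost, so keeping a CSC copy pays off mainly when many column slices are taken from the same matrix.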

Argmax of each row or column in scipy sparse matrix

生来就可爱ヽ(ⅴ<●) submitted on 2019-11-30 11:46:42
scipy.sparse.coo_matrix.max returns the maximum value of each row or column, given an axis. I would like to know not the value, but the index of the maximum value of each row or column. I haven't found an efficient way to do this yet, so I'll gladly accept any help.

From scipy version 0.19, both csr_matrix and csc_matrix support argmax() and argmin() methods.

hpaulj: I would suggest studying the code for moo._min_or_max_axis, where moo is a coo_matrix.

mat = mat.tocsc() # for axis=0
mat.sum_duplicates()
major_index, value = mat._minor_reduce(min_or_max)
not_full = np.diff(mat
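For SciPy 0.19 or later, the argmax() method mentioned above can be used directly. A short sketch with a made-up example matrix:

```python
import numpy as np
from scipy import sparse

mat = sparse.csr_matrix(np.array([[0, 2, 0],
                                  [5, 0, 1],
                                  [0, 0, 3]]))

# Column index of the maximum in each row.
row_argmax = np.asarray(mat.argmax(axis=1)).ravel()

# Row index of the maximum in each column (CSC suits column-wise work).
col_argmax = np.asarray(mat.tocsc().argmax(axis=0)).ravel()

print(row_argmax, col_argmax)
```

The result is wrapped in np.asarray(...).ravel() because argmax along an axis of a sparse matrix returns a 2-D matrix object rather than a flat array.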