sparse-matrix | 易学教程

Mongodb unique sparse index

I have created a sparse and unique index on my MongoDB collection:

    var Account = new Schema({
      email: { type: String, index: {unique: true, sparse: true} },
      ....

It has been created correctly:

    { "ns" : "MyDB.accounts", "key" : { "email" : 1 }, "name" : "email_1", "unique" : true, "sparse" : true, "background" : true, "safe" : null }

But if I insert a second document without the key set, I receive this error:

    { [MongoError: E11000 duplicate key error index: MyDB.accounts.$email_1 dup key: { : null }] name: 'MongoError', err: 'E11000 duplicate key error index: MyDB.accounts.$email_1 dup key: { :

clustering on very large sparse matrix?

Question: I am trying to do some (k-means) clustering on a very large matrix. The matrix is approximately 500,000 rows x 4,000 columns, yet very sparse (only a couple of "1" values per row). I want to get around 2,000 clusters. I have two questions:
- Can someone recommend an open-source platform or tool for doing that (maybe using k-means, maybe with something better)?
- How can I best estimate the time the algorithm will need to finish?
I tried Weka once, but aborted the job after a couple of days because I

When should I be using `sparse`?

I've been looking through MATLAB's sparse documentation, trying to find whether there are any guidelines for when it makes sense to use a sparse representation rather than a full representation. For example, I have a matrix data with around 30% nonzero entries. I can check the memory used:

    whos data
      Name   Size          Bytes        Class    Attributes
      data   84143929x11   4394073488   double   sparse

    data = full(data);
    whos data
      Name   Size          Bytes        Class    Attributes
      data   84143929x11   7404665752   double

Here, I'm clearly saving memory, but would this be true of any matrix with 30% nonzero entries? What about 50% nonzero entries?
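The byte counts in the question can be checked with a little arithmetic. The sketch below assumes a compressed-sparse-column layout with 8-byte values, 8-byte row indices, and one 8-byte column pointer per column plus one; that layout is an assumption about the storage model, although it does reproduce the sizes reported by whos. The function names and the nonzero count are illustrative: the count is inferred from the reported sparse size and is not stated in the question.

```python
def dense_bytes(n_rows, n_cols, value_bytes=8):
    """Memory for a full matrix: one value per entry."""
    return n_rows * n_cols * value_bytes


def csc_bytes(nnz, n_cols, value_bytes=8, index_bytes=8):
    """Memory under the assumed CSC model: a value and a row index per
    stored entry, plus (n_cols + 1) column pointers."""
    return nnz * (value_bytes + index_bytes) + (n_cols + 1) * index_bytes


def break_even_density(n_rows, n_cols, value_bytes=8, index_bytes=8):
    """Fraction of nonzeros at which both representations use equal memory."""
    pointer_bytes = (n_cols + 1) * index_bytes
    per_entry = value_bytes + index_bytes
    return (dense_bytes(n_rows, n_cols, value_bytes) - pointer_bytes) / (
        per_entry * n_rows * n_cols
    )


n_rows, n_cols = 84143929, 11
nnz = 274629587  # inferred from the reported sparse size (assumption), about 30% density

print(csc_bytes(nnz, n_cols))      # 4394073488, the sparse size in the question
print(dense_bytes(n_rows, n_cols)) # 7404665752, the full size in the question
print(break_even_density(n_rows, n_cols))  # just under 0.5
```

Under these assumptions, memory alone favors the sparse form up to roughly 50% nonzeros. Whether operations on the matrix are also faster is a separate question that this arithmetic does not answer.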

Weka printing sparse arff file

Question: I was trying out the sparse representation of the ARFF file as shown here. In my program I am able to print the class label "B", but for some reason it is not printing "A".

    attVals = new FastVector();
    attVals.addElement("A");
    attVals.addElement("B");
    atts.addElement(new Attribute("class", attVals));
    vals[index] = attVals.indexOf("A");

The output of the program is:

    {0 6,2 8}

but I should get {0 6,2 8,3 A}. When I instead do

    vals[index] = attVals.indexOf("B");

I get the proper output: {0 6,2

Reshape sparse matrix efficiently, Python, SciPy 0.12

Question: In another post regarding resizing of a sparse matrix in SciPy, the accepted answer works when more rows or columns are to be added, using scipy.sparse.vstack or hstack, respectively. In SciPy 0.12 the reshape and set_shape methods are still not implemented. Are there any established good practices for reshaping a sparse matrix in SciPy 0.12? It would be nice to have some timing comparisons.

Answer 1: As of SciPy 1.1.0, the reshape and set_shape methods have been implemented for all sparse matrix
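For SciPy versions without a built-in reshape, one possible workaround is to go through the COO format and recompute the coordinates by hand. The sketch below is not taken from the answer; the helper name reshape_via_coo is hypothetical, and it assumes row-major (C-order) semantics, the same as numpy.reshape.

```python
import numpy as np
from scipy import sparse


def reshape_via_coo(matrix, new_shape):
    """Reshape a sparse matrix by remapping COO coordinates (row-major order)."""
    n_rows, n_cols = matrix.shape
    new_rows, new_cols = new_shape
    if n_rows * n_cols != new_rows * new_cols:
        raise ValueError("total number of elements must not change")

    coo = matrix.tocoo()
    # Flatten each (row, col) pair into a linear index, using 64-bit
    # integers so that large matrices do not overflow.
    flat = coo.row.astype(np.int64) * n_cols + coo.col
    # Split the linear index back into coordinates of the new shape.
    row, col = np.divmod(flat, new_cols)
    return sparse.coo_matrix((coo.data, (row, col)), shape=new_shape)


a = sparse.csr_matrix(np.array([[1, 0, 0, 2],
                                [0, 3, 0, 0],
                                [0, 0, 0, 4]]))
b = reshape_via_coo(a, (4, 3)).tocsr()
print(b.toarray())
```

Only the stored entries are touched, so the cost scales with the number of nonzeros and not with the full matrix size.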

Efficient slicing of matrices using matrix multiplication, with Python, NumPy, SciPy

Question: I want to reshape a 2D scipy.sparse.csr.csr_matrix (let us call it A) to a 2D numpy.ndarray (let us call this B). A could be

    >shape(A)
    (90, 10)

then B should be

    >shape(B)
    (9,10)

where each window of 10 rows of A is reduced to a single new value, namely the maximum of this window and column. The column operator is not working on this unhashable type of a sparse matrix. How can I get this B by using matrix multiplications?

Answer 1: Using matrix multiplication you can do an efficient slicing, creating a
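The answer is cut off here, so the following is only a guess at the general idea, not the answerer's code. It slices with sparse selection matrices, where each selector picks one row out of every window, and then combines the slices with an element-wise maximum. The function name window_max_by_selection and the test data are made up for illustration.

```python
import numpy as np
from scipy import sparse


def window_max_by_selection(a, window):
    """Column-wise maximum over consecutive row windows of a sparse matrix."""
    n_rows = a.shape[0]
    if n_rows % window != 0:
        raise ValueError("number of rows must be a multiple of the window size")
    n_out = n_rows // window
    out_rows = np.arange(n_out)

    result = None
    for offset in range(window):
        # Selector with a single 1 per row: output row i picks input row
        # i * window + offset, so `selector @ a` is a strided row slice.
        selector = sparse.csr_matrix(
            (np.ones(n_out, dtype=a.dtype), (out_rows, out_rows * window + offset)),
            shape=(n_out, n_rows),
        )
        picked = selector @ a
        result = picked if result is None else result.maximum(picked)
    return result.toarray()


# Hypothetical sparse test data with the shape from the question.
rng = np.random.default_rng(0)
dense = rng.integers(1, 5, size=(90, 10)) * (rng.random((90, 10)) < 0.2)
A = sparse.csr_matrix(dense)
B = window_max_by_selection(A, 10)
print(B.shape)  # (9, 10)
```

This needs one sparse product per row offset within the window, so it is practical for small windows such as the 10 rows in the question.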

Efficient way to set elements to zero where mask is True on scipy sparse matrix

I have two scipy.sparse CSR matrices, 'a' and a boolean one, 'mask', and I want to set the elements of 'a' to zero wherever the corresponding element of 'mask' is True. For example:

    >>> a
    <3x3 sparse matrix of type '<type 'numpy.int32'>'
        with 4 stored elements in Compressed Sparse Row format>
    >>> a.todense()
    matrix([[0, 0, 3],
            [0, 1, 5],
            [7, 0, 0]])
    >>> mask
    <3x3 sparse matrix of type '<type 'numpy.bool_'>'
        with 4 stored elements in Compressed Sparse Row format>
    >>> mask.todense()
    matrix([[ True, False,  True],
            [False, False,  True],
            [False,  True, False]], dtype=bool)

Then I want to obtain the following result.
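One way to do this without densifying either matrix is to subtract the masked part of a from a itself. This is a minimal sketch using the example matrices from the question; it is one possible approach, not necessarily the fastest one.

```python
import numpy as np
from scipy import sparse

a = sparse.csr_matrix(np.array([[0, 0, 3],
                                [0, 1, 5],
                                [7, 0, 0]], dtype=np.int32))
mask = sparse.csr_matrix(np.array([[True, False, True],
                                   [False, False, True],
                                   [False, True, False]]))

# a.multiply(mask) keeps only the entries of a where mask is True;
# subtracting them leaves zeros in exactly those positions.
masked_out = sparse.csr_matrix(a - a.multiply(mask))
# Drop any explicitly stored zeros so that nnz reflects the real content.
masked_out.eliminate_zeros()

print(masked_out.toarray())
# [[0 0 0]
#  [0 1 0]
#  [7 0 0]]
```

Both the multiplication and the subtraction stay sparse, so the cost depends on the number of stored elements and not on the full shape.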

scipy.sparse : Set row to zeros

Suppose I have a matrix in the CSR format. What is the most efficient way to set a row (or rows) to zeros? The following code runs quite slowly:

    A = A.tolil()
    A[indices, :] = 0
    A = A.tocsr()

I had to convert to scipy.sparse.lil_matrix because the CSR format seems to support neither fancy indexing nor setting values to slices.

I guess scipy just does not implement it, but the CSR format would support this quite well; please read the Wikipedia article on "Sparse matrix" about what indptr, etc. are:

    # A.indptr is an array, one for each row (+1 for the nnz):
    def csr_row_set_nz_to_val(csr, row,
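The snippet above is cut off, so the following is an independent sketch of the same idea and not a completion of the original function. In CSR, the stored values of row r live in data[indptr[r]:indptr[r + 1]], so a row can be zeroed by overwriting that slice in place. The helper name zero_csr_rows is hypothetical.

```python
import numpy as np
from scipy import sparse


def zero_csr_rows(csr, rows):
    """Set the given rows of a CSR matrix to zero, in place."""
    if csr.format != "csr":
        raise ValueError("matrix must be in CSR format")
    for row in rows:
        # Stored values of this row occupy one contiguous slice of `data`.
        start, stop = csr.indptr[row], csr.indptr[row + 1]
        csr.data[start:stop] = 0
    # Remove the explicit zeros so the sparsity structure stays compact.
    csr.eliminate_zeros()
    return csr


A = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0],
                                [4.0, 5.0, 6.0]]))
zero_csr_rows(A, [0, 2])
print(A.toarray())
# [[0. 0. 0.]
#  [0. 3. 0.]
#  [0. 0. 0.]]
```

No format conversion is involved; the single eliminate_zeros call at the end is deferred so that zeroing many rows only compacts the arrays once.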

What is the fastest way to slice a scipy.sparse matrix?

Question: I normally use

    matrix[:, i:]

It does not seem to work as fast as I expected.

Answer 1: If you want to obtain a sparse matrix as output, the fastest way to do row slicing is to have a csr type, and for column slicing a csc type, as detailed here. In both cases you just have to do what you are currently doing:

    matrix[l1:l2,c1:c2]

If you want another type as output, there may be faster ways. This other answer explains many methods for slicing a matrix and compares their timings. For example, if
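As a small illustration of the format advice in the answer, the sketch below slices rows from a CSR matrix and columns from a CSC copy. The data is made up, and no timings are claimed here; actual speedups depend on matrix size and SciPy version, so measure on your own data (for example with timeit).

```python
import numpy as np
from scipy import sparse

dense = np.arange(20).reshape(4, 5)
matrix = sparse.csr_matrix(dense)

# Row slicing: CSR stores rows contiguously, so slice it directly.
row_block = matrix[1:3, :]

# Column slicing: convert once to CSC, then slice. If many column slices
# are needed, keep the CSC copy around and avoid repeated conversions.
matrix_csc = matrix.tocsc()
col_block = matrix_csc[:, 2:]

print(row_block.format, row_block.shape)  # csr (2, 5)
print(col_block.format, col_block.shape)  # csc (4, 3)
```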

Argmax of each row or column in scipy sparse matrix

scipy.sparse.coo_matrix.max returns the maximum value of each row or column, given an axis. I would like to know not the value, but the index of the maximum value of each row or column. I haven't found a way to do this efficiently yet, so I'll gladly accept any help.

From SciPy version 0.19, both csr_matrix and csc_matrix support argmax() and argmin() methods.

hpaulj: I would suggest studying the code for moo._min_or_max_axis, where moo is a coo_matrix.

    mat = mat.tocsc() # for axis=0
    mat.sum_duplicates()
    major_index, value = mat._minor_reduce(min_or_max)
    not_full = np.diff(mat
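For SciPy 0.19 and later, the built-in argmax mentioned above is usually enough. The following minimal sketch uses a made-up matrix in which every row and column has a unique maximum, so the results are unambiguous.

```python
import numpy as np
from scipy import sparse

mat = sparse.csr_matrix(np.array([[0, 5, 2],
                                  [3, 0, 1],
                                  [0, 4, 9]]))

# argmax along an axis returns a 2-D result for sparse matrices;
# flatten it to a plain 1-D index array.
row_argmax = np.asarray(mat.argmax(axis=1)).ravel()  # column index of each row's max
col_argmax = np.asarray(mat.argmax(axis=0)).ravel()  # row index of each column's max

print(row_argmax)  # [1 0 2]
print(col_argmax)  # [1 0 2]
```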