sparse-matrix

sklearn GMM raises “ValueError: setting an array element with a sequence.” on sparse matrix

一世执手 posted on 2019-12-12 00:45:12
Question: I am attempting to cluster a set of data points that are represented as a sparse scipy matrix, X. That is,

    >>> print type(X)
    <class 'scipy.sparse.csr.csr_matrix'>
    >>> print X.shape
    (57, 1038)
    >>> print X[0]
      (0, 223)    0.471313296962
      (0, 420)    0.621222153695
      (0, 1030)   0.442688836467
      (0, 124)    0.442688836467

When I feed this matrix into an sklearn.mixture.GMM model, however, it raises the following ValueError:

    File "/Library/Python/2.7/site-packages/sklearn/mixture/gmm.py", line 423, in fit
      X = np
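
A minimal sketch of one common workaround, assuming the estimator only accepts dense input (the traceback above is cut off, so the exact failing call is not shown): convert the sparse matrix with toarray() before fitting. The random matrix below is a hypothetical stand-in for X with the same shape.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for the question's X: a 57 x 1038 sparse CSR matrix.
X = sparse.random(57, 1038, density=0.01, format="csr", random_state=0)

# Densify explicitly; at this size (57 * 1038 floats) the dense copy is small.
X_dense = X.toarray()

print(type(X_dense), X_dense.shape)
```

X_dense can then be passed to the mixture model's fit(). In recent scikit-learn releases the class is sklearn.mixture.GaussianMixture, since the older GMM class has been removed.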

Not able to understand output of CSR Representation in CUSP

社会主义新天地 posted on 2019-12-12 00:08:56
Question: I am trying to use the CUSP library. I am reading .txt files that are basically a sparse COO representation, and I am using CUSP to convert them into CSR format. When I print the matrix with cusp::print(), it prints the correct outcome for the COO representation. However, after converting the matrix to CSR I print it with a function I wrote myself, and the outcome is not what I want. Here is the snippet:

    main() {
        //.
        //bla bla
        //..
        //create a 2d coo matrix
        cusp::coo_matrix<int, int, cusp::host_memory> D
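
For orientation, here is a small sketch in plain Python (not CUSP code) of how CSR stores the same entries as COO. The array names row_offsets, column_indices and values mirror the members of cusp::csr_matrix; the helper functions and the 3 x 3 example are hypothetical. The key point is that CSR has no per-entry row array: row i owns the slice row_offsets[i] .. row_offsets[i + 1] of the other two arrays, so a hand-written printer has to walk those slices.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets to CSR arrays (row_offsets, column_indices, values)."""
    # Sort entries by (row, column) so each row's entries are contiguous.
    order = sorted(range(len(rows)), key=lambda k: (rows[k], cols[k]))
    column_indices = [cols[k] for k in order]
    values = [vals[k] for k in order]

    # Count entries per row, then accumulate the counts into offsets.
    row_offsets = [0] * (n_rows + 1)
    for r in rows:
        row_offsets[r + 1] += 1
    for i in range(n_rows):
        row_offsets[i + 1] += row_offsets[i]
    return row_offsets, column_indices, values


def csr_entries(row_offsets, column_indices, values):
    """Yield (row, column, value) by walking the CSR arrays row by row."""
    for i in range(len(row_offsets) - 1):
        for k in range(row_offsets[i], row_offsets[i + 1]):
            yield i, column_indices[k], values[k]


# Hypothetical 3 x 3 example, not the matrix from the question.
rows = [0, 0, 1, 2, 2, 2]
cols = [0, 2, 2, 0, 1, 2]
vals = [1, 2, 3, 4, 5, 6]
row_offsets, column_indices, values = coo_to_csr(3, rows, cols, vals)
print(row_offsets)  # [0, 2, 3, 6]
print(list(csr_entries(row_offsets, column_indices, values)))
```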

coefficients of logistic regression have no attribute indices in pyspark

╄→尐↘猪︶ㄣ posted on 2019-12-11 18:31:12
Question: I wrote this code, but the coefficients are not available as a sparse vector, so I can't extract the indices in order to identify the active entries of the model.

    lr = LogisticRegression(elasticNetParam = 1.0, featuresCol = "features", labelCol = target_var)
    lasso_model = lr.fit(training_full)
    ## Extract variables with coefficients !=0 (sparse vector) + sorting
    coeff = lasso_model.coefficients
    coeff.indices

Source: https://stackoverflow.com/questions/55143818/coefficients-of-logistic-regression
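
A minimal sketch of one way to recover the active entries, assuming coefficients comes back as a DenseVector (which has no indices attribute): convert it to a NumPy array with toArray() and look up the nonzero positions there. The array below is a hypothetical stand-in so the snippet runs without a Spark session.

```python
import numpy as np

# Hypothetical stand-in for lasso_model.coefficients.toArray(); with a real
# model, DenseVector.toArray() returns a NumPy array like this one.
coeff_array = np.array([0.0, 1.3, 0.0, -0.7, 0.0, 0.2])

# Positions of the nonzero (active) coefficients.
active_idx = np.flatnonzero(coeff_array)

# Sort the active features by absolute coefficient size, largest first.
order = np.argsort(-np.abs(coeff_array[active_idx]))
active_sorted = [(int(i), float(coeff_array[i])) for i in active_idx[order]]
print(active_sorted)
```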

How to convert a regular matrix to a sparse matrix in R?

老子叫甜甜 posted on 2019-12-11 18:15:09
Question: I have a 200K row x 27K column matrix and I'd like to convert it to a sparse matrix. I've tried doing this, but I get a segmentation fault:

    > dim(my_regular)
    [1] 196501  26791
    > my_sparse <- as(my_regular, "sparseMatrix")

     *** caught segfault ***
    address 0x2b9e3e10e000, cause 'memory not mapped'

Is there a better way?

Answer 1: This is definitely not ideal, but the only way I could get the conversion to happen was to break up the matrix in groups of 50K rows and then use rbind to combine them: my
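
The chunk-and-combine idea from the answer can be sketched as follows. This is a Python/SciPy analogue, not the answer's R code (which is cut off above): scipy.sparse.vstack plays the role of rbind, and the helper name and the small test matrix are hypothetical.

```python
import numpy as np
from scipy import sparse


def dense_to_sparse_in_chunks(dense, chunk_rows):
    """Convert row blocks one at a time, then stack the sparse pieces."""
    pieces = [
        sparse.csr_matrix(dense[start:start + chunk_rows])
        for start in range(0, dense.shape[0], chunk_rows)
    ]
    return sparse.vstack(pieces, format="csr")


# Small hypothetical matrix; the question's matrix is 196501 x 26791.
rng = np.random.default_rng(0)
dense = rng.random((10, 6))
dense[dense < 0.8] = 0.0  # make most entries zero

sp = dense_to_sparse_in_chunks(dense, chunk_rows=4)
print(sp.shape, sp.nnz)
```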

efficient way to iterate through coo_matrix elements ordered by column?

时光总嘲笑我的痴心妄想 posted on 2019-12-11 17:58:15
Question: I have a scipy.sparse.coo_matrix matrix which I want to convert to bitsets per column for further calculation (for the purpose of the example, I'm testing on 100Kx1M). I'm currently doing something like this:

    bitsets = [ intbitset() for _ in range(matrix.shape[1]) ]
    for i,j in itertools.izip(matrix.row, matrix.col):
        bitsets[j].add(i)

That works, but COO matrix iterates the values by row. Ideally, I'd like to iterate by columns and then just build the bitset at once instead of adding to a
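
One possible approach, sketched under assumptions: sort the COO entries by column once with NumPy, so that all row indices of a column form one contiguous slice that can be turned into a set in a single step. Plain Python sets stand in for intbitset here, and the small matrix is hypothetical.

```python
import numpy as np
from scipy import sparse

# Hypothetical small matrix; plain Python sets stand in for intbitset.
row = np.array([0, 2, 1, 0, 3])
col = np.array([1, 0, 1, 2, 0])
matrix = sparse.coo_matrix((np.ones(5, dtype=bool), (row, col)), shape=(4, 3))

# Sort the COO entries by column once; entries of one column become contiguous.
order = np.argsort(matrix.col, kind="stable")
sorted_rows = matrix.row[order]

# boundaries[j]:boundaries[j + 1] is the slice of sorted_rows for column j.
boundaries = np.searchsorted(matrix.col[order], np.arange(matrix.shape[1] + 1))

column_sets = [
    set(sorted_rows[boundaries[j]:boundaries[j + 1]].tolist())
    for j in range(matrix.shape[1])
]
print(column_sets)
```

Converting with matrix.tocsc() and slicing its indices array by indptr would give the same per-column slices; which variant is faster on a 100Kx1M matrix would need to be measured.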

Normalizing sparse.csc_matrix by its diagonals

余生长醉 posted on 2019-12-11 17:44:38
Question: I have a scipy.sparse.csc_matrix with dtype = np.int32. I want to efficiently divide each column (or row, whichever is faster for csc_matrix) of the matrix by the diagonal element in that column, so that mnew[:,i] = m[:,i]/m[i,i]. Note that I need to convert my matrix to np.double (since the mnew elements will be in [0,1]), and since the matrix is massive and very sparse, I wonder if I can do it in some efficient way, with no for loop and without ever going dense.

Best, Ilya

Answer 1: Make a sparse matrix:

    In [379]: M = sparse
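
A minimal sketch of one loop-free way to do this (not necessarily the approach of the truncated answer above): right-multiplying by a sparse diagonal matrix of reciprocals scales each column by 1 / m[i, i] without densifying. It assumes every diagonal element is nonzero; the small matrix is hypothetical.

```python
import numpy as np
from scipy import sparse

# Hypothetical small int32 CSC matrix with a nonzero diagonal.
m = sparse.csc_matrix(
    np.array([[2, 0, 1], [0, 4, 0], [1, 0, 5]], dtype=np.int32)
)

# Diagonal as floats; assumed to contain no zeros.
d = m.diagonal().astype(np.float64)

# Right-multiplying by diag(1/d) scales column i by 1 / m[i, i].
mnew = (m.astype(np.float64) @ sparse.diags(1.0 / d)).tocsc()
print(mnew.toarray())
```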

Multiply slice of scipy sparse matrix without changing sparsity

只谈情不闲聊 posted on 2019-12-11 17:13:52
Question: In scipy, when I multiply a slice of a sparse matrix with an array containing only zeros, the result is a matrix that is less or equally sparse than before, even though it should be more or equally sparse. The same holds for setting parts of the matrix to 0 or False:

    >>> import numpy as np
    >>> from scipy.sparse import csr_matrix as csr
    >>> M = csr(np.random.random((8,8))>0.9)
    >>> M
    <8x8 sparse matrix of type '<type 'numpy.bool_'>'
        with 6 stored elements in Compressed Sparse Row format>
    >>> M
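
The behaviour can be reproduced in a small sketch: SciPy keeps explicitly stored zeros in the structure, so the stored-element count does not drop until eliminate_zeros() is called. The 3 x 3 matrix is a hypothetical example, and zeroing the data slice directly is just one simple way to create explicit zeros.

```python
import numpy as np
from scipy.sparse import csr_matrix

M = csr_matrix(np.array([[1, 0, 2], [0, 3, 0], [4, 0, 5]]))
nnz_initial = M.nnz

# Zero the stored values of row 0 in place; the entries stay in the structure.
M.data[M.indptr[0]:M.indptr[1]] = 0
nnz_with_explicit_zeros = M.nnz

# Drop explicitly stored zeros so the sparsity pattern matches the values.
M.eliminate_zeros()
nnz_after = M.nnz
print(nnz_initial, nnz_with_explicit_zeros, nnz_after)
```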

Inserting null columns into a scipy sparse matrix in a specific order

做~自己de王妃 posted on 2019-12-11 15:44:03
Question: I have a sparse matrix with M rows and N columns, to which I want to concatenate K additional NULL columns, so that my objects will now have M rows and (N+K) columns. The tricky part is that I also have a list of indices of length N, which can range from 0 to N+K, that indicates the position every column should have in the new matrix. So for example, if N = 2, K = 1 and the list of indices is [2, 0], it means that I want to take the last column from my MxN matrix to be the first one,
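
A minimal sketch of one way to do this, assuming the index list gives, for each original column j, its position in the new matrix (which matches the [2, 0] example as far as the truncated text shows). In COO format only the column coordinates need to be remapped; positions that receive no column stay empty. The helper name scatter_columns is hypothetical.

```python
import numpy as np
from scipy import sparse


def scatter_columns(X, new_positions, n_extra):
    """Place column j of X at new_positions[j] in an M x (N + n_extra) matrix."""
    coo = X.tocoo()
    new_positions = np.asarray(new_positions)
    # Remap every stored entry's column coordinate; rows and values are unchanged.
    new_col = new_positions[coo.col]
    shape = (X.shape[0], X.shape[1] + n_extra)
    return sparse.coo_matrix((coo.data, (coo.row, new_col)), shape=shape).tocsr()


# Example with N = 2, K = 1 and index list [2, 0].
X = sparse.csr_matrix(np.array([[1, 2], [3, 4]]))
result = scatter_columns(X, [2, 0], n_extra=1)
print(result.toarray())
```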

How to optimize this process?

夙愿已清 posted on 2019-12-11 14:57:27
Question: I have somewhat of a broad question, but I will try to make my intent as clear as possible so that people can make suggestions. I am trying to optimize a process I am doing. Generally, what I am doing is feeding a function a data frame of values and generating a prediction from operations on specific columns. It is basically a custom function that is being used with sapply (code below). What I'm doing is much too large to provide any meaningful example, so instead I will try to describe the inputs

Python Pandas: How to create a binary matrix from column of lists?

邮差的信 posted on 2019-12-11 12:36:05
Question: I have a Python Pandas DataFrame like the following:

          1
    0  a, b
    1     c
    2     d
    3     e

a, b is a string representing a list of user features. How can I convert this into a binary matrix of the user features like the following:

       a  b  c  d  e
    0  1  1  0  0  0
    1  0  0  1  0  0
    2  0  0  0  1  0
    3  0  0  0  0  1

I saw a similar question, "Creating boolean matrix from one column with pandas", but there the column does not contain entries which are lists. I have tried these approaches; is there a way to merge the two: pd.get_dummies() pd.get
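
One possible approach, sketched under the assumption that the features in each string are always separated by ", ": Series.str.get_dummies splits each string on a separator and builds one 0/1 indicator column per distinct feature.

```python
import pandas as pd

# Same data as in the question: column 1 holds comma-separated feature strings.
df = pd.DataFrame({1: ["a, b", "c", "d", "e"]})

# Split each string on ", " and create one 0/1 indicator column per feature.
binary = df[1].str.get_dummies(sep=", ")
print(binary)
```

If the separator is inconsistent (for example "a,b" in some rows), the strings would need to be normalised first.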