sparse-matrix | 易学教程

Java sparse matrix problem

Question: I have a two-dimensional matrix that is sparse, and I am facing a performance problem. Which API or class can I use in Java to handle sparse matrices and improve my program's performance? For example, I want it to take a 100x100 matrix, handle the sparsity, do the multiplication, and return a 100x100 matrix with the zeros preserved (that is, a sparse matrix). Answer 1: Jama is awful for large sparse matrices. Have a look at the Colt linear algebra library. Another possibility for sparse linear

Fastest way to sum over rows of sparse matrix

Question: I have a big csr_matrix (1M x 1K) and I want to sum over rows to obtain a new csr_matrix with the same number of columns but fewer rows. My problem is exactly the same as "Sum over rows in scipy.sparse.csr_matrix"; the only difference is that I find the accepted solution too slow for my purpose. Let me state what I have: map_fn = np.random.randint(0, 10000, 1000000). Here map_fn tells me how my input rows (1M) are mapped onto my output rows (10K). For example, the ith input row gets added up
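One common way to express this kind of grouped row sum is a single sparse matrix product with an indicator matrix. The sketch below is only an illustration under assumptions: the helper name sum_rows_by_group, the n_out parameter, and the tiny demo data are hypothetical and do not come from the question, and no claim is made about how it compares in speed to the accepted solution the asker mentions.

```python
import numpy as np
from scipy import sparse


def sum_rows_by_group(X, map_fn, n_out):
    """Sum the rows of X that share the same target index in map_fn.

    Hypothetical helper: map_fn[i] is the output row that input row i
    is added into, and n_out is the number of output rows.
    """
    n_in = X.shape[0]
    # Indicator matrix S with shape (n_out, n_in) and S[map_fn[i], i] = 1.
    S = sparse.csr_matrix(
        (np.ones(n_in, dtype=X.dtype), (map_fn, np.arange(n_in))),
        shape=(n_out, n_in),
    )
    # Sparse-sparse product: row k of the result is the sum of all input
    # rows i with map_fn[i] == k. The result stays sparse.
    return S @ X


# Tiny demo (made-up data): rows 0 and 2 go to output row 0,
# rows 1 and 3 go to output row 1.
X = sparse.csr_matrix(np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 5.0],
]))
map_fn = np.array([0, 1, 0, 1])
Y = sum_rows_by_group(X, map_fn, n_out=2)
print(Y.toarray())
```

With the question's shapes, the call would presumably look like sum_rows_by_group(X, map_fn, 10000), where X is the 1M x 1K matrix.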

Elementwise division of sparse matrices, ignoring 0/0

Question: I have two sparse matrices E and D, which have non-zero entries in the same places. I want to compute E/D as a sparse matrix, defined only where D is non-zero. For example, take the following code: import numpy as np import scipy E_full = np.matrix([[1.4536000e-02, 0.0000000e+00, 0.0000000e+00, 1.7914321e+00, 2.6854320e-01, 4.1742600e-01, 0.0000000e+00], [9.8659000e-03, 0.0000000e+00, 0.0000000e+00, 1.9106752e+00, 5.7283640e-01, 1.4840370e-01, 0.0000000e+00], [1.3920000e-04, 0.0000000e+00, 0
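A possible approach, sketched here under assumptions, is to invert only the stored entries of D and then multiply elementwise, so that positions where D is zero are never touched and no 0/0 is evaluated. The function name sparse_divide and the small demo matrices are hypothetical; they stand in for the truncated E_full data above.

```python
import numpy as np
from scipy import sparse


def sparse_divide(E, D):
    """Return E / D evaluated only at the stored non-zero entries of D."""
    # Work on a float copy so the caller's D is left unchanged.
    D_inv = sparse.csr_matrix(D, dtype=float, copy=True)
    # Drop explicitly stored zeros so that 1/0 is never computed.
    D_inv.eliminate_zeros()
    # Invert the stored values only; implicit zeros stay implicit.
    D_inv.data = 1.0 / D_inv.data
    # Elementwise (Hadamard) product keeps the result sparse.
    return sparse.csr_matrix(E).multiply(D_inv).tocsr()


# Made-up demo matrices sharing one sparsity pattern.
E = sparse.csr_matrix(np.array([[1.0, 0.0, 4.0], [0.0, 9.0, 0.0]]))
D = sparse.csr_matrix(np.array([[2.0, 0.0, 8.0], [0.0, 3.0, 0.0]]))
Q = sparse_divide(E, D)
print(Q.toarray())
```

Entries where E is non-zero but D is zero would simply be dropped by this sketch, which matches "defined only where D is non-zero".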

Convert CountVectorizer and TfidfTransformer Sparse Matrices into Separate Pandas Dataframe Rows

Question: What is the best way to convert sparse matrices resulting from sklearn's CountVectorizer and TfidfTransformer into Pandas DataFrame columns with a separate row for each bigram and its corresponding frequency and tf-idf score? Pipeline: Bring in text data from a SQL DB, split the text into bigrams, calculate the frequency per document and the tf-idf per bigram per document, and load the results back into the SQL DB. Current State: Two columns of data are brought in (number, text). text

Converting coefficient names to a formula in R

Question: When using formulas that have factors, the fitted models name the coefficients XY, where X is the name of the factor and Y is a particular level of it. I want to be able to create a formula from the names of these coefficients. The reason: if I fit a lasso to a sparse design matrix (as I do below), I would like to create a new formula object that only contains terms for the nonzero coefficients. require("MatrixModels") require("glmnet") set.seed(1) n <- 200 Z <- data.frame(letter=factor(sample

R: removal of regex from Quanteda DFM, Sparse Document-Feature Matrix, object?

Question: The Quanteda package provides the sparse document-feature matrix (DFM), and its methods include removeFeatures. I have tried dfm(x, removeFeatures="\\b[a-z]{1-3}\\b") to remove words that are too short, as well as dfm(x, keptFeatures="\\b[a-z]{4-99}\\b") to keep sufficiently long words, but it is not working; both basically do the same thing, i.e. remove words that are too short. How can I remove a regex match from a Quanteda DFM object? Example: myMatrix <- dfm(myData, ignoredFeatures = stopwords("english"), stem = TRUE,

kNN with big sparse matrices in Python

Question: I have two large sparse matrices: In [3]: trainX Out[3]: <6034195x755258 sparse matrix of type '<type 'numpy.float64'>' with 286674296 stored elements in Compressed Sparse Row format> In [4]: testX Out[4]: <2013337x755258 sparse matrix of type '<type 'numpy.float64'>' with 95423596 stored elements in Compressed Sparse Row format> They take about 5 GB of RAM in total to load. Note that these matrices are HIGHLY sparse (0.0062% occupied). For each row in testX, I want to find the nearest neighbor in trainX and
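The excerpt is cut off before it names a distance metric, so the following is only a rough sketch that assumes cosine similarity. It does a brute-force search in chunks using sparse products, without ever densifying trainX or testX themselves. The function names, the chunk parameter, and the demo data are all hypothetical.

```python
import numpy as np
from scipy import sparse


def l2_normalize_rows(X):
    """Scale each row of a sparse matrix to unit Euclidean length."""
    X = sparse.csr_matrix(X, dtype=float)
    norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return (sparse.diags(1.0 / norms) @ X).tocsr()


def nearest_by_cosine(train, test, chunk=256):
    """For each test row, return the index of the most similar train row.

    Assumption: "nearest" means highest cosine similarity.
    """
    train_t = l2_normalize_rows(train).T.tocsr()
    test_n = l2_normalize_rows(test)
    best = np.empty(test_n.shape[0], dtype=np.int64)
    for start in range(0, test_n.shape[0], chunk):
        stop = start + chunk
        # Only this (chunk x n_train) block of similarities is made dense.
        sims = (test_n[start:stop] @ train_t).toarray()
        best[start:stop] = sims.argmax(axis=1)
    return best


# Made-up demo data, far smaller than the matrices in the question.
train_demo = sparse.csr_matrix(np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]))
test_demo = sparse.csr_matrix(np.array([
    [2.0, 0.0, 0.0],
    [0.0, 3.0, 3.0],
    [0.0, 5.0, 0.1],
]))
nearest = nearest_by_cosine(train_demo, test_demo)
print(nearest)
```

Each dense block holds chunk x n_train similarities, so with about 6 million training rows the chunk size would have to be very small, and the total work is still quadratic. This illustrates the idea rather than offering a solution tuned for that scale.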

Sorting in Sparse Matrix

Question: I have a sparse matrix. I need to sort this matrix row by row and create another [sparse] matrix. Code may explain it better: # for `rand` function, you need newer version of scipy. from scipy.sparse import * m = rand(6,6, density=0.6) d = m.getrow(0) print d Output1 (0, 5) 0.874881629788 (0, 4) 0.352559852239 (0, 2) 0.504791645463 (0, 1) 0.885898140175 I have this matrix m. I want to create a new matrix with a sorted version of m. The new matrix contains the 0th row like this: new_d = new_m
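The desired output is truncated above, so the sketch below rests on one reading of the question: within each row, the stored entries should be listed in descending order of value, with each value keeping its column index. The helper name sort_rows_by_value and the demo matrix are hypothetical.

```python
import numpy as np
from scipy import sparse


def sort_rows_by_value(m):
    """Reorder the stored entries of each row by descending value.

    The result is a CSR matrix with the same entries as m; only the
    storage order inside each row changes.
    """
    m = sparse.csr_matrix(m)
    data = m.data.copy()
    indices = m.indices.copy()
    for r in range(m.shape[0]):
        lo, hi = m.indptr[r], m.indptr[r + 1]
        # Stable sort of this row's values, largest first.
        order = np.argsort(-m.data[lo:hi], kind="stable")
        data[lo:hi] = m.data[lo:hi][order]
        indices[lo:hi] = m.indices[lo:hi][order]
    return sparse.csr_matrix((data, indices, m.indptr.copy()), shape=m.shape)


# Made-up demo matrix.
m_demo = sparse.csr_matrix(np.array([
    [0.0, 3.0, 1.0, 0.0, 2.0],
    [5.0, 0.0, 0.0, 7.0, 0.0],
]))
new_m = sort_rows_by_value(m_demo)
print(new_m.indices)  # column indices, row by row, in value order
print(new_m.data)     # the matching values
```

Because a sparse matrix is defined by its (row, column, value) triples, new_m is mathematically the same matrix as m_demo. The value ordering lives only in the storage arrays, and calls such as sort_indices() would restore column order.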
