sparse-matrix

Java sparse matrix problem

血红的双手。 submitted on 2019-12-22 08:38:48
Question: I have a two-dimensional matrix, and it is sparse. I am facing a performance problem. Can anybody tell me which API or class I can use in Java to handle a sparse matrix and improve my program's performance? For example, I want it to take a 100x100 matrix, handle the sparse entries, do the multiplication, and return my matrix at the same 100x100 size with the zeros (meaning a sparse matrix). Answer 1: Jama is awful for large sparse matrices. Have a look at the Colt linear algebra library. Another possibility for sparse linear

Fastest way to sum over rows of sparse matrix

强颜欢笑 submitted on 2019-12-22 08:26:35
Question: I have a big csr_matrix (1M x 1K) and I want to sum over rows to obtain a new csr_matrix with the same number of columns but fewer rows. My problem is exactly the same as "Sum over rows in scipy.sparse.csr_matrix"; the only difference is that I find the accepted solution too slow for my purpose. Let me state what I have: map_fn = np.random.randint(0, 10000, 1000000). Here map_fn tells me how my input rows (1M) are mapped onto my output rows (10K). For example, the ith input row gets added up
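One common way to express this kind of grouped row sum is a single sparse matrix product. The sketch below is not the accepted solution the question refers to; it assumes map_fn holds one output-row index per input row, and the helper name sum_rows_by_group and the small test sizes are made up for illustration.

```python
import numpy as np
from scipy import sparse


def sum_rows_by_group(X, map_fn, n_out):
    """Sum the rows of X that share an index in map_fn (hypothetical helper).

    Returns an (n_out x n_cols) sparse matrix.
    """
    n_in = X.shape[0]
    # Indicator matrix S with S[map_fn[i], i] = 1, so that row g of S @ X
    # is the sum of every input row i with map_fn[i] == g.
    S = sparse.csr_matrix(
        (np.ones(n_in, dtype=X.dtype), (map_fn, np.arange(n_in))),
        shape=(n_out, n_in),
    )
    return (S @ X).tocsr()


# Small stand-in for the 1M x 1K matrix in the question.
rng = np.random.default_rng(0)
X = sparse.random(1000, 50, density=0.01, format="csr", random_state=0)
map_fn = rng.integers(0, 10, size=1000)
Y = sum_rows_by_group(X, map_fn, 10)
```

The result stays sparse and no Python-level loop over rows is needed. Whether it is fast enough at the question's scale would have to be measured.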

Elementwise division of sparse matrices, ignoring 0/0

╄→尐↘猪︶ㄣ submitted on 2019-12-22 06:36:00
Question: I have two sparse matrices, E and D, which have non-zero entries in the same places. Now I want to have E/D as a sparse matrix, defined only where D is non-zero. For example, take the following code: import numpy as np import scipy E_full = np.matrix([[1.4536000e-02, 0.0000000e+00, 0.0000000e+00, 1.7914321e+00, 2.6854320e-01, 4.1742600e-01, 0.0000000e+00], [9.8659000e-03, 0.0000000e+00, 0.0000000e+00, 1.9106752e+00, 5.7283640e-01, 1.4840370e-01, 0.0000000e+00], [1.3920000e-04, 0.0000000e+00, 0
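A minimal sketch of one way to do this: take the reciprocal of D's stored entries only, then multiply elementwise with E. The helper name sparse_divide and the small matrices are hypothetical, because the question's own E_full is cut off in this excerpt.

```python
import numpy as np
from scipy import sparse


def sparse_divide(E, D):
    """Return E / D as a CSR matrix, evaluated only at D's non-zero entries."""
    D_inv = sparse.csr_matrix(D, dtype=float, copy=True)
    D_inv.eliminate_zeros()        # drop explicitly stored zeros so 1/0 is never computed
    D_inv.data = 1.0 / D_inv.data  # reciprocal of the stored entries only
    # Elementwise product: positions where D is zero stay implicit zeros.
    return sparse.csr_matrix(E).multiply(D_inv).tocsr()


# Hypothetical small inputs with the same sparsity pattern.
E_small = sparse.csr_matrix(np.array([[1.0, 0.0, 4.0], [0.0, 6.0, 0.0]]))
D_small = sparse.csr_matrix(np.array([[2.0, 0.0, 8.0], [0.0, 3.0, 0.0]]))
Q = sparse_divide(E_small, D_small)
```

Because the 0/0 positions are never touched, no NaN values or divide warnings appear, and the result keeps the shared sparsity pattern.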

Convert CountVectorizer and TfidfTransformer Sparse Matrices into Separate Pandas Dataframe Rows

折月煮酒 submitted on 2019-12-22 05:15:08
Question: What is the best way to convert the sparse matrices resulting from sklearn's CountVectorizer and TfidfTransformer into Pandas DataFrame columns, with a separate row for each bigram and its corresponding frequency and tf-idf score? Pipeline: Bring in text data from a SQL DB, split the text into bigrams, calculate the frequency per document and the tf-idf per bigram per document, and load the results back into the SQL DB. Current State: Two columns of data are brought in ( number , text ). text
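As a rough sketch of the reshaping step only, a count matrix can be turned into one row per (document, bigram) pair through its COO form. The helper name to_long_frame, the tiny matrices, and the feature_names and doc_ids arrays are all hypothetical stand-ins for the vectorizer's vocabulary and the number column. The sketch assumes the tf-idf matrix has the same shape as the count matrix.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def to_long_frame(counts, tfidf, feature_names, doc_ids):
    """One DataFrame row per stored (document, bigram) entry (hypothetical helper)."""
    coo = sparse.coo_matrix(counts)      # exposes .row, .col and .data arrays
    tfidf_csr = sparse.csr_matrix(tfidf)
    return pd.DataFrame({
        "number": np.asarray(doc_ids)[coo.row],
        "bigram": np.asarray(feature_names)[coo.col],
        "frequency": coo.data,
        # Look up the tf-idf score at the same (row, col) positions.
        "tfidf": np.asarray(tfidf_csr[coo.row, coo.col]).ravel(),
    })


# Hypothetical stand-ins for the vectorizer and transformer outputs.
counts = sparse.csr_matrix(np.array([[2, 0, 1], [0, 3, 0]]))
tfidf = sparse.csr_matrix(np.array([[0.8, 0.0, 0.6], [0.0, 1.0, 0.0]]))
feature_names = ["big data", "data set", "sparse matrix"]
doc_ids = [101, 102]
df = to_long_frame(counts, tfidf, feature_names, doc_ids)
```

Only stored entries become rows, so the frame has as many rows as there are non-zero counts rather than documents times vocabulary size.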

Converting coefficient names to a formula in R

我只是一个虾纸丫 submitted on 2019-12-22 04:29:12
Question: When using formulas that have factors, the fitted models name the coefficients XY, where X is the name of the factor and Y is a particular level of it. I want to be able to create a formula from the names of these coefficients. The reason: if I fit a lasso to a sparse design matrix (as I do below), I would like to create a new formula object that contains only the terms for the nonzero coefficients. require("MatrixModels") require("glmnet") set.seed(1) n <- 200 Z <- data.frame(letter=factor(sample

R: removal of regex from Quanteda DFM, Sparse Document-Feature Matrix, object?

天涯浪子 submitted on 2019-12-21 17:40:06
Question: The Quanteda package provides the sparse document-feature matrix (DFM), and its methods include removeFeatures. I have tried dfm(x, removeFeatures="\\b[a-z]{1-3}\\b") to remove words that are too short, as well as dfm(x, keptFeatures="\\b[a-z]{4-99}\\b") to keep sufficiently long words, but it is not working. Both basically do the same thing, i.e. removing words that are too short. How can I remove a regex match from a Quanteda DFM object? Example: myMatrix <-dfm(myData, ignoredFeatures = stopwords("english"), stem = TRUE,
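One detail worth checking in the patterns above is the quantifier syntax: a bounded repeat is normally written {1,3} with a comma, not {1-3}. The sketch below only demonstrates this with Python's re module on a made-up word list. It says nothing about how Quanteda's own regex engine treats the hyphenated form, and the excerpt does not show whether this is the whole cause of the problem.

```python
import re

# Hypothetical feature list standing in for the DFM's feature names.
words = ["a", "an", "the", "tree", "matrix"]

# Comma form: a whole word of one to three lowercase letters.
short_word = re.compile(r"\b[a-z]{1,3}\b")
kept = [w for w in words if not short_word.fullmatch(w)]

# Hyphen form: Python's re reads "{1-3}" as literal characters,
# so this pattern matches none of the plain words above.
hyphen_form = re.compile(r"\b[a-z]{1-3}\b")
hyphen_hits = [w for w in words if hyphen_form.search(w)]
```

Here kept ends up holding only the words of four or more letters, while hyphen_hits is empty.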

kNN with big sparse matrices in Python

给你一囗甜甜゛ submitted on 2019-12-21 13:42:09
Question: I have two large sparse matrices: In [3]: trainX Out[3]: <6034195x755258 sparse matrix of type '<type 'numpy.float64'>' with 286674296 stored elements in Compressed Sparse Row format> In [4]: testX Out[4]: <2013337x755258 sparse matrix of type '<type 'numpy.float64'>' with 95423596 stored elements in Compressed Sparse Row format> They take about 5 GB of RAM in total to load. Note that these matrices are HIGHLY sparse (0.0062% occupied). For each row in testX , I want to find the nearest neighbor in trainX and
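The excerpt is cut off before it names a distance metric, so the following is only a sketch under the assumption that cosine similarity is acceptable. It normalizes the rows and then takes sparse dot products one chunk of test rows at a time, so the full similarity matrix is never held in memory. The helper names, the chunk size, and the tiny matrices are illustrative, not taken from the question.

```python
import numpy as np
from scipy import sparse


def l2_normalize_rows(X):
    """Scale each row of X to unit Euclidean length (all-zero rows are left as-is)."""
    X = sparse.csr_matrix(X, dtype=float)
    norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
    norms[norms == 0] = 1.0
    return (sparse.diags(1.0 / norms) @ X).tocsr()


def nearest_neighbor_cosine(train, test, chunk=1000):
    """Index of the most cosine-similar train row for every test row (assumed metric)."""
    train_t = l2_normalize_rows(train).T.tocsr()
    test_n = l2_normalize_rows(test)
    best = np.empty(test_n.shape[0], dtype=np.int64)
    for start in range(0, test_n.shape[0], chunk):
        # Sparse (chunk x n_train) block of similarities.
        sims = (test_n[start:start + chunk] @ train_t).tocsr()
        # A test row sharing no feature with any train row gets index 0.
        best[start:start + chunk] = np.asarray(sims.argmax(axis=1)).ravel()
    return best


# Tiny hypothetical data in place of trainX and testX.
train_small = sparse.csr_matrix(np.eye(3))
test_small = sparse.csr_matrix(np.array([[0.0, 2.0, 0.0], [0.1, 0.0, 3.0]]))
nn = nearest_neighbor_cosine(train_small, test_small, chunk=1)
```

Chunking keeps each product small, but the density of every similarity block depends on how many features the rows share, so memory use at the question's scale would still need checking.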

Sorting in Sparse Matrix

天大地大妈咪最大 submitted on 2019-12-21 09:34:42
Question: I have a sparse matrix. I need to sort this matrix row by row and create another [sparse] matrix. Code may explain it better: # for `rand` function, you need newer version of scipy. from scipy.sparse import * m = rand(6,6, density=0.6) d = m.getrow(0) print d Output1 (0, 5) 0.874881629788 (0, 4) 0.352559852239 (0, 2) 0.504791645463 (0, 1) 0.885898140175 I have this matrix m. I want to create a new matrix holding a sorted version of m. The new matrix contains the 0th row like this. new_d = new_m
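The excerpt stops before showing what the sorted matrix should look like, so this sketch only covers the ordering step. For each row it lists the stored (column, value) pairs from largest to smallest value, reading the CSR arrays directly. The helper name sorted_row_entries and the descending order are assumptions; the sample row reuses the values printed in Output1.

```python
import numpy as np
from scipy import sparse


def sorted_row_entries(m, descending=True):
    """Per row, the stored (column, value) pairs ordered by value (hypothetical helper)."""
    m = sparse.csr_matrix(m)
    rows = []
    for i in range(m.shape[0]):
        start, end = m.indptr[i], m.indptr[i + 1]
        cols = m.indices[start:end]
        vals = m.data[start:end]
        order = np.argsort(vals, kind="stable")
        if descending:
            order = order[::-1]
        rows.append(list(zip(cols[order].tolist(), vals[order].tolist())))
    return rows


# Row 0 from Output1 above, as a 1 x 6 matrix.
row0 = sparse.csr_matrix(
    np.array([[0.0, 0.885898140175, 0.504791645463, 0.0,
               0.352559852239, 0.874881629788]])
)
entries = sorted_row_entries(row0)
```

How the sorted pairs should be packed into a new sparse matrix depends on the layout the question had in mind, which the truncated text does not show.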
