sparse-matrix | 易学教程

sklearn tsne with sparse matrix

Question: I'm trying to run t-SNE on a very sparse matrix of precomputed distance values, but I'm having trouble with it. It boils down to this:

    import numpy as np
    from scipy.sparse import csc_matrix
    from sklearn.manifold import TSNE

    row = np.array([0, 2, 2, 0, 1, 2])
    col = np.array([0, 0, 1, 2, 2, 2])
    distances = np.array([.1, .2, .3, .4, .5, .6])
    X = csc_matrix((distances, (row, col)), shape=(3, 3))
    Y = TSNE(metric='precomputed').fit_transform(X)

However, I get this error:

    TypeError: A sparse matrix was passed, but dense data is required for method="barnes_hut". Use X.toarray() to

How to access CoordinateMatrix entries directly in Spark?

Question: I want to store a big sparse matrix using Spark, so I tried CoordinateMatrix, since it is a distributed matrix. However, I have not found a way to access an entry directly, for example with something like apply(int x, int y). I only found functions such as public RDD<MatrixEntry> entries(). In that case I have to loop over the entries to find the one I want, which is not efficient. Has anyone used CoordinateMatrix before? What should I do to get each entry from CoordinateMatrix

Random binary matrix with two non-trivial constraints

Question: I need to generate a random matrix of K columns and N rows containing ones and zeros, such that:

a) Each row contains exactly k ones.
b) Each row is different from the others (combinatorics imposes that if N > nchoosek(K, k) there will be nchoosek(K, k) rows).

Assume I want N = 10000 (out of all the possible nchoosek(K, k) = 27405 combinations) different 1×K vectors (with K = 30) containing k ones (with k = 4) and K - k zeros. This code:

    clear all; close
    N=10000; K=30; k=4;
    M=randi([0 1],N
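One possible way to meet both constraints, sketched in Python rather than the question's MATLAB, is rejection sampling on the positions of the ones: draw k distinct column indices per row and keep the draw only if that combination has not been seen yet. The function name `random_constant_weight_rows` and its `seed` parameter are illustrative and do not come from the original post.

```python
import math
import random


def random_constant_weight_rows(n_rows, n_cols, k, seed=None):
    """Return n_rows distinct 0/1 rows of length n_cols, each with exactly k ones."""
    total = math.comb(n_cols, k)
    if n_rows > total:
        raise ValueError(f"only {total} distinct rows exist")
    rng = random.Random(seed)
    seen = set()
    while len(seen) < n_rows:
        # A sorted tuple of k distinct column indices identifies one row,
        # so adding it to a set discards duplicate rows automatically.
        seen.add(tuple(sorted(rng.sample(range(n_cols), k))))
    rows = []
    for ones in seen:
        row = [0] * n_cols
        for j in ones:
            row[j] = 1
        rows.append(row)
    rng.shuffle(rows)  # do not expose the set's internal ordering
    return rows


rows = random_constant_weight_rows(10000, 30, 4, seed=0)
```

With N = 10000 out of 27405 possible rows, most draws are new, so the loop finishes quickly. If N approached nchoosek(K, k), enumerating all combinations and shuffling them would likely be the better choice.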

How to check if the block is present in a sparse file (for simple copy-on-write)?

Question: How do I get the sparse block size and check whether data is present at a given offset in a sparse file on reiserfs/ext3 in Linux? I want to use this to implement a simple copy-on-write block device using FUSE. Or would it be better to keep a bitmap in a separate file?

Answer 1: /usr/src/linux/Documentation/filesystems/fiemap.txt The fiemap ioctl is an efficient method for userspace to get file extent mappings. Instead of block-by-block mapping (such as bmap), fiemap returns a list of extents. There's a quick example
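As a rough illustration of the "is data present at this offset" check, the sketch below uses `lseek` with `SEEK_DATA` instead of the fiemap ioctl that the answer cites. It assumes Linux and a Python build that exposes `os.SEEK_DATA`. On filesystems without hole support the kernel reports the whole file as data, and the answer is only as fine-grained as the filesystem's allocation unit. The names `has_data_at` and `demo` are hypothetical.

```python
import errno
import os
import tempfile


def has_data_at(fd, offset):
    """Return True if `offset` lies inside a data region of the open file `fd`."""
    try:
        # SEEK_DATA moves to the first data byte at or after `offset`;
        # if that position is `offset` itself, the offset holds data.
        return os.lseek(fd, offset, os.SEEK_DATA) == offset
    except OSError as exc:
        if exc.errno == errno.ENXIO:  # no data at or after `offset`
            return False
        raise


def demo():
    """Create a 1 MiB file with data only in its first 4 KiB and probe it."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 4096)  # real data at offset 0
        f.flush()
        f.truncate(1 << 20)   # extend the file; the tail is a hole where supported
        fd = f.fileno()
        return has_data_at(fd, 0), has_data_at(fd, 1 << 20)
```

A copy-on-write layer could call `has_data_at` on the overlay file before deciding whether to read from the overlay or fall back to the base image. Whether that beats a separate bitmap file depends on the filesystem.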

Spark Cosine Similarity (DIMSUM algorithm ) sparse input file

Question: I was wondering whether Spark Cosine Similarity can work with sparse input data. I have seen examples in which the input consists of lines of space-separated features of the form:

    id feat1 feat2 feat3 ...

but I have an inherently sparse, implicit-feedback setting and would like to have input in the form:

    id1 feat1:1 feat5:1 feat10:1
    id2 feat3:1 feat5:1 ..
    ...

I would like to make use of the sparsity to improve the calculation. Also ultimately I wish to use the DIMSUM
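To make the sparse input format concrete, here is a minimal pure-Python sketch that parses `id feat:value` lines into dictionaries and computes an exact cosine similarity over the shared features only. It does not use Spark and is not the DIMSUM sampling scheme. The helper names `parse_sparse_line` and `cosine_similarity` are illustrative.

```python
import math


def parse_sparse_line(line):
    """Parse 'id feat:value feat:value ...' into (id, {feat: value})."""
    row_id, *pairs = line.split()
    features = {}
    for pair in pairs:
        name, value = pair.rsplit(":", 1)
        features[name] = float(value)
    return row_id, features


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    # Only features present in both vectors contribute to the dot product.
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


lines = ["id1 feat1:1 feat5:1 feat10:1", "id2 feat3:1 feat5:1"]
vectors = dict(parse_sparse_line(line) for line in lines)
similarity = cosine_similarity(vectors["id1"], vectors["id2"])
```

The two sample rows share only `feat5`, so the dot product touches a single entry no matter how many features exist overall. That is the saving the question hopes to exploit.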

Why does the speed of this SOR solver depend on the input?

Question: Related to my other question, I have now modified the sparse matrix solver to use the SOR (Successive Over-Relaxation) method. The code is now as follows:

    void SORSolver::step() {
        float const omega = 1.0f;
        float const *b = &d_b(1, 1), *w = &d_w(1, 1), *e = &d_e(1, 1), *s = &d_s(1, 1), *n = &d_n(1, 1),
                    *xw = &d_x(0, 1), *xe = &d_x(2, 1), *xs = &d_x(1, 0), *xn = &d_x(1, 2);
        float *xc = &d_x(1, 1);
        for (size_t y = 1; y < d_ny - 1; ++y) {
            for (size_t x = 1; x < d_nx - 1; ++x) {
                float diff = *b -
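Because the question's code is cut off, the sketch below does not reproduce it. It shows a generic SOR sweep on a small dense system in Python, to make the update rule explicit. The function `sor_solve`, its tolerance, and the 3×3 test system are assumptions for illustration. With `omega = 1.0`, the value used in the question, SOR reduces to Gauss-Seidel.

```python
def sor_solve(A, b, omega=1.0, tol=1e-10, max_iter=10_000):
    """Solve A x = b by successive over-relaxation; return (x, sweeps used)."""
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # Off-diagonal contribution, using values already updated in this sweep.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            # Blend the old value with the Gauss-Seidel update.
            new_value = (1.0 - omega) * x[i] + omega * gauss_seidel
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            return x, iteration
    return x, max_iter


# Small diagonally dominant system whose exact solution is (1, 2, 3).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x, iterations = sor_solve(A, b, omega=1.0)
```

The number of sweeps needed depends on `omega` and on the matrix, which is separate from the per-sweep timing the question asks about.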

Sparse Multi-Dimensional Data Representation

Question: I'm working on a cardiac simulation tool that uses 4-dimensional data, i.e. several (3-30) variables at locations in 3D space. I'm now adding some tissue geometry which will leave over 2/3 of the points in the containing 3D box outside of the tissue to be simulated, so I need a way to efficiently store the active points and not the others. Crucially, I need to be able to:

- Iterate over all of the active points within a constrained 3D box (iterator, perhaps?)
- Having accessed one point, find its
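One simple layout that covers the first requirement is a hash map keyed by integer coordinates, storing values only for active points. The Python sketch below is an assumption-laden illustration, not the approach from the original thread. The class `SparseGrid` and its methods are hypothetical, and the second requirement is left out because its text is cut off.

```python
class SparseGrid:
    """Store values only for active points of a 3D grid, keyed by (x, y, z)."""

    def __init__(self):
        self._points = {}

    def set(self, x, y, z, values):
        """Mark (x, y, z) as active and attach its variable values."""
        self._points[(x, y, z)] = values

    def get(self, x, y, z):
        """Return the stored values, or None for an inactive point."""
        return self._points.get((x, y, z))

    def iter_box(self, lo, hi):
        """Yield (coord, values) for active points with lo <= coord <= hi on every axis."""
        for coord, values in self._points.items():
            if all(l <= c <= h for l, c, h in zip(lo, coord, hi)):
                yield coord, values


grid = SparseGrid()
grid.set(1, 1, 1, [0.1, 0.2, 0.3])
grid.set(5, 5, 5, [0.4, 0.5, 0.6])
inside = sorted(coord for coord, _ in grid.iter_box((0, 0, 0), (2, 2, 2)))
```

Here `iter_box` scans every active point, which is acceptable for a sketch. A spatial index such as an octree would make small-box queries cheaper, at the cost of more bookkeeping.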

How Tensorflow handles categorical features with multiple inputs within one column?

Question: For example, I have data in the following CSV format:

    col0  col1  col2   col3
    1     A     E|A|C  3
    0     B     D|F    2
    2     C     |      2

Each column, separated by a comma, represents one feature. Normally a feature is one-hot (e.g. col0, col1, col3), but in this case the feature for col2 has multiple inputs (separated by |). I'm sure TensorFlow can handle a one-hot feature with a sparse tensor, but I'm not sure whether it can handle features with multiple inputs like col2. How should it be represented in Tensorflow's sparse
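To show what a sparse representation of col2 could look like, the pure-Python sketch below converts the multi-valued strings into coordinate-format components. The three results mirror the `indices`, `values`, and `dense_shape` fields of `tf.sparse.SparseTensor`, but TensorFlow itself is not imported here. The helper `to_sparse_components` is hypothetical.

```python
def to_sparse_components(column, sep="|"):
    """Convert multi-valued strings into COO-style (indices, values, dense_shape)."""
    indices, values = [], []
    max_len = 0
    for row, cell in enumerate(column):
        # Drop empty tokens so a cell containing only the separator has no entries.
        tokens = [t for t in cell.split(sep) if t]
        for pos, token in enumerate(tokens):
            indices.append((row, pos))
            values.append(token)
        max_len = max(max_len, len(tokens))
    return indices, values, (len(column), max_len)


col2 = ["E|A|C", "D|F", "|"]
indices, values, dense_shape = to_sparse_components(col2)
```

Each row keeps only the categories it actually has, so the third row contributes no entries at all, while `dense_shape` still records that there are three rows.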

Very slow performance of cusparse csrsv_analysis

Question: I wrote a conjugate-gradient solver (for linear systems of equations) with LU preconditioning, using Dr. Maxim Naumov's papers on NVIDIA's research community as a guideline. The residual update step, which requires solving a lower triangular system and then an upper triangular system, is divided into two phases: the analysis phase (which exploits the sparsity pattern and decides the parallelization level) and the solution phase itself. According to all posts related to this