sparse-matrix

sklearn tsne with sparse matrix

谁说我不能喝 submitted on 2020-01-17 07:06:06
Question: I'm trying to display t-SNE on a very sparse matrix with precomputed distance values, but I'm having trouble with it. It boils down to this:
    import numpy as np
    from scipy.sparse import csc_matrix
    from sklearn.manifold import TSNE

    # Sparse 3x3 matrix of precomputed distances
    row = np.array([0, 2, 2, 0, 1, 2])
    col = np.array([0, 0, 1, 2, 2, 2])
    distances = np.array([.1, .2, .3, .4, .5, .6])
    X = csc_matrix((distances, (row, col)), shape=(3, 3))
    Y = TSNE(metric='precomputed').fit_transform(X)
However, I get this error:
    TypeError: A sparse matrix was passed, but dense data is required for method="barnes_hut". Use X.toarray() to

How to access CoordinateMatrix entries directly in Spark?

不打扰是莪最后的温柔 submitted on 2020-01-16 01:01:47
Question: I want to store a big sparse matrix using Spark, so I tried to use CoordinateMatrix, since it is a distributed matrix. However, I have not found a way to access an entry directly, for example with something like:
    apply(int x, int y)
The only accessor I found is:
    public RDD<MatrixEntry> entries()
With this, I have to loop over the entries to find the one I want, which is inefficient. Has anyone used CoordinateMatrix before? What should I do to get each entry from CoordinateMatrix
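The trade-off described in this question (scanning every entry versus looking one up directly) can be illustrated outside Spark. What follows is a minimal plain-Python sketch, not Spark's API: `CooMatrix`, `scan`, and `get` are hypothetical names, and entries are modeled as (row, col, value) triples in the spirit of MatrixEntry.

```python
class CooMatrix:
    """Toy coordinate-format sparse matrix (hypothetical, not Spark's CoordinateMatrix)."""

    def __init__(self, entries, shape):
        self.shape = shape
        self.entries = list(entries)  # (row, col, value) triples
        # Hash index built once, so later lookups avoid a full scan.
        self._index = {(i, j): v for i, j, v in self.entries}

    def scan(self, i, j):
        """Linear search, analogous to looping over entries()."""
        for r, c, v in self.entries:
            if (r, c) == (i, j):
                return v
        return 0.0

    def get(self, i, j):
        """Average O(1) lookup; entries that are not stored are implicit zeros."""
        return self._index.get((i, j), 0.0)


m = CooMatrix([(0, 0, 1.5), (2, 1, -3.0)], shape=(3, 3))
print(m.get(2, 1), m.get(1, 1))
```

The index costs extra memory proportional to the number of stored entries; in a distributed setting the analogous idea would be keying the entries by (row, col), which is an assumption about the approach rather than something the excerpt confirms.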

Random binary matrix with two non-trivial constraints

|▌冷眼眸甩不掉的悲伤 submitted on 2020-01-15 05:57:28
Question: I need to generate a random matrix of K columns and N rows containing ones and zeroes, such that:
a) Each row contains exactly k ones.
b) Each row is different from the others (combinatorics imposes that if N > nchoosek(K, k), there will be only nchoosek(K, k) rows).
Assume I want N = 10000 different 1×K vectors (out of all the possible nchoosek(K, k) = 27405 combinations), with K = 30, each containing k = 4 ones and K - k zeroes. This code:
    clear all; close
    N=10000; K=30; k=4;
    M=randi([0 1],N
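One way to satisfy both constraints at once is to sample without replacement from the set of all admissible column supports. The sketch below is in Python rather than the asker's MATLAB, and `random_binary_rows` is a hypothetical helper name. It assumes nchoosek(K, k) is small enough to enumerate in memory, which holds for the 27405 combinations in the question but not for much larger K.

```python
import random
from itertools import combinations
from math import comb


def random_binary_rows(n_rows, n_cols, ones_per_row, seed=None):
    """Return distinct 0/1 rows, each containing exactly ones_per_row ones."""
    total = comb(n_cols, ones_per_row)
    n_rows = min(n_rows, total)  # cannot exceed the number of combinations
    rng = random.Random(seed)
    # Enumerate every admissible set of column positions,
    # then draw n_rows of them without replacement.
    supports = list(combinations(range(n_cols), ones_per_row))
    rows = []
    for support in rng.sample(supports, n_rows):
        row = [0] * n_cols
        for col in support:
            row[col] = 1
        rows.append(row)
    return rows


M = random_binary_rows(10000, 30, 4, seed=0)
print(len(M), sum(M[0]))
```

Because the supports are sampled without replacement, no duplicate check or retry loop is needed.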

How to check if the block is present in a sparse file (for simple copy-on-write)?

孤者浪人 submitted on 2020-01-14 08:47:08
Question: How can I get the sparse block size and check whether data is present at a given offset in a sparse file on reiserfs/ext3 in Linux? I want to use it to implement a simple copy-on-write block device using FUSE. Or would it be better to keep a bitmap in a separate file?
Answer 1: See /usr/src/linux/Documentation/filesystems/fiemap.txt: "The fiemap ioctl is an efficient method for userspace to get file extent mappings. Instead of block-by-block mapping (such as bmap), fiemap returns a list of extents." There's a quick example
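As a complement to the fiemap ioctl cited in the answer, the check "is there data at this offset?" can also be sketched with a different interface: `lseek` with `SEEK_DATA`, which Python exposes as `os.SEEK_DATA` on Linux. This is a minimal sketch under that assumption, not the method from the answer; `has_data_at` is a hypothetical helper name.

```python
import errno
import os
import tempfile


def has_data_at(fd, offset):
    """Return True if the file has data (not a hole) at the given offset."""
    try:
        # SEEK_DATA moves to the first data byte at or after `offset`.
        return os.lseek(fd, offset, os.SEEK_DATA) == offset
    except OSError as exc:
        if exc.errno == errno.ENXIO:  # no data at or after offset
            return False
        raise


with tempfile.TemporaryFile() as f:
    f.write(b"data")
    f.flush()
    f.truncate(1 << 20)  # extend the file; the tail may be stored as a hole
    print(has_data_at(f.fileno(), 0))
    print(has_data_at(f.fileno(), 1 << 20))
```

On filesystems without hole support, the whole file is reported as data, so a separate bitmap remains the more portable option.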

Spark Cosine Similarity (DIMSUM algorithm ) sparse input file

柔情痞子 submitted on 2020-01-13 03:55:10
Question: I was wondering whether Spark's cosine similarity can work with sparse input data. I have seen examples in which the input consists of lines of space-separated features of the form:
    id feat1 feat2 feat3 ...
but I have an inherently sparse, implicit-feedback setting and would like the input to have the form:
    id1 feat1:1 feat5:1 feat10:1
    id2 feat3:1 feat5:1 ..
    ...
I would like to exploit the sparsity to speed up the calculation. Ultimately, I also wish to use the DIMSUM
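To make the sparse input format concrete, here is a small plain-Python sketch that parses `id feat:value` lines into dictionaries and computes an exact cosine similarity while touching only the stored features. It is neither Spark code nor the DIMSUM sampling scheme; `parse_line` and `cosine` are hypothetical helper names.

```python
from math import sqrt


def parse_line(line):
    """Parse 'id feat:val feat:val ...' into (id, {feature: value})."""
    row_id, *pairs = line.split()
    vec = {}
    for pair in pairs:
        name, value = pair.rsplit(":", 1)
        vec[name] = float(value)
    return row_id, vec


def cosine(u, v):
    """Cosine similarity using only the stored (non-zero) features."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(val * v[name] for name, val in u.items() if name in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


lines = ["id1 feat1:1 feat5:1 feat10:1", "id2 feat3:1 feat5:1"]
rows = dict(parse_line(line) for line in lines)
sim = cosine(rows["id1"], rows["id2"])
print(sim)
```

The cost of each comparison depends on the number of stored features rather than on the size of the feature vocabulary, which is the benefit the question is after.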

Why does the speed of this SOR solver depend on the input?

久未见 submitted on 2020-01-12 08:19:43
Question: Related to my other question, I have now modified the sparse matrix solver to use the SOR (Successive Over-Relaxation) method. The code is now as follows:
    void SORSolver::step() {
        float const omega = 1.0f;
        float const *b = &d_b(1, 1), *w = &d_w(1, 1), *e = &d_e(1, 1),
                    *s = &d_s(1, 1), *n = &d_n(1, 1),
                    *xw = &d_x(0, 1), *xe = &d_x(2, 1),
                    *xs = &d_x(1, 0), *xn = &d_x(1, 2);
        float *xc = &d_x(1, 1);
        for (size_t y = 1; y < d_ny - 1; ++y) {
            for (size_t x = 1; x < d_nx - 1; ++x) {
                float diff = *b -
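For readers unfamiliar with the method named here, the following is a generic SOR iteration in plain Python. It is a sketch on a small dense system stored as nested lists, not a reconstruction of the truncated stencil code above; `sor_solve` is a hypothetical name, and the fixed iteration count is an assumption made to keep the example short.

```python
def sor_solve(A, b, omega=1.0, iterations=100):
    """Approximate the solution of A x = b with successive over-relaxation."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Off-diagonal sum uses entries of x already updated in this sweep.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            # Blend the old value with the Gauss-Seidel update.
            x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = sor_solve(A, b, omega=1.0)
print(x)
```

With omega = 1.0, as in the question, the update reduces to plain Gauss-Seidel.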

Sparse Multi-Dimensional Data Representation

天涯浪子 submitted on 2020-01-11 10:13:26
Question: I'm working on a cardiac simulation tool that uses 4-dimensional data, i.e. several (3-30) variables at locations in 3D space. I'm now adding some tissue geometry, which will leave over 2/3 of the points in the containing 3D box outside the tissue to be simulated, so I need a way to store the active points efficiently and skip the others. Crucially, I need to be able to:
- Iterate over all of the active points within a constrained 3D box (iterator, perhaps?)
- Having accessed one point, find its
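One simple representation that covers the first requirement is a hash map keyed by integer coordinates, storing values only for active points. This is a minimal Python sketch under that assumption; `SparseGrid` and its methods are hypothetical names, and the second requirement is left out because the excerpt is cut off.

```python
class SparseGrid:
    """Stores variables only for active (x, y, z) points (hypothetical helper)."""

    def __init__(self):
        self._cells = {}  # (x, y, z) -> list of per-point variables

    def set(self, point, values):
        self._cells[point] = list(values)

    def get(self, point):
        return self._cells.get(point)  # None for inactive points

    def in_box(self, lo, hi):
        """Yield (point, values) for active points with lo <= point <= hi on each axis."""
        for point, values in self._cells.items():
            if all(l <= c <= h for l, c, h in zip(lo, point, hi)):
                yield point, values


grid = SparseGrid()
grid.set((0, 0, 0), [1.0, 2.0, 3.0])
grid.set((1, 2, 3), [4.0, 5.0, 6.0])
grid.set((5, 5, 5), [7.0, 8.0, 9.0])
inside = sorted(p for p, _ in grid.in_box((0, 0, 0), (2, 3, 3)))
print(inside)
```

Note that `in_box` scans every active point; for boxes much smaller than the tissue, probing each coordinate of the box with `get` may be cheaper.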

How Tensorflow handles categorical features with multiple inputs within one column?

我只是一个虾纸丫 submitted on 2020-01-11 04:06:07
Question: For example, I have data in the following CSV format:
    col0 col1 col2  col3
    1    A    E|A|C 3
    0    B    D|F   2
    2    C    |     2
Each comma-separated column represents one feature. Normally a feature is one-hot (e.g. col0, col1, col3), but in this case col2 holds multiple values (separated by |). I'm sure TensorFlow can handle a one-hot feature with a sparse tensor, but I'm not sure whether it can handle multi-valued features like col2. How should it be represented in Tensorflow's sparse
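A multi-valued column such as col2 maps naturally onto the coordinate layout that sparse tensors use: a list of (row, position) indices, a parallel list of values, and a dense shape. The sketch below builds those three pieces in plain Python without calling TensorFlow; `to_sparse_coo` is a hypothetical helper, and passing its output to `tf.SparseTensor(indices, values, dense_shape)` is an assumption about how one might proceed, not something the excerpt confirms.

```python
def to_sparse_coo(cells, sep="|"):
    """Convert multi-valued cells like 'E|A|C' into (indices, values, dense_shape)."""
    indices, values = [], []
    max_len = 0
    for row, cell in enumerate(cells):
        tokens = [t for t in cell.split(sep) if t]  # a bare "|" yields no tokens
        for pos, token in enumerate(tokens):
            indices.append([row, pos])
            values.append(token)
        max_len = max(max_len, len(tokens))
    dense_shape = [len(cells), max_len]
    return indices, values, dense_shape


indices, values, dense_shape = to_sparse_coo(["E|A|C", "D|F", "|"])
print(indices, values, dense_shape)
```

The third row contributes no entries at all, which is how an empty feature is expressed in this layout.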

Very slow performance of cusparse csrsv_analysis

半腔热情 submitted on 2020-01-11 03:34:06
Question: I wrote a conjugate-gradient solver (for linear systems of equations) with LU preconditioning, using Dr. Maxim Naumov's papers on NVIDIA's research community as a guideline. The residual update step, which requires solving a lower triangular system and then an upper triangular system, is divided into two phases:
- the analysis phase (which exploits the sparsity pattern and decides the parallelization level);
- the solution phase itself.
According to all posts related to this
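The two triangular solves mentioned in the question can be sketched on the CPU as follows. This is a dense, sequential Python illustration of the solution phase only, not cuSPARSE code; the analysis phase, which derives a parallel schedule from the sparsity pattern, has no counterpart here. All function names are hypothetical.

```python
def forward_substitution(L, b):
    """Solve L y = b for lower triangular L."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y


def backward_substitution(U, y):
    """Solve U x = y for upper triangular U."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


def apply_lu_preconditioner(L, U, r):
    """Return z with (L U) z = r: lower solve first, then upper solve."""
    return backward_substitution(U, forward_substitution(L, r))


L = [[1.0, 0.0], [0.5, 1.0]]
U = [[4.0, 2.0], [0.0, 3.0]]
z = apply_lu_preconditioner(L, U, [8.0, 10.0])
print(z)
```

Each unknown depends on the ones computed before it, which is why a parallel implementation needs a dependency analysis before it can run the solve.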