sparse-matrix

When I convert a matrix into “transactions” for use with the arules package all of my values become 0

笑着哭i posted on 2019-12-12 03:59:48
Question: I am trying to apply the apriori algorithm to a binary matrix, but all of my values are returned as 0. I ran summary on the matrix to confirm that it has non-zero values. I tried coercing it to transactions using: trans<-as(a,"transactions") and I tried applying apriori directly to the matrix using: test<-apriori(a,parameter=list(support=.02,confidence=0,minlen=3,maxlen=3)) In both cases I got the same result, seen below. Has anyone else experienced this? Thanks parameter

Can pandas SparseSeries store values in the float16 dtype?

两盒软妹~` posted on 2019-12-12 03:25:46
Question: The reason I want to use a smaller data type in the sparse pandas containers is to reduce memory usage. This is relevant when working with data that originally uses bool (e.g. from to_dummies ) or small numeric dtypes (e.g. int8), which are all converted to float64 in sparse containers. DataFrame creation The provided example uses a modest 20k x 145 dataframe. In practice I'm working with dataframes on the order of 1e6 x 5e3. In []: bool_df.info() <class 'pandas.core.frame.DataFrame'>

Add categorical variable(gender) to Sparse Matrix for Multiclass Classification using sklearn

南楼画角 posted on 2019-12-12 03:15:02
Question: I am building a multiclass classification model using sklearn. I am converting my tweets into a 571x1815 sparse matrix with 34737 stored elements in Compressed Sparse Row format. I am trying to predict age groups based on tweet history, but I want to add an exogenous categorical variable (gender) to my sparse matrix and then use either a Decision Tree or a Random Forest for the prediction. How do I add a vector to a sparse matrix? def vectorize(df): bow_transformer = CountVectorizer
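A minimal sketch of one way to append a categorical variable to a sparse matrix, assuming the gender values are available as a plain sequence aligned with the matrix rows. The helper name `add_categorical_column` and the toy data are hypothetical; `scipy.sparse.hstack` performs the column-wise concatenation, and the tree-based scikit-learn classifiers accept the resulting CSR matrix.

```python
import numpy as np
from scipy import sparse


def add_categorical_column(X, values):
    """Append one-hot columns for a categorical variable to sparse matrix X."""
    # categories are the sorted unique labels; codes give each row's label index
    categories, codes = np.unique(np.asarray(values), return_inverse=True)
    n_rows = X.shape[0]
    # One-hot block: entry (r, c) is 1 when row r has category c
    one_hot = sparse.csr_matrix(
        (np.ones(n_rows), (np.arange(n_rows), codes.ravel())),
        shape=(n_rows, len(categories)),
    )
    # Concatenate column-wise without converting anything to dense
    return sparse.hstack([X, one_hot], format="csr"), categories


# Toy stand-in for the bag-of-words matrix and the gender column
X = sparse.csr_matrix(np.array([[1, 0], [0, 2], [3, 0]]))
X_aug, categories = add_categorical_column(X, ["m", "f", "m"])
print(X_aug.toarray())
```

One-hot encoding is used here instead of a single integer column so that the classifier does not read an ordering into the category codes; for a binary variable a single 0/1 column would work as well.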

How to create vector matrix of movie ratings using R project?

心已入冬 posted on 2019-12-12 03:04:36
Question: Suppose I am using this data set of movie ratings: http://www.grouplens.org/node/73 It contains ratings in a file formatted as userID::movieID::rating::timestamp Given this, I want to construct a feature matrix in R, where each row corresponds to a user and each column holds the rating that the user gave to a movie (if any). For example, if the data file contains 1::1::1::10 2::2::2::11 1::2::3::12 2::1::5::13 3::3::4::14 Then the output matrix would look like: UserID, Movie1,

sparse indexing in matlab

南笙酒味 posted on 2019-12-12 02:49:54
Question: I have a very long piece of code that is full of "if" blocks like the following, and the Matlab editor gives me this suggestion: "this sparse indexing expression is likely to be slow" mt = rand(200,200); [c r] = size(mt); T = sparse(r*c,2); for i = 1:c for j = 1:r if(ind(j,i)==1) templat = template + 1; T((i-1)*r+j,2)=100000; end end; end; Is there any way to make the code faster and follow Matlab's suggestion? (The code may not run, because I just picked a few lines to show the issue.) Answer 1

Solve Over-determined sparse matrix in Scipy (from Matlab to Python)

99封情书 posted on 2019-12-12 02:46:28
Question: Given a large sparse matrix A which is banded (or tridiagonal, however it is called) and a vector f, I would like to solve for Z, where AZ = f. There are 6 diagonals, not clearly shown here. A has one more row than it has columns (M rows, N columns, M = N + 1), hence the system is over-determined. Here is the source Matlab code, and I would like to convert it to its Scipy equivalent. Matlab A = A(:,2:end); % less one column f = f(:); Z = A\f; Z = [0;-Z]; Z = reshape(Z,H,W); Z = Z - min(Z(:)); My attempt on Scipy
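A rough Scipy counterpart of the Matlab lines above, as a sketch only. It swaps Matlab's backslash for the iterative solver `scipy.sparse.linalg.lsqr`, which also returns a least-squares solution for an over-determined sparse system. The function name `solve_overdetermined` and the tiny test system are hypothetical; `order="F"` is used because Matlab's `f(:)` and `reshape` are column-major.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr


def solve_overdetermined(A, f, H, W):
    """Mirror the Matlab snippet using a sparse least-squares solve."""
    A = sparse.csr_matrix(A)[:, 1:]            # A = A(:, 2:end)
    f = np.ravel(f, order="F")                 # f = f(:)
    # lsqr returns a tuple; its first element is the least-squares solution
    z = lsqr(A, f, atol=1e-12, btol=1e-12)[0]  # Z = A \ f
    z = np.concatenate(([0.0], -z))            # Z = [0; -Z]
    Z = z.reshape((H, W), order="F")           # Z = reshape(Z, H, W)
    return Z - Z.min()                         # Z = Z - min(Z(:))


# Tiny illustrative system: 7 rows, 6 columns, so H * W must equal 6
A = sparse.diags([np.ones(6), 2 * np.ones(6)], [0, -1], shape=(7, 6))
z_true = np.array([1.0, -2.0, 3.0, 0.5, 4.0])
f = sparse.csr_matrix(A)[:, 1:] @ z_true
Z = solve_overdetermined(A, f, H=2, W=3)
print(Z)
```

The tolerances passed to `lsqr` are tighter than its defaults so that the small example is solved closely; for a large system they are a trade-off between accuracy and run time.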

group by on scipy sparse matrix

依然范特西╮ posted on 2019-12-12 02:45:10
Question: I have a scipy sparse matrix with 10e6 rows and 10e3 columns, populated to 1%. I also have an array of size 10e6 which contains keys corresponding to the 10e6 rows of my sparse matrix. I want to group my sparse matrix by these keys and aggregate with a sum function. Example: Keys: ['foo','bar','foo','baz','baz','bar'] Sparse matrix: (0,1) 3 -> corresponds to the first 'foo' key (0,10) 4 -> corresponds to the first 'bar' key (2,1) 1 -> corresponds to the second 'foo' key (1,3) 2 ->
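One way to sketch the grouped sum without densifying is to multiply by a sparse indicator matrix whose rows are groups and whose columns are the original rows. The function name `group_sum` and the small matrix below are hypothetical; only the key list is taken from the question.

```python
import numpy as np
from scipy import sparse


def group_sum(matrix, keys):
    """Sum the rows of a sparse matrix that share the same key."""
    # unique_keys is sorted; group_idx maps each row to its group number
    unique_keys, group_idx = np.unique(np.asarray(keys), return_inverse=True)
    n_rows = matrix.shape[0]
    # Indicator matrix: entry (g, r) is 1 when row r belongs to group g
    indicator = sparse.csr_matrix(
        (np.ones(n_rows, dtype=matrix.dtype),
         (group_idx.ravel(), np.arange(n_rows))),
        shape=(len(unique_keys), n_rows),
    )
    # The sparse product adds up all rows that fall in the same group
    return unique_keys, indicator @ matrix


keys = ["foo", "bar", "foo", "baz", "baz", "bar"]
X = sparse.csr_matrix(np.array([
    [1, 0, 0],
    [0, 2, 0],
    [3, 0, 0],
    [0, 0, 4],
    [0, 0, 5],
    [0, 6, 0],
]))
unique_keys, summed = group_sum(X, keys)
print(unique_keys)
print(summed.toarray())
```

The output rows follow the sorted order of the keys (`bar`, `baz`, `foo`), not their order of first appearance.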

Eigen and parallelization makes no difference for conjugate gradient. Precondition also fails

故事扮演 posted on 2019-12-12 02:33:36
Question: This is related to this question. Today I experimented a bit with Conjugate Gradient; in particular I experimented with max_iterations and tolerance. It is faster but not fast enough. According to the documentation it should be enough to add -fopenmp at compile time to enable multi-threading. I have tested using both `omp_set_num_threads(nbrThreads); Eigen::setNbThreads(nbrThreads);` It makes no difference in time whether I use 5 threads or 1 thread, which I think is a bit strange.

Parallel assembly of a sparse matrix in python

邮差的信 posted on 2019-12-12 02:08:43
Question: I'm trying to use mpi4py to assemble a very large sparse matrix in parallel. Each rank produces a sparse submatrix (in scipy's dok format) that needs to be put in place in the very large matrix. So far I have succeeded when each rank produces numpy arrays containing the indices and the values of the nonzero entries (mimicking the coo format). After the gather step I can assemble the large matrix from the numpy arrays. The final matrix is to be written to disk as an mtx format file. What
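The assembly step described above (after the gather) can be sketched as follows. This is an illustration only: the gather itself is replaced by a hard-coded list so that the example runs without MPI, and the name `assemble_from_triplets` is hypothetical.

```python
import numpy as np
from scipy import sparse


def assemble_from_triplets(parts, shape):
    """Build one sparse matrix from per-rank (rows, cols, vals) triplets."""
    rows = np.concatenate([p[0] for p in parts])
    cols = np.concatenate([p[1] for p in parts])
    vals = np.concatenate([p[2] for p in parts])
    # Converting COO to CSR sums entries that share the same (row, col)
    return sparse.coo_matrix((vals, (rows, cols)), shape=shape).tocsr()


# Stand-in for the list that a gather on the root rank would return
parts = [
    (np.array([0, 1]), np.array([0, 1]), np.array([1.0, 2.0])),
    (np.array([2, 1]), np.array([2, 1]), np.array([3.0, 4.0])),
]
K = assemble_from_triplets(parts, shape=(3, 3))
print(K.toarray())
```

The assembled matrix can then be written in Matrix Market format with `scipy.io.mmwrite`. Whether overlapping entries from different ranks should be summed, as happens here, depends on the application.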

space allocated by compressed_matrix in boost

放肆的年华 posted on 2019-12-12 01:34:45
Question: How much space is allocated by boost compressed_matrix? Is it true that it only allocates space for non-zero elements? If this is true, I don't understand why the following code gives a bad_alloc error. namespace bubla = boost::numeric::ublas; typedef double value_type; typedef bubla::compressed_matrix<value_type> SparseMatrix; unsigned int m = 10000*10000; SparseMatrix D(m,m,3*m), X; It should only allocate space for 3*m=3*10000*10000 elements, right? Could you please help clarify? What data
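A back-of-the-envelope estimate of what reserving 3*m non-zeros implies, as a sketch only. It assumes a generic compressed-row layout (one value and one column index per stored element, plus one row pointer per row) with 8-byte doubles and 8-byte indices. The index width that uBLAS actually uses depends on the platform and template parameters, so the figures are illustrative rather than a statement about boost's internals. The function name `compressed_row_bytes` is hypothetical.

```python
def compressed_row_bytes(n_rows, nnz, value_bytes=8, index_bytes=8):
    """Estimate storage for a compressed-row sparse matrix (assumed layout)."""
    values = nnz * value_bytes                 # one double per stored element
    col_indices = nnz * index_bytes            # one column index per stored element
    row_pointers = (n_rows + 1) * index_bytes  # one offset per row, plus one
    return values + col_indices + row_pointers


m = 10000 * 10000                     # 1e8 rows and columns, as in the question
total = compressed_row_bytes(m, 3 * m)
print(f"{total / 1e9:.1f} GB")
```

Under these assumptions the reserved storage is about 5.6 GB, of which the values alone take 2.4 GB, so a bad_alloc is plausible on a machine or process that cannot provide that much memory.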