sparse-matrix | 易学教程

When I convert a matrix into “transactions” for use with the arules package all of my values become 0

Question: I am trying to apply the apriori algorithm to a binary matrix, but all of my values come back as 0. I ran a summary function on the matrix to confirm that it has non-zero values. I tried coercing it into the transactions form using: trans<-as(a,"transactions") and I tried applying apriori directly to the matrix using: test<-apriori(a,parameter=list(support=.02,confidence=0,minlen=3,maxlen=3)). In both cases I got the same result, shown below. Has anyone else experienced this? Thanks. parameter

Can pandas SparseSeries store values in the float16 dtype?

Question: I want to use a smaller data type in the sparse pandas containers to reduce memory usage. This is relevant when working with data that originally uses bool (e.g. from get_dummies) or small numeric dtypes (e.g. int8), which are all converted to float64 in sparse containers. DataFrame creation: the provided example uses a modest 20k x 145 dataframe. In practice I'm working with dataframes on the order of 1e6 x 5e3. In []: bool_df.info() <class 'pandas.core.frame.DataFrame'>

Add categorical variable (gender) to Sparse Matrix for Multiclass Classification using sklearn

Question: I am building a multiclass classification model using sklearn. I am converting my tweets into a 571x1815 sparse matrix with 34737 stored elements in Compressed Sparse Row format. I am trying to predict age groups based on tweet history, but I want to add an exogenous categorical variable (gender) to my sparse matrix and then use either a Decision Tree or a Random Forest for the prediction. How do I add a vector to a sparse matrix? def vectorize(df): bow_transformer = CountVectorizer
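One common way to append a categorical column to a sparse feature matrix is to one-hot encode it into its own sparse block and stack the two blocks column-wise with scipy.sparse.hstack. The sketch below is only an illustration: the 3x4 matrix stands in for the CountVectorizer output, and the gender labels and the names X_text, X_gender, and X_all are made up for the example.

```python
import numpy as np
from scipy import sparse

# Stand-in for the bag-of-words matrix (3 tweets x 4 terms); values are made up.
X_text = sparse.csr_matrix(np.array([[1, 0, 2, 0],
                                     [0, 1, 0, 0],
                                     [0, 0, 0, 3]]))

# Hypothetical gender labels, one per row of X_text.
gender = np.array(["f", "m", "f"])

# One-hot encode the labels into a sparse block: one column per category.
categories, codes = np.unique(gender, return_inverse=True)
rows = np.arange(len(gender))
X_gender = sparse.csr_matrix(
    (np.ones(len(gender), dtype=X_text.dtype), (rows, codes)),
    shape=(len(gender), len(categories)),
)

# Stack column-wise; the result stays sparse.
X_all = sparse.hstack([X_text, X_gender], format="csr")
```

The combined matrix keeps the original term columns first and adds one indicator column per gender category, so it can be passed to a tree-based estimator in place of the text-only matrix.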

How to create vector matrix of movie ratings using R project?

Question: Suppose I am using this data set of movie ratings: http://www.grouplens.org/node/73 It contains ratings in a file formatted as userID::movieID::rating::timestamp. Given this, I want to construct a feature matrix in R, where each row corresponds to a user and each column holds the rating that the user gave to a movie (if any). For example, if the data file contains 1::1::1::10 2::2::2::11 1::2::3::12 2::1::5::13 3::3::4::14 then the output matrix would look like: UserID, Movie1,
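The question asks for R, but the reshaping itself is language-independent: parse each line into a (user, movie, rating) triple, then place each rating at the row for its user and the column for its movie. Here is a minimal Python sketch of that idea using the five sample lines from the question; the function name ratings_matrix and the use of None for "no rating" are choices made for this illustration.

```python
# Sample lines in the userID::movieID::rating::timestamp layout from the question.
lines = [
    "1::1::1::10",
    "2::2::2::11",
    "1::2::3::12",
    "2::1::5::13",
    "3::3::4::14",
]

def ratings_matrix(lines):
    """Return (user_ids, movie_ids, matrix), with None where no rating exists."""
    triples = []
    for line in lines:
        user, movie, rating, _timestamp = line.strip().split("::")
        triples.append((int(user), int(movie), float(rating)))
    users = sorted({u for u, _, _ in triples})
    movies = sorted({m for _, m, _ in triples})
    row_of = {u: i for i, u in enumerate(users)}
    col_of = {m: j for j, m in enumerate(movies)}
    matrix = [[None] * len(movies) for _ in users]
    for u, m, r in triples:
        matrix[row_of[u]][col_of[m]] = r
    return users, movies, matrix

users, movies, matrix = ratings_matrix(lines)
```

For the full data set a dense list of lists would be wasteful, since most user/movie pairs have no rating; the same triples can feed a sparse matrix constructor instead.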

sparse indexing in matlab

Question: I have very long code that is full of "if" blocks like the following, and the MATLAB editor gives me this suggestion: "this sparse indexing expression is likely to be slow". mt = rand(200,200); [c r] = size(mt); T = sparse(r*c,2); for i = 1:c for j = 1:r if(ind(j,i)==1) templat = template + 1; T((i-1)*r+j,2)=100000; end end; end; Is there a way to make the code faster and follow MATLAB's suggestion? (The code may not run, because I just picked a few lines to show the issue.)

Solve Over-determined sparse matrix in Scipy (from Matlab to Python)

Question: Given a large sparse matrix A which is banded (or tridiagonal, however it is called) and a vector f, I would like to solve for Z, where AZ = f. There are 6 diagonals, not clearly shown here. A has more rows (M) than columns (N), by just 1 (M ~= N), hence the system is over-determined. Here is the source Matlab code, and I would like to convert it to its Scipy equivalent. Matlab: A = A(:,2:end); % less one column f = f(:); Z = A\f; Z = [0;-Z]; Z = reshape(Z,H,W); Z = Z - min(Z(:)); My attempt on Scipy
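For a non-square system, Matlab's backslash returns a least-squares solution, so one possible Scipy counterpart is scipy.sparse.linalg.lsqr, an iterative least-squares solver. The sketch below is an assumption about what a translation could look like, not the asker's own attempt: the small two-diagonal matrix stands in for A(:,2:end), and H, W, and z_true are made-up values chosen so the result is easy to check.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

# Made-up banded stand-in for A(:, 2:end): 6 rows, 5 columns, two diagonals.
M, N = 6, 5
A = sparse.diags([1.0, -1.0], [0, -1], shape=(M, N), format="csr")

# Build a consistent right-hand side from a known solution (illustration only).
z_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
f = A @ z_true

# Least-squares solve; lsqr returns a tuple whose first item is the solution.
z = lsqr(A, f)[0]

# Mirror the remaining Matlab steps: Z = [0; -Z], column-major reshape, shift.
H, W = 2, 3                                  # hypothetical image size, H * W == N + 1
Z = np.concatenate(([0.0], -z))
Z = Z.reshape((H, W), order="F")             # Matlab reshape is column-major
Z = Z - Z.min()
```

Because lsqr is iterative, its result matches a direct solve only up to its tolerances; the order="F" argument is what keeps the reshape consistent with Matlab's column-major layout.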

group by on scipy sparse matrix

Question: I have a scipy sparse matrix with 10e6 rows and 10e3 columns, populated to 1%. I also have an array of size 10e6 which contains keys corresponding to the 10e6 rows of my sparse matrix. I want to group the rows of my sparse matrix by these keys and aggregate with a sum function. Example: Keys: ['foo','bar','foo','baz','baz','bar'] Sparse matrix: (0,1) 3 -> corresponds to the first 'foo' key (0,10) 4 -> corresponds to the first 'bar' key (2,1) 1 -> corresponds to the second 'foo' key (1,3) 2 ->
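One way to express a group-by-sum over rows without leaving sparse storage is to build a group indicator matrix G, with G[g, i] = 1 when row i belongs to group g, and compute G @ X. The sketch below uses the six keys from the question; the 6x4 matrix X and its values are made up for illustration and do not reproduce the truncated example.

```python
import numpy as np
from scipy import sparse

keys = np.array(["foo", "bar", "foo", "baz", "baz", "bar"])

# Small stand-in for the large matrix: 6 rows (one per key) x 4 columns.
X = sparse.csr_matrix(np.array([[0, 3, 0, 0],
                                [0, 0, 0, 4],
                                [0, 1, 0, 0],
                                [2, 0, 0, 0],
                                [0, 0, 5, 0],
                                [0, 0, 0, 1]]))

# Map each key to a group index; np.unique returns the groups in sorted order.
groups, group_idx = np.unique(keys, return_inverse=True)

# Indicator matrix of shape (n_groups, n_rows).
n_rows = X.shape[0]
G = sparse.csr_matrix(
    (np.ones(n_rows, dtype=X.dtype), (group_idx, np.arange(n_rows))),
    shape=(len(groups), n_rows),
)

# A single sparse product sums the rows that share a key.
summed = G @ X
```

Row g of summed holds the column-wise sum for groups[g]. The indicator matrix has exactly one stored entry per input row, so it adds little memory on top of X.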

Eigen and parallelization makes no difference for conjugate gradient. Precondition also fails

Question: This is related to this question. Today I experimented a bit with Conjugate Gradient, in particular with max_iterations and tolerance. It is faster but not fast enough. According to the documentation, adding -fopenmp at compile time should be enough to enable multi-threading. I have tested using both `omp_set_num_threads(nbrThreads); Eigen::setNbThreads(nbrThreads);` It makes no difference in run time whether I use 5 threads or 1 thread, which I think is a bit strange.

Parallel assembly of a sparse matrix in python

Question: I'm trying to use mpi4py to assemble a very large sparse matrix in parallel. Each rank produces a sparse submatrix (in scipy's dok format) that needs to be put in place in the very large matrix. So far I have succeeded when each rank produces a numpy array containing the indices and the values of the nonzero entries (mimicking the coo format). After the gather procedure I can assemble the large matrix from the numpy arrays. The final matrix is to be written to disk as an mtx format file. What
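The gather-then-assemble step described above can be sketched without MPI by treating the gathered data as a list of per-rank (rows, cols, values) triples. Everything here is illustrative: the list named gathered stands in for whatever the root rank receives, the indices are assumed to already be in global coordinates, and overlapping entries are assumed to be additive.

```python
import numpy as np
from scipy import sparse

# Stand-in for the data gathered on the root rank: one
# (rows, cols, values) triple per rank, in global coordinates.
gathered = [
    (np.array([0, 1]), np.array([0, 2]), np.array([1.0, 2.0])),  # rank 0
    (np.array([2, 3]), np.array([1, 3]), np.array([3.0, 4.0])),  # rank 1
    (np.array([3]), np.array([3]), np.array([0.5])),             # rank 2, overlaps rank 1
]

rows = np.concatenate([r for r, _, _ in gathered])
cols = np.concatenate([c for _, c, _ in gathered])
vals = np.concatenate([v for _, _, v in gathered])

# coo_matrix accepts duplicate coordinates; converting to CSR sums them.
A = sparse.coo_matrix((vals, (rows, cols)), shape=(4, 4)).tocsr()
```

The assembled matrix can then be written in Matrix Market format with scipy.io.mmwrite.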

space allocated by compressed_matrix in boost

Question: How much space is allocated by boost compressed_matrix? Is it true that it only allocates space for non-zero elements? If this is true, I don't understand why the following code gives a bad_alloc error. namespace bubla = boost::numeric::ublas; typedef double value_type; typedef bubla::compressed_matrix<value_type> SparseMatrix; unsigned int m = 10000*10000; SparseMatrix D(m,m,3*m), X; It should only allocate space for 3*m=3*10000*10000 elements, right? Could you please help clarify? What data
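A rough back-of-the-envelope estimate shows why even "only the non-zeros" can be a large request here. The sketch below assumes a compressed-row layout (one value and one column index per reserved non-zero, plus one row offset per row) and 8-byte doubles and indices; these sizes are assumptions about a typical 64-bit build, not figures confirmed for any particular Boost version.

```python
m = 10000 * 10000          # rows and columns, as in the question
nnz = 3 * m                # requested non-zero capacity

# Assumed element sizes: 8-byte double, 8-byte index.
value_bytes = 8
index_bytes = 8

values = nnz * value_bytes          # value array
col_index = nnz * index_bytes       # one column index per stored value
row_ptr = (m + 1) * index_bytes     # one row offset per row, plus one

total_gib = (values + col_index + row_ptr) / 2**30
print(f"{total_gib:.1f} GiB")       # prints "5.2 GiB"
```

Under these assumptions the constructor would have to reserve several gibibytes up front, and the row-offset array alone scales with the number of rows rather than with the number of non-zeros.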