Converting Pandas DataFrame to sparse matrix

Submitted by 吃可爱长大的小学妹 on 2021-02-08 02:15:43

Question


Here is my code:

import numpy as np
import pandas as pd
import scipy.sparse as sp

data = pd.get_dummies(data['movie_id']).groupby(data['user_id']).apply(max)
df = pd.DataFrame(data)
replace = df.replace(0, np.nan)
t = replace.fillna(-1)
sparse = sp.csr_matrix(t.values)

My data consists of two columns, movie_id and user_id.

user_id    movie_id
5          1000
6          1007

I want to convert the data to a sparse matrix. I first created an interaction matrix where rows indicate user_id and columns indicate movie_id with positive interaction as +1 and negative interaction as -1. Then I converted it to a sparse matrix using scipy. My result looks like this:

(0,0) -1

(0,1) -1

(0,2) 1

But what I actually want is this:

(1000,0) -1

(1000,1) 1

(1007,0) -1

Any help would be appreciated.


Answer 1:


If you have both the row and column index (in your case movie_id and user_id, respectively), it is advisable to use the COO format for creation.

You can convert it into a sparse format like so:

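import scipy.sparse
# The data, row and column arrays must be 1-D and equally long, so t.values (2-D) cannot be used as-is;
# `values` is assumed to hold one +1/-1 entry per row of the original long-format df.
sparse_mat = scipy.sparse.coo_matrix((values, (df.movie_id, df.user_id)))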

Note that when no shape argument is given, the constructor infers the shape of the sparse matrix from the movie IDs and user IDs passed as row and column indices: each dimension is the largest index plus one.
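To make the shape inference concrete, here is a minimal sketch built from the two sample rows in the question. The variable names, and the choice to store +1 for every observed pair, are assumptions for illustration and not part of the original code.

```python
import numpy as np
import pandas as pd
import scipy.sparse

# Sample rows from the question (long format: one row per interaction).
df = pd.DataFrame({"user_id": [5, 6], "movie_id": [1000, 1007]})

# Assumption: every observed (movie, user) pair is a positive interaction (+1).
values = np.ones(len(df), dtype=np.int8)

# Rows are indexed by movie_id and columns by user_id, as in the answer.
rows = df["movie_id"].to_numpy()
cols = df["user_id"].to_numpy()
sparse_mat = scipy.sparse.coo_matrix((values, (rows, cols)))

# With no explicit shape, each dimension is the largest index plus one.
print(sparse_mat.shape)  # (1008, 7)
print(sparse_mat.nnz)    # 2
```

Only the two stored entries take up memory; the remaining cells of the 1008 x 7 matrix are implicit zeros.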
You can then convert this matrix to any other sparse format, for example CSR via its tocsr() method.
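As a rough end-to-end sketch, the snippet below builds a COO matrix from the three entries listed as the desired output in the question and converts it to CSR. The array names are hypothetical, and the +1/-1 values are simply copied from that desired output.

```python
import numpy as np
import scipy.sparse

# Entries taken from the desired output in the question:
# (1000, 0) -> -1, (1000, 1) -> 1, (1007, 0) -> -1
movie_ids = np.array([1000, 1000, 1007])  # row indices
user_ids = np.array([0, 1, 0])            # column indices
values = np.array([-1, 1, -1])            # one value per (row, column) pair

coo = scipy.sparse.coo_matrix((values, (movie_ids, user_ids)))

# COO is convenient for construction; CSR supports indexing and fast row slicing.
csr = coo.tocsr()

# Entries are addressable by the original IDs.
print(csr[1000, 1])  # 1
print(csr[1007, 0])  # -1
```

Because the raw IDs are used directly as indices, the matrix has as many rows as the largest movie ID plus one. That is harmless for a sparse format, since empty rows store no values.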



Source: https://stackoverflow.com/questions/51240096/converting-pandas-dataframe-to-sparse-matrix
