(Python Scipy) How to flatten a csr_matrix and append it to another csr_matrix?

孤者浪人 submitted on 2019-12-08 02:08:10

Question


I represent each XML document as a feature matrix in csr_matrix format. With around 3000 XML documents, I have a list of csr_matrices. I want to flatten each of these matrices into a feature vector, then combine all of these feature vectors into a single csr_matrix that represents all the XML documents, where each row is a document and each column is a feature.

One way to achieve this is with the following code:

X = csr_matrix([a.toarray().ravel().tolist() for a in ls])

where ls is the list of csr_matrices. However, this is highly inefficient because it converts every matrix to a dense array and then to a Python list; with 3000 documents, it simply crashes!

In other words, how can I flatten each csr_matrix in the list 'ls' without converting it to a dense array, and how can I append the flattened csr_matrices to another csr_matrix?

Please note that I am using Python with SciPy.

Thanks in advance!


Answer 1:


Why use csr_matrix for each XML document? It may be better to use lil_matrix, which supports the reshape method. Here is an example:

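from scipy import sparse

N, M, K = 100, 200, 300
# K random N x M sparse matrices stand in for the per-document feature matrices
matrices = [sparse.rand(N, M, format="csr") for _ in range(K)]
# LIL supports reshape, so convert and flatten each matrix into a 1 x (N*M) row
flattened = [m.tolil().reshape((1, N * M)) for m in matrices]
# stack the rows and convert back to CSR: one row per document
m1 = sparse.vstack(flattened).tocsr()

# test with dense array
# import numpy as np
# m2 = np.vstack([m.toarray().reshape(-1) for m in matrices])
# np.allclose(m1.toarray(), m2)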
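
If converting every matrix to LIL turns out to be slow, another option is to compute the flattened column indices directly from the COO representation. The following is only a sketch and not part of the original answer: the function name flatten_and_stack is hypothetical, and it assumes all matrices in the list share the same shape and that row-major (C-order) flattening is wanted, as with ravel().

```python
import numpy as np
from scipy import sparse


def flatten_and_stack(matrices):
    """Flatten each (N, M) sparse matrix in row-major order and stack the
    results into one (K, N*M) CSR matrix without building dense arrays.

    Hypothetical helper; assumes a non-empty list of same-shaped matrices.
    """
    n_rows, n_cols = matrices[0].shape
    data, rows, cols = [], [], []
    for doc_index, m in enumerate(matrices):
        coo = m.tocoo()
        data.append(coo.data)
        # every stored entry of this document goes to output row doc_index
        rows.append(np.full(len(coo.data), doc_index, dtype=np.int64))
        # entry (i, j) lands in column i * M + j of the flattened vector;
        # int64 avoids overflow when N * M is large
        cols.append(coo.row.astype(np.int64) * n_cols + coo.col)
    return sparse.csr_matrix(
        (np.concatenate(data), (np.concatenate(rows), np.concatenate(cols))),
        shape=(len(matrices), n_rows * n_cols),
    )
```

With the list from the question, the call would be X = flatten_and_stack(ls). Only the stored non-zero entries are touched, so memory use scales with the number of non-zeros rather than with N*M per document.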


Source: https://stackoverflow.com/questions/15563396/python-scipy-how-to-flatten-a-csr-matrix-and-append-it-to-another-csr-matrix
