How do you initialize a gensim corpus variable with a csr_matrix?

Submitted by 时间秒杀一切 on 2019-11-29 09:40:26

Question


I have X, a csr_matrix that I obtained using scikit-learn's TfidfVectorizer, and y, which is an array.

My plan is to create features using LDA. However, I could not find out how to initialize a gensim corpus variable with X as a csr_matrix. In other words, I don't want to download a corpus as shown in gensim's documentation, nor convert X to a dense matrix, since that would consume a lot of memory and the computer could hang.

In short, my questions are the following:

  1. How do you initialize a gensim corpus given that I have a csr_matrix (sparse) representing the whole corpus?
  2. How do you use LDA to extract features?

Answer 1:


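Gensim has a somewhat hidden class that does this for you:

http://radimrehurek.com/gensim/matutils.html#gensim.matutils.Sparse2Corpus

"class gensim.matutils.Sparse2Corpus(sparse, documents_columns=True) Convert a matrix in scipy.sparse format into a streaming gensim corpus."

Note that documents_columns defaults to True, which treats each column as a document. A scikit-learn vectorizer returns one document per row, so pass documents_columns=False (or transpose the matrix first).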

I've had some success with it using a corpus extracted with CountVectorizer, then loaded into gensim.



Source: https://stackoverflow.com/questions/15670525/how-do-you-initialize-a-gensim-corpus-variable-with-a-csr-matrix
