Efficient transformation of gensim TransformedCorpus data to array

Submitted by 懵懂的女人 on 2021-02-07 10:13:30

Question


Is there a more direct or efficient way to get the topic probabilities from a gensim.interfaces.TransformedCorpus object into a numpy array (or, alternatively, a pandas DataFrame) than the row-by-row method below?

from gensim import models
import numpy as np

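# corpus: a bag-of-words corpus (e.g. a list of dictionary.doc2bow(...) vectors), defined elsewhere
num_topics = 5
model = models.LdaMulticore(corpus, num_topics=num_topics, minimum_probability=0.0)

# TransformedCorpus yielding one list of (topic_id, probability) pairs per document
all_topics = model.get_document_topics(corpus)
num_docs = len(all_topics)

lda_scores = np.empty([num_docs, num_topics])

# Keep the probability column of each row; assumes every row lists all num_topics topics
for i in range(num_docs):
    lda_scores[i] = np.array(all_topics[i]).transpose()[1]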
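For comparison, here is a plain-NumPy sketch of the same conversion. It makes a single pass over the corpus, so it does not rely on `all_topics[i]` indexing, and it scatters each `(topic_id, probability)` pair into a zero-filled array, so a topic missing from a document's list simply stays at 0.0 instead of breaking the row assignment. The helper name `topics_to_dense` is hypothetical, and the three-document `all_topics` list is toy data standing in for `model.get_document_topics(corpus)`.

```python
import numpy as np


def topics_to_dense(all_topics, num_topics):
    """Hypothetical helper: turn an iterable of per-document lists of
    (topic_id, probability) pairs into a dense (num_docs, num_topics) array.
    Topics absent from a document's list are left at 0.0."""
    doc_ids, topic_ids, probs = [], [], []
    num_docs = 0
    # Single pass, so this also works on a streamed (non-indexable) corpus
    for doc_id, doc in enumerate(all_topics):
        num_docs += 1
        for topic_id, prob in doc:
            doc_ids.append(doc_id)
            topic_ids.append(topic_id)
            probs.append(prob)

    scores = np.zeros((num_docs, num_topics))
    # Scatter every pair at once with integer fancy indexing
    rows = np.asarray(doc_ids, dtype=np.intp)
    cols = np.asarray(topic_ids, dtype=np.intp)
    scores[rows, cols] = np.asarray(probs, dtype=float)
    return scores


# Toy stand-in for model.get_document_topics(corpus); topic 4 never appears
all_topics = [
    [(0, 0.7), (1, 0.3)],
    [(2, 0.5), (3, 0.5)],
    [(0, 0.1), (3, 0.9)],
]
lda_scores = topics_to_dense(all_topics, num_topics=5)
print(lda_scores)
```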

Answer 1:


This might be too late, but gensim has helper functions in gensim.matutils for converting corpora to and from numpy/scipy arrays.

What you're looking for:

gensim.matutils.corpus2csc

You can then convert the output (a scipy.sparse CSC matrix) to a numpy array or a pandas DataFrame as you wish.

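import gensim

# corpus2csc returns a scipy.sparse CSC matrix of shape (num_topics, num_docs);
# num_terms=num_topics keeps a column for topics that never appear in any document
all_topics_csc = gensim.matutils.corpus2csc(all_topics, num_terms=num_topics)
# Transpose to (num_docs, num_topics) and densify into a numpy array
all_topics_numpy = all_topics_csc.T.toarray()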


Source: https://stackoverflow.com/questions/48358161/efficient-transformation-of-gensim-transformedcorpus-data-to-array