Python: Gensim Memory Error

大城市里の小女人 submitted on 2021-01-29 01:37:04

Question


import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
from gensim import corpora, models, similarities
from nltk.corpus import stopwords
import codecs

documents = []
with codecs.open("Master_File_for_Docs.txt", encoding = 'utf-8', mode= "r") as fid:
   for line in fid:
       documents.append(line)
# Build the stopword collection once; a set makes the membership test below fast
stoplist = set(stopwords.words('english'))

# Remove stopwords
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]


dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=100)
lda.print_topics(20)
#corpus_lda = lda[corpus]
#for doc in corpus_lda:
 #   print(doc)

I am running Gensim for topic modeling and trying to get the code above to work. I know the code works because a friend ran it successfully on a Mac, but when I run it on a Windows computer it raises a

MemoryError

Also, the logging that I set up on the second line does not appear on my Windows computer. Is there something in Windows that I need to fix in order for gensim to work?


Answer 1:


I installed gensim on my Windows computer successfully, but I also got a MemoryError when I set a large number of topics on a big data set. The space complexity of gensim's LDA is O(K*V), where K is the number of topics and V is the size of the dictionary, so whether the model fits depends on how much RAM your computer has. Setting the number of topics to 50, or at least below 100, may solve it. You may also want to first test the example on the official gensim website: http://radimrehurek.com/gensim/index.html
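To make the O(K*V) argument concrete, here is a rough back-of-the-envelope sketch. It assumes a dense K x V matrix of 4-byte floats; the helper name `lda_topic_matrix_bytes` and the vocabulary size are hypothetical and not taken from the question. The real footprint is likely higher, since gensim keeps additional working arrays during training.

```python
def lda_topic_matrix_bytes(num_topics, vocab_size, bytes_per_value=4):
    """Rough size in bytes of one dense num_topics x vocab_size matrix."""
    return num_topics * vocab_size * bytes_per_value


# Hypothetical vocabulary of 2,000,000 unique tokens (an assumption for illustration).
vocab_size = 2_000_000
for num_topics in (50, 100):
    size_mb = lda_topic_matrix_bytes(num_topics, vocab_size) / 1024 ** 2
    print(f"{num_topics} topics: about {size_mb:.0f} MB per matrix")
```

Halving the number of topics halves this estimate, and so does halving the dictionary size.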




Answer 2:


The MemoryError appears because Gensim is trying to keep all of the data it needs in memory while analyzing it. The solutions are few:

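  • Use a server with more memory (an AWS machine, or anything more powerful than your PC)
  • Try a 64-bit Python interpreter
  • Try reducing the size parameter. Note that it is a constructor parameter of models such as Word2Vec (renamed vector_size in gensim 4), not of model.save(); LdaModel has no such parameter, and its closest counterpart is num_topics. A smaller value means fewer features representing your words
  • Try increasing the min_count parameter, which is likewise a Word2Vec constructor parameter rather than a model.save() one; for LdaModel, the no_below argument of dictionary.filter_extremes() plays a similar role. This makes the model consider only words that appear at least min_count times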
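The 64-bit suggestion in the list above is quick to verify. A 32-bit interpreter can only address a few gigabytes no matter how much RAM is installed, which is one possible (unconfirmed) reason the same script could succeed on another machine. A minimal sketch using only the standard library; `interpreter_bits` is a hypothetical helper name:

```python
import struct
import sys


def interpreter_bits():
    """Pointer size of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


print(f"Python {sys.version.split()[0]} is a {interpreter_bits()}-bit build")
```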

Be careful, though: reducing size or raising min_count changes the characteristics of your model.
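Since the root cause named above is holding everything in memory, another option that leaves the model itself unchanged is to stream the corpus from disk one line at a time instead of building the `documents`, `texts` and `corpus` lists. The following is a sketch under assumptions, not the original poster's code: `iter_tokens`, `StreamedCorpus` and `train_streamed_lda` are hypothetical names, and the file layout (one document per line) is taken from the question.

```python
import codecs


def iter_tokens(path, stoplist):
    """Yield one list of lowercased, stopword-free tokens per line of the file."""
    with codecs.open(path, encoding="utf-8", mode="r") as fid:
        for line in fid:
            yield [word for word in line.lower().split() if word not in stoplist]


class StreamedCorpus:
    """Re-iterable corpus that converts one document at a time to bag-of-words."""

    def __init__(self, path, dictionary, stoplist):
        self.path = path
        self.dictionary = dictionary
        self.stoplist = stoplist

    def __iter__(self):
        # The file is re-read on every pass, so only one line is in memory at once.
        for tokens in iter_tokens(self.path, self.stoplist):
            yield self.dictionary.doc2bow(tokens)


def train_streamed_lda(path, num_topics=100):
    """Hypothetical helper: build the dictionary and model without a texts list."""
    # Imported here so the streaming classes above work without gensim or nltk.
    from gensim import corpora, models
    from nltk.corpus import stopwords

    stoplist = set(stopwords.words("english"))
    dictionary = corpora.Dictionary(iter_tokens(path, stoplist))
    corpus = StreamedCorpus(path, dictionary, stoplist)
    return models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics)
```

Calling `train_streamed_lda("Master_File_for_Docs.txt")` would then replace the list-building steps in the question. This reduces the memory used by the input data only; the O(K*V) model state discussed in the first answer stays the same.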



Source: https://stackoverflow.com/questions/32543235/python-gensim-memory-error
