Load gensim Word2Vec computed in Python 2, in Python 3

Anonymous (unverified), submitted on 2019-12-03 07:50:05

Question:

I have a gensim Word2Vec model computed in Python 2 like this:

    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    model = Word2Vec(LineSentence('enwiki.txt'), size=100,
                     window=5, min_count=5, workers=15)
    model.save('w2v.model')

However, I need to use it in Python 3. If I try to load it,

    import gensim
    from gensim.models import Word2Vec

    model = Word2Vec.load('w2v.model')

it results in an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf9 in position 0: ordinal not in range(128) 

I suppose the problem lies in encoding differences between Python 2 and Python 3. It also seems that gensim uses pickle to save and load models.

Is there a way to set encoding/pickle options so that the model loads properly? Or maybe use some external tool to convert the model file?

Recomputing it in Python 3 is not an option: it takes way too much time.

Answer 1:

This does look like a bug somewhere, as memoselyk noted, and it can be worked around in the way described in a comment on this answer.

The workaround is to add encoding='latin1' to the call to _pickle.loads in gensim.utils.unpickle, load the model in Python 3, and then save it again. After that you can revert the change and load the re-saved model with unmodified gensim in Python 3.
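To see why encoding='latin1' helps, here is a minimal sketch that uses only the standard pickle module, not gensim. The byte string PY2_PICKLE is a hand-written stand-in for what Python 2 would write for the one-byte str '\xf9', and the helper names load_py2_pickle and default_load_fails are made up for this illustration. It assumes the model file fails for the same reason: a Python 2 str containing non-ASCII bytes.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string '\xf9'
# (opcodes: PROTO 2, SHORT_BINSTRING of length 1, BINPUT 0, STOP).
# This is an illustrative stand-in, not data taken from a real model file.
PY2_PICKLE = b"\x80\x02U\x01\xf9q\x00."


def load_py2_pickle(data):
    """Unpickle Python 2 data, decoding each str byte as one Latin-1 character."""
    return pickle.loads(data, encoding="latin1")


def default_load_fails(data):
    """Return True if unpickling with the default ASCII decoding raises."""
    try:
        pickle.loads(data)
    except UnicodeDecodeError:
        return True
    return False


# Load with the workaround, then re-pickle under Python 3.
value = load_py2_pickle(PY2_PICKLE)
resaved = pickle.dumps(value)

# The re-saved data loads without any encoding argument.
reloaded = pickle.loads(resaved)

print(default_load_fails(PY2_PICKLE))
print(value == reloaded)
```

Latin-1 maps every byte value to exactly one code point, so decoding never fails and no bytes are lost. That is why the load, save and revert sequence described above only needs the patch once.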


