Ensuring gensim generates the same Word2Vec model across different runs on the same data


Happy的楠姐 2020-12-31 08:13

An LDA model generates different topics every time I train it on the same corpus. By setting np.random.seed(0), the LDA model will always be initialized and tr

3 answers

死守一世寂寞 (OP)

2020-12-31 08:38
Yes, the default random seed is fixed to 1, as described by the author in https://radimrehurek.com/gensim/models/word2vec.html. The initial vector for each word is seeded with a hash of the concatenation of word + str(seed).

The hashing function used, however, is Python's rudimentary built-in hash function, which can produce different results if two machines differ in:
- 32-bit vs 64-bit builds, reference
- Python versions, reference
- operating systems or interpreters, reference1, reference2
The above list is not exhaustive. Does it cover your question, though?
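One more source of variation worth checking: on Python 3, string hashes are randomized per interpreter launch unless the PYTHONHASHSEED environment variable is set. The sketch below is only an illustration of that behaviour; the helper name `string_hash_in_fresh_interpreter` is made up for this example and is not part of gensim.

```python
import os
import subprocess
import sys

def string_hash_in_fresh_interpreter(word, hash_seed=None):
    """Return hash(word) as computed by a newly launched Python process.

    Hypothetical helper for illustration only. If hash_seed is None,
    PYTHONHASHSEED is left unset so the interpreter's default applies.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if hash_seed is not None:
        env["PYTHONHASHSEED"] = str(hash_seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", word],
        env=env,
    )
    return int(out.decode().strip())

# With the hash seed pinned, two separate launches agree.
pinned_a = string_hash_in_fresh_interpreter("word1", hash_seed=0)
pinned_b = string_hash_in_fresh_interpreter("word1", hash_seed=0)
print(pinned_a == pinned_b)

# Without pinning, two launches will usually (not always) disagree on Python 3.
unpinned_a = string_hash_in_fresh_interpreter("word1")
unpinned_b = string_hash_in_fresh_interpreter("word1")
print(unpinned_a, unpinned_b)
```

Pinning PYTHONHASHSEED only makes launches of the same interpreter build agree; it does not remove the 32-bit vs 64-bit or version differences listed above.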

EDIT

If you want to ensure consistency, you can provide your own hashing function through the hashfxn argument of Word2Vec.

A very simple (and bad) example would be:
```
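from gensim.models import Word2Vec

def first_char_hash(astring):
    # Deliberately crude: uses only the code point of the first character
    return ord(astring[0])

# `sentences` is an iterable of token lists, defined elsewhere.
# workers=1 avoids thread-ordering jitter; vector_size was `size` before gensim 4.0.
model = Word2Vec(sentences, vector_size=10, window=5, min_count=5, workers=1, hashfxn=first_char_hash)

print(model.wv[sentences[0][0]])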
```
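A somewhat less bad choice than hashing the first character is a checksum of the whole string. The following is a minimal sketch, assuming CRC-32 is good enough for seeding purposes; `stable_hash` is a name chosen here for illustration and is not something gensim provides.

```python
import zlib

def stable_hash(astring):
    """Deterministic stand-in for the built-in hash().

    Computes CRC-32 over the UTF-8 bytes of the string, so the result does
    not depend on PYTHONHASHSEED, the Python version, or the word size.
    """
    return zlib.crc32(astring.encode("utf-8")) & 0xFFFFFFFF

# The same input always maps to the same 32-bit value within and across runs.
first = stable_hash("word" + str(1))
second = stable_hash("word" + str(1))
print(first == second)
```

It would be passed the same way as in the example above, i.e. `hashfxn=stable_hash`. Note that the gensim documentation also says a fully reproducible run requires a single worker thread (`workers=1`), and on Python 3 a fixed PYTHONHASHSEED if the built-in hash is used. How much the initial vectors actually depend on `hashfxn` may vary between gensim versions, so it is worth comparing two runs on your own setup.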