Question
I'm trying to load the pre-trained word2vec vectors that I found here (https://github.com/mmihaltz/word2vec-GoogleNews-vectors). I used the following command:
model = gensim.models.KeyedVectors.load_word2vec_format('word2vec.bin.gz', binary=False)
And it throws this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/deeplearning/anaconda3/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 193, in load_word2vec_format
    header = utils.to_unicode(fin.readline(), encoding=encoding)
  File "/home/deeplearning/anaconda3/lib/python3.6/gzip.py", line 374, in readline
    return self._buffer.readline(size)
  File "/home/deeplearning/anaconda3/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/deeplearning/anaconda3/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/home/deeplearning/anaconda3/lib/python3.6/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b've')
Answer 1:
Because the error says "Not a gzipped file", maybe the file has been inadvertently uncompressed but still has the misleading .gz extension? (Try renaming it without .gz and loading that file.)
Because the filename includes .bin, it is likely in the 'binary' word2vec format, so the optional parameter may need to be binary=True.
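As a minimal sketch of that suggestion, the flag could be chosen from the filename before calling gensim. The helper names looks_binary and load_vectors are hypothetical (they are not part of gensim), and the ".bin" check is only the heuristic described above, not a reliable format test:

```python
def looks_binary(filename):
    """Heuristic only: a '.bin' in the name hints at the binary word2vec format."""
    return ".bin" in filename.lower()


def load_vectors(path):
    """Load word2vec-format vectors, picking `binary` from the filename."""
    # Imported lazily so looks_binary() can be used without gensim installed.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(path, binary=looks_binary(path))
```

With this, load_vectors('word2vec.bin.gz') would pass binary=True. It would still raise the same OSError if the file is not really gzipped, since that is a separate problem from the flag.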
The filename you're using, word2vec.bin.gz, does not match the filename at the link you provided, GoogleNews-vectors-negative300.bin.gz. This suggests the file may have been changed in other ways that could cause problems.
The error also reports a 'magic number' (identifying prefix) from the file, b've', that looks like a bit of plain text rather than the beginning of a real gzip file. You might want to look at the first few lines of the problem file, via something like head word2vec.bin.gz, to see if there are other indications of what it actually is (as opposed to what you expect it to be).
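The same inspection can be done from Python. The following is a rough sketch using only the standard library; the function names peek, is_gzipped, and open_maybe_gzipped are made up for illustration. It relies on the fact that every gzip stream starts with the two bytes 0x1f 0x8b:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def peek(path, n=64):
    """Return the first n raw bytes of the file, without decompressing."""
    with open(path, "rb") as f:
        return f.read(n)


def is_gzipped(path):
    """True if the file content starts with the gzip magic number."""
    return peek(path, 2) == GZIP_MAGIC


def open_maybe_gzipped(path):
    """Open for binary reading, decompressing only if the content is really gzip."""
    if is_gzipped(path):
        return gzip.open(path, "rb")
    return open(path, "rb")
```

Calling peek('word2vec.bin.gz') on the problem file should show the text that begins with b've'. One possibility, which is only a guess and not confirmed by the question, is a Git LFS pointer file: those are small text files whose first line starts with "version". If so, the real vectors were never downloaded.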
Source: https://stackoverflow.com/questions/44045881/failed-to-load-a-bin-gz-pre-trained-words2vecx