word2vec | 易学教程

Error: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte

Question: I am trying to do the following Kaggle assignment. I am using the gensim package to work with word2vec. I am able to create the model and store it to disk, but when I try to load the file back I get the error below.
-HP-dx2280-MT-GR541AV:~$ python prog_w2v.py
Traceback (most recent call last):
  File "prog_w2v.py", line 7, in <module>
    models = gensim.models.Word2Vec.load_word2vec_format('300features_40minwords_10context.txt', binary=True)
  File "/usr/local/lib/python2.7/dist-packages/gensim
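
A possible explanation, offered as an assumption because the excerpt does not show how the file was written: byte 0x80 at position 0 is how a Python pickle (protocol 2 or later) begins, so the file may have been saved in gensim's native pickled format rather than in the word2vec text or binary format. The sketch below uses only the standard library to reproduce the same decode error from a pickled object; the `payload` dictionary is a hypothetical stand-in for a model.

```python
import pickle

# Hypothetical stand-in for a saved model; any picklable object behaves the same.
payload = {"vocab": ["some", "words", "here"]}

# Pickle protocol 2 and later start the stream with the PROTO opcode, byte 0x80.
blob = pickle.dumps(payload, protocol=2)
first_byte = blob[0]


def try_decode(data):
    """Return the error text if data is not valid UTF-8, otherwise None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return str(err)
    return None


error_text = try_decode(blob)
print(hex(first_byte))
print(error_text)
```

If that assumption holds, the thing to try would be loading with the counterpart of whatever call saved the file; gensim pairs `model.save()` with `Word2Vec.load()`.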

How to find synonyms based on word2vec

Question: I'm working on a word2vec model using gensim in Python, but I found that the results are words sharing the same theme; synonyms are only part of the results. Can I find synonyms of a word based on the work I have done? Any replies will be appreciated!
Answer 1: Word2vec tends to indicate similar words, but as you've probably seen, the kind of similarity it learns includes more than just pure synonyms. For example, word2vec similarities include words that appear in similar contexts, such as
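
A toy illustration of the point in the answer, under stated assumptions: word vectors are usually compared by cosine similarity, which rewards words whose vectors point in a similar direction, so an antonym used in the same contexts can score almost as high as a synonym. The vectors below are invented for illustration and do not come from any trained model; `rank_neighbors` is a hypothetical helper, not a gensim function.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional vectors, not taken from any trained model.
toy_vectors = {
    "hot": [0.9, 0.1, 0.3],
    "warm": [0.8, 0.2, 0.3],
    "cold": [0.7, 0.1, 0.5],
    "table": [0.1, 0.9, 0.2],
}


def rank_neighbors(word, vectors):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


neighbors = rank_neighbors("hot", toy_vectors)
for other, score in neighbors:
    print(other, round(score, 3))
```

In this toy setup "cold" ranks just behind "warm" and far ahead of "table", which mirrors the behaviour described in the question: similarity of context, not synonymy.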

Why Gensim doc2vec give AttributeError: 'list' object has no attribute 'words'?

Question: I am trying to experiment with gensim doc2vec, using the following code. As far as I understand from the tutorials, it should work. However, it gives AttributeError: 'list' object has no attribute 'words'.
from gensim.models.doc2vec import LabeledSentence, Doc2Vec
document = LabeledSentence(words=['some', 'words', 'here'], tags=['SENT_1'])
model = Doc2Vec(document, size = 100, window = 300, min_count = 10, workers=4)
So what did I do wrong? Any help please. Thank you. I am using python 3.5 and gensim 0
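
One likely reading of this error, stated as an assumption since the excerpt includes no answer: the trainer expects an iterable of documents, and a single tagged document is itself iterable over its fields. The sketch below avoids gensim entirely and uses a stand-in namedtuple (`TaggedDoc`, hypothetical, assumed to mirror the `words`/`tags` layout) to reproduce the same message.

```python
from collections import namedtuple

# Stand-in for gensim's tagged-document type (assumed to be a namedtuple
# with these two fields); this is not the gensim class itself.
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])

document = TaggedDoc(words=["some", "words", "here"], tags=["SENT_1"])


def collect_words(documents):
    """Mimic a trainer that expects an iterable of documents."""
    return [doc.words for doc in documents]


# Passing the single document: iteration yields its two fields, which are lists.
try:
    collect_words(document)
    single_error = None
except AttributeError as err:
    single_error = str(err)

# Passing a list containing the document: iteration yields the document itself.
wrapped_result = collect_words([document])

print(single_error)
print(wrapped_result)
```

Under that assumption, wrapping the document in a list, as in `[document]`, is the change to try in the original call.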

Gensim: how to retrain doc2vec model using previous word2vec model

Question: With Doc2Vec modelling, I have trained a model and saved the following files: 1. model 2. model.docvecs.doctag_syn0.npy 3. model.syn0.npy 4. model.syn1.npy 5. model.syn1neg.npy However, I have a new way to label the documents and want to train the model again. Since the word vectors were already obtained from the previous version, is there any way to reuse that model (e.g., taking the previous w2v results as initial vectors for training)? Does anyone know how to do it?
Answer 1: I've figured out that we can just load the model and continue to train.
model = Doc2Vec.load("old_model")
model.train(sentences)
Source: https:/

Gensim Word2Vec Model trained but not saved

Question: I am using gensim and executed the following code (simplified):
model = gensim.models.Word2Vec(...)
model.build_vocab(sentences)
model.train(...)
model.save('file_name')
After days, my code finished model.train(...). However, during saving, I got: Process finished with exit code 137 (interrupted by signal 9: SIGKILL) I noticed that some npy files were generated: <...>.trainables.syn1neg.npy <...>.trainables.vectors_lockf.npy <...>.wv.vectors.npy Are those intermediate results I
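
A short worked example of how the exit code in the question maps to the signal, assuming the common shell convention that a process killed by signal N reports exit code 128 + N, and assuming a POSIX system where signal 9 is defined. `signal_from_exit_code` is a hypothetical helper name.

```python
import signal


def signal_from_exit_code(code):
    """Return the signal name for a shell-style exit code above 128, else None."""
    if code <= 128:
        return None
    return signal.Signals(code - 128).name


killed_by = signal_from_exit_code(137)
print(killed_by)
```

SIGKILL cannot be caught by the process, so the save could not have finished cleanly. One common source of SIGKILL on Linux is the kernel's out-of-memory killer, though nothing in the excerpt confirms that this was the cause here.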

Pointing out a single dot with text input

Question: I want to plot my gensim word2vec model as a kind of "word galaxy" (like here: http://www.anthonygarvan.com/wordgalaxy/) and highlight a single dot by entering its name in a search field and pressing a submit button. I'm fairly new to all this Python stuff, so I don't really understand the curdoc documentation or the example here: https://github.com/bokeh/bokeh/tree/master/examples/app/movies. This is my code: from bokeh.plotting import figure, output_file, show, ColumnDataSource from

How to initialize a new word2vec model with pre-trained model weights?

Question: I am using the Gensim library in Python for using and training a word2vec model. Recently, I have been looking at initializing my model weights with a pre-trained word2vec model such as the GoogleNewDataset pretrained model. I have been struggling with it for a couple of weeks. Now I have just found that gensim has a function that can help me initialize the weights of my model with pre-trained model weights. It is described below: reset_from(other_model) Borrow shareable pre-built structures
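
To make the goal in the question concrete, here is a library-independent sketch of warm-starting an embedding table: words found in a pre-trained table copy its vector, and all other words get a small random vector. This is not gensim's API and says nothing about what `reset_from` does; `init_embeddings`, the tiny vocabulary, and the vectors are all hypothetical.

```python
import random


def init_embeddings(vocab, pretrained, dim, seed=0):
    """Build an initial embedding table for `vocab`.

    Words present in `pretrained` reuse its vector; the rest are drawn
    uniformly from [-0.5/dim, 0.5/dim].
    """
    rng = random.Random(seed)
    bound = 0.5 / dim
    table = {}
    for word in vocab:
        if word in pretrained:
            # Copy so later training does not mutate the pre-trained table.
            table[word] = list(pretrained[word])
        else:
            table[word] = [rng.uniform(-bound, bound) for _ in range(dim)]
    return table


# Hypothetical pre-trained vectors and a new vocabulary with one unseen word.
pretrained_vectors = {"king": [0.1, 0.2, 0.3], "queen": [0.2, 0.1, 0.3]}
new_vocab = ["king", "queen", "zebra"]

table = init_embeddings(new_vocab, pretrained_vectors, dim=3)
print(table["king"])
print(len(table["zebra"]))
```

The copy on reuse is a deliberate choice: it keeps the pre-trained table intact so the same vectors can seed several experiments.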

I used word2vec in deeplearning4j to train word vectors, but those vectors are unstable

Question: 1. I use IntelliJ IDEA to build a Maven project; the code is as follows:
System.out.println("Load data....");
SentenceIterator iter = new LineSentenceIterator(new File("/home/zs/programs/deeplearning4j-master/dl4j-test-resources/src/main/resources/raw_sentences.txt"));
iter.setPreProcessor(new SentencePreProcessor() {
    @Override
    public String preProcess(String sentence) {
        return sentence.toLowerCase();
    }
});
System.out.println("Build model....");
int batchSize = 1000;
int iterations = 30;
int layerSize = 300;
com.sari.Word2Vec vec= new com.sari

AttributeError: module 'boto' has no attribute 'plugin'

Question: I'm running a VM on Google Cloud Platform using a Jupyter notebook with word2vec models. I have the following code snippet:
from gensim.models import Word2Vec
amazon_word2vec = Word2Vec(model, min_count=1, size=100)
It results in the error: AttributeError: module 'boto' has no attribute 'plugin' What is the solution to the above problem?
Answer 1: Run pip install google-compute-engine to install Google Compute Engine, then restart your VM, and it works fine.
Source: https://stackoverflow.com/questions