gensim

My Doc2Vec code, after many loops of training, isn't giving good results. What might be wrong?

独自空忆成欢 submitted on 2020-07-23 06:52:06
Question: I'm training a Doc2Vec model using the code below, where tagged_data is a list of TaggedDocument instances I set up beforehand:

    max_epochs = 40
    model = Doc2Vec(alpha=0.025, min_alpha=0.001)
    model.build_vocab(tagged_data)

    for epoch in range(max_epochs):
        print('iteration {0}'.format(epoch))
        model.train(tagged_data, total_examples=model.corpus_count, epochs=model.iter)
        # decrease the learning rate
        model.alpha -= 0.001
        # fix the learning rate, no decay
        model.min_alpha = model.alpha

    model.save("d2v
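One way to see why a loop like the one in the question tends to hurt results is to replay only its learning-rate arithmetic, without gensim. The sketch below is an illustration under stated assumptions, not a diagnosis of the asker's actual run: the helper name `start_alphas` is hypothetical, and it assumes, as in the question's code, that `alpha` starts at 0.025 and is reduced by 0.001 after each of the 40 `train()` calls.

```python
def start_alphas(max_epochs=40, alpha=0.025, step=0.001):
    """Return the learning rate each train() call in the loop would start with.

    Mirrors `model.alpha -= 0.001` from the question; no gensim involved.
    """
    return [alpha - step * k for k in range(max_epochs)]


schedule = start_alphas()

# Indices of the calls whose starting rate is below zero
# (a small tolerance absorbs floating-point noise around 0).
negative_calls = [k for k, a in enumerate(schedule) if a < -1e-9]

print("last starting alpha:", round(schedule[-1], 3))
print("calls run with a negative alpha:", len(negative_calls))
```

Under these assumptions the rate reaches zero partway through the loop and is negative for the remaining calls, which would push the model away from what it has learned. Each `train()` call also runs several passes of its own, so the loop does more than 40 passes in total. The usual alternative is a single `train()` call with the desired `epochs` value, leaving the decay from `alpha` to `min_alpha` to gensim.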

How to train a model that will result in the similarity score between two news titles?

自作多情 submitted on 2020-07-22 21:40:04
Question: I am trying to build a fake news classifier and I am quite new to this field. I have a column "title_1_en", which holds the title of a fake news item, and another column called "title_2_en". There are 3 target labels: "agreed", "disagreed", and "unrelated", depending on whether the news title in column "title_2_en" agrees with, disagrees with, or is unrelated to the one in the first column. I have tried calculating basic cosine similarity between the two titles after converting the words of the sentences into vectors. This has
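For reference, the baseline the question describes (word vectors averaged per title, then compared by cosine similarity) can be sketched as follows. This is a minimal illustration only: the 3-dimensional `TOY_VECTORS` table, the example titles, and the helper names are all hypothetical, and real vectors would come from a trained embedding model.

```python
import math

# Hypothetical toy embeddings, invented purely for illustration.
TOY_VECTORS = {
    "vaccine": [0.9, 0.1, 0.0],
    "causes": [0.1, 0.8, 0.1],
    "autism": [0.8, 0.2, 0.1],
    "prevents": [0.1, 0.7, 0.3],
    "disease": [0.7, 0.1, 0.2],
    "football": [0.0, 0.1, 0.9],
    "final": [0.1, 0.2, 0.8],
}


def title_vector(title):
    """Average the vectors of the words in a title that have an embedding."""
    vecs = [TOY_VECTORS[w] for w in title.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


base = title_vector("vaccine causes autism")
related = cosine(base, title_vector("vaccine prevents disease"))
unrelated = cosine(base, title_vector("football final"))

print("same topic:", round(related, 2))
print("different topic:", round(unrelated, 2))
```

In this toy setup two titles on the same topic score high even though they make opposite claims. That may be one reason a single similarity score separates "unrelated" pairs more easily than it separates "agreed" from "disagreed".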

train Gensim word2vec using large txt file

主宰稳场 submitted on 2020-07-10 10:20:26
Question: I have a large txt file (150 MB) like this: 'intrepid', 'bumbling', 'duo', 'deliver', 'good', 'one', 'better', 'offering', 'considerable', 'cv', 'freshly', 'qualified', 'private', ... I want to train a word2vec model using that file, but it gives me a RAM problem. I don't know how to feed a txt file to a word2vec model. This is my code; I know it has a problem, but I don't know where it is: import gensim f = open('your_file1.txt') for line in f: b=line model = gensim.models.Word2Vec([b],min_count
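gensim's Word2Vec takes an iterable of token lists and iterates over it more than once, so a common way to avoid loading the whole file is a small restartable iterable that reads one line at a time. The sketch below shows only that streaming part and does not import gensim. The class name `QuotedTokenCorpus` is hypothetical, and it assumes the file holds one sentence per line in the quoted, comma-separated layout shown in the question. Note also that the question's code wraps a raw string in a list (`[b]`), which would likely be read as a sequence of single characters rather than words.

```python
import os
import tempfile


class QuotedTokenCorpus:
    """Restartable iterable yielding one list of tokens per line of a file.

    Assumes lines look like: 'intrepid', 'bumbling', 'duo'
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Re-open the file on every pass so the corpus can be iterated repeatedly.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                # Split on commas, then strip whitespace and surrounding quotes.
                tokens = [t.strip().strip("'\"") for t in line.split(",")]
                tokens = [t for t in tokens if t]
                if tokens:
                    yield tokens


# Small demonstration on a temporary file standing in for the large one.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("'intrepid', 'bumbling', 'duo'\n'deliver', 'good', 'one'\n")
    demo_path = tmp.name

corpus = QuotedTokenCorpus(demo_path)
first_pass = list(corpus)
second_pass = list(corpus)  # a second iteration re-reads the file
os.remove(demo_path)

print(first_pass)
```

An instance of such a class could then be passed to Word2Vec in place of `[b]`. Only one line is held in memory at a time, although the vocabulary and vectors still need to fit in RAM.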

Combining/adding vectors from different word2vec models

吃可爱长大的小学妹 submitted on 2020-06-17 03:53:05
Question: I am using gensim to create Word2Vec models trained on large text corpora. I have some models based on StackExchange data dumps. I also have a model trained on a corpus derived from English Wikipedia. Assume that a vocabulary term is in both models, and that the models were created with the same parameters to Word2Vec. Is there any way to combine or add the vectors from the two separate models to create a single new model that has the same word vectors that would have resulted if I had

Probabilities returned by gensim's get_document_topics method don't add up to one

落爺英雄遲暮 submitted on 2020-06-12 05:14:26
Question: Sometimes it returns probabilities for all topics and all is fine, but sometimes it returns probabilities for just a few topics and they don't add up to one; it seems to depend on the document. Generally, when it returns only a few topics, the probabilities add up to roughly 80%, so is it returning just the most relevant topics? Is there a way to force it to return all probabilities? Maybe I'm missing something, but I can't find any documentation of the method's parameters.

Answer 1: I had the same
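The behaviour described in the question is consistent with threshold filtering: gensim's `get_document_topics` accepts a `minimum_probability` argument, and topics below it are left out of the result. The sketch below imitates that filtering on a made-up distribution to show why the remaining values no longer sum to one. The helper `filter_topics` and the numbers are hypothetical and are not gensim's implementation.

```python
def filter_topics(distribution, minimum_probability):
    """Keep only (topic_id, probability) pairs at or above the threshold."""
    return [
        (topic_id, prob)
        for topic_id, prob in enumerate(distribution)
        if prob >= minimum_probability
    ]


# Hypothetical per-document topic distribution that sums to 1.
doc_topics = [0.50, 0.30, 0.006, 0.004, 0.19]

kept = filter_topics(doc_topics, minimum_probability=0.01)
everything = filter_topics(doc_topics, minimum_probability=0.0)

print("kept topics:", [topic_id for topic_id, _ in kept])
print("sum of kept probabilities:", round(sum(p for _, p in kept), 3))
print("topics with threshold 0:", len(everything))
```

With gensim itself, passing a very small `minimum_probability` to `get_document_topics` is the usual way to get (nearly) all topics back. Whether an exact zero is honoured may depend on the gensim version.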

In spacy, how to use your own word2vec model created in gensim?

蓝咒 submitted on 2020-05-25 12:19:42
Question: I have trained my own word2vec model in gensim and I am trying to load that model in spaCy. First, I need to save it to disk and then load it as an init-model in spaCy, but I am unable to figure out exactly how.

    gensimmodel
    Out[252]: <gensim.models.word2vec.Word2Vec at 0x110b24b70>

    import spacy
    spacy.load(gensimmodel)
    OSError: [E050] Can't find model 'Word2Vec(vocab=250, size=1000, alpha=0.025)'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

Answer 1:
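As the error message indicates, `spacy.load` expects a model name or a path, not a gensim object, so the vectors have to be written to disk first. gensim can export them in the plain-text word2vec format via `model.wv.save_word2vec_format(path)`. The sketch below only illustrates what such a file contains, using a hypothetical helper and toy vectors rather than gensim itself. How spaCy then ingests the file depends on the spaCy version and is not shown here.

```python
import io


def write_word2vec_text(vectors, handle):
    """Write {word: [floats]} in the plain-text word2vec layout.

    First line: "<number of words> <dimensions>";
    then one "<word> <v1> <v2> ..." line per word.
    """
    dim = len(next(iter(vectors.values())))
    handle.write(f"{len(vectors)} {dim}\n")
    for word, vec in vectors.items():
        handle.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")


# Toy vectors, invented for illustration only.
toy = {"king": [0.1, 0.2, 0.3], "queen": [0.2, 0.1, 0.4]}

buffer = io.StringIO()  # stands in for a file opened for writing
write_word2vec_text(toy, buffer)
exported = buffer.getvalue()

print(exported)
```

A file in this layout is the kind of input a vectors-initialisation step, such as the init-model route mentioned in the question, would presumably be pointed at.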
