word2vec

Bigram to a vector

血红的双手。 · Submitted on 2019-12-21 12:28:10
Question: I want to construct word embeddings for documents using the word2vec tool. I know how to find the vector embedding of a single word (unigram). Now I want to find a vector for a bigram. Is this possible with word2vec? If so, how?

Answer 1: The following snippet will get you the vector representation of a bigram. Note that the bigram you want to convert to a vector needs to have an underscore instead of a space between the words, e.g. bigram2vec(unigrams, "this report") is wrong; it …
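The answer is cut off above, but the underscore convention it describes can be sketched without gensim. Assuming the model's vocabulary stores bigrams as underscore-joined tokens (as gensim's Phrases preprocessing produces), a hypothetical bigram2vec helper only needs to join the two words before the lookup:

```python
def bigram2vec(unigrams, bigram):
    """Look up a bigram's vector. Hypothetical sketch: `unigrams` maps
    token -> vector, and bigram tokens are stored as 'word1_word2'."""
    key = bigram.replace(" ", "_")
    return unigrams.get(key)  # None if the bigram was never learned

# Toy vocabulary standing in for a trained word2vec model.
vocab = {"this_report": [0.1, 0.2], "this": [0.3, 0.1], "report": [0.0, 0.5]}
print(bigram2vec(vocab, "this report"))  # [0.1, 0.2]: underscored key found
print(bigram2vec(vocab, "that report"))  # None: not in the vocabulary
```

A bigram only gets a vector if it was frequent enough to be joined into a single token during preprocessing; otherwise the lookup returns nothing.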

Installing faiss on Google Colaboratory

随声附和 · Submitted on 2019-12-21 05:09:13
Question: I am trying to follow the instructions for the MUSE project. It requires PyTorch and Faiss. PyTorch is easy to install, but I ran into problems installing Faiss. The MUSE instructions say to use conda install faiss-cpu -c pytorch, but Google Colab doesn't support conda (when I tried !pip install conda, it didn't work), and !pip install faiss didn't work either. Is there a way to install Faiss or conda?

Answer 1: Here's how I eventually installed faiss. !wget https://anaconda.org/pytorch …

Loading pre-trained word2vec to initialise embedding_lookup in the Estimator model_fn

倖福魔咒の · Submitted on 2019-12-21 05:05:13
Question: I am solving a text classification problem. I defined my classifier using the Estimator class with my own model_fn. I would like to use Google's pre-trained word2vec embeddings as initial values and then fine-tune them for the task at hand. I saw the post "Using a pre-trained word embedding (word2vec or Glove) in TensorFlow", which explains how to do this in 'raw' TensorFlow code, but I would really like to use the Estimator class. As an extension, I would then like to train this …
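The question is truncated, but the usual recipe is independent of the Estimator API: build an initial embedding matrix from the pre-trained vectors, fall back to a small random vector for out-of-vocabulary words, and hand that matrix to the model as the initial value of the embedding variable. A framework-agnostic sketch (all names hypothetical, toy vectors in place of the real word2vec file):

```python
import random

def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """One row per vocab word: the pre-trained vector when available,
    otherwise a small random vector (hypothetical helper)."""
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        vec = pretrained.get(word)
        if vec is None:
            vec = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        matrix.append(vec)
    return matrix

pretrained = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
vocab = ["cat", "dog", "unicorn"]  # "unicorn" is out-of-vocabulary
matrix = build_embedding_matrix(vocab, pretrained, dim=2)
print(len(matrix), matrix[0])  # 3 [1.0, 0.0]
```

In TensorFlow this matrix would then typically be passed as the initializer of the variable consumed by the embedding lookup inside model_fn, so that training continues to fine-tune the pre-trained values rather than starting from scratch.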

TensorFlow 'module' object has no attribute 'global_variables_initializer'

你。 · Submitted on 2019-12-20 11:04:39
Question: I'm new to TensorFlow. I'm running a deep learning assignment from Udacity in an IPython notebook (link), and it raises this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-18-3446420b5935> in <module>()
      2
      3 with tf.Session(graph=graph) as session:
----> 4     tf.global_variables_initializer().run()

AttributeError: 'module' object has no attribute 'global_variables_initializer'

Please help! How can I fix this? Thank you.

Answer 1: In older versions, it was called tf.initialize_all …
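The truncated answer points at a renaming: before TensorFlow 0.12 the initializer was called tf.initialize_all_variables(), and newer versions call it tf.global_variables_initializer(). A defensive getattr fallback works on either version; the sketch below demonstrates it on stand-in objects rather than importing TensorFlow itself:

```python
from types import SimpleNamespace

def get_initializer(tf_module):
    """Return whichever variable initializer the installed version exposes."""
    return getattr(tf_module, "global_variables_initializer",
                   getattr(tf_module, "initialize_all_variables", None))

# Stand-ins for an old and a new TensorFlow module.
old_tf = SimpleNamespace(initialize_all_variables=lambda: "old init op")
new_tf = SimpleNamespace(global_variables_initializer=lambda: "new init op")
print(get_initializer(old_tf)())  # old init op
print(get_initializer(new_tf)())  # new init op
```

In practice the simpler fix is usually to upgrade TensorFlow (or, on an old install, to call tf.initialize_all_variables() directly) so the notebook's code matches the installed version.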

What does a weighted word embedding mean?

白昼怎懂夜的黑 · Submitted on 2019-12-20 10:08:05
Question: The paper that I am trying to implement says: "In this work, tweets were modeled using three types of text representation. The first one is a bag-of-words model weighted by tf-idf (term frequency - inverse document frequency) (Section 2.1.1). The second represents a sentence by averaging the word embeddings of all words (in the sentence), and the third represents a sentence by averaging the weighted word embeddings of all words, the weight of a word is given by tf-idf (Section 2.1.2)." I …
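A tf-idf weighted sentence embedding, as in the third representation the quoted paper describes, is the average of the word vectors where each vector is first scaled by that word's tf-idf weight. A minimal stdlib sketch, with toy vectors and a simplified smoothed idf (a real pipeline would use gensim or scikit-learn to compute both):

```python
import math

def tfidf_weighted_embedding(sentence, vectors, docs):
    """Average of word vectors, each scaled by tf-idf (simplified sketch)."""
    words = sentence.split()
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for w in set(words):
        if w not in vectors:
            continue  # skip words without an embedding
        tf = words.count(w) / len(words)
        df = sum(1 for d in docs if w in d.split())
        idf = math.log((1 + len(docs)) / (1 + df)) + 1  # smoothed idf
        for i, x in enumerate(vectors[w]):
            total[i] += tf * idf * x
    return [t / len(words) for t in total]

docs = ["the cat sat", "the dog ran", "a cat ran"]
vecs = {"the": [0.1, 0.1], "cat": [1.0, 0.0], "sat": [0.0, 1.0]}
print(tfidf_weighted_embedding("the cat sat", vecs, docs))
```

Compared with the plain average (the second representation), frequent low-information words like "the" get a small idf and so contribute less to the sentence vector.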

Get most similar words, given the vector of the word (not the word itself)

帅比萌擦擦* · Submitted on 2019-12-20 09:55:12
Question: Using the gensim.models.Word2Vec library, you can provide a model and a "word" for which you want to find the list of most similar words: model = gensim.models.Word2Vec.load_word2vec_format(model_file, binary=True) and model.most_similar(positive=[WORD], topn=N). I wonder whether it is possible to give the system the model and a "vector" as input, and ask for the top similar words (whose vectors are very close to the given vector). Something similar to: …
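Gensim does expose this directly: similar_by_vector(vector, topn=N) on the model's keyed vectors accepts a raw vector instead of a word. Under the hood it is a cosine-similarity ranking over the vocabulary, which can be sketched with the stdlib alone:

```python
import math

def most_similar_by_vector(vectors, query, topn=3):
    """Rank vocabulary words by cosine similarity to a query vector."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    scored = [(w, cos(v, query)) for w, v in vectors.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:topn]

vecs = {"king": [0.9, 0.1], "queen": [0.85, 0.2], "apple": [0.1, 0.95]}
print(most_similar_by_vector(vecs, [0.9, 0.15], topn=2))
# top hits are "king" and "queen", not "apple"
```

The library version does the same thing with a single normalized matrix multiplication, so it stays fast even for vocabularies of millions of words.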

Error in extracting phrases using Gensim

我只是一个虾纸丫 · Submitted on 2019-12-20 03:50:51
Question: I am trying to extract the bigrams in sentences using Phrases in Gensim, as follows.

from gensim.models import Phrases
from gensim.models.phrases import Phraser

documents = ["the mayor of new york was there",
             "machine learning can be useful sometimes",
             "new york mayor was present"]
sentence_stream = [doc.split(" ") for doc in documents]
#print(sentence_stream)
bigram = Phrases(sentence_stream, min_count=1, threshold=2, delimiter=b' ')
bigram_phraser = Phraser(bigram)
for sent in sentence_stream …
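The snippet above is cut off, but the scoring Phrases applies can be sketched in plain Python. With gensim's default scorer, a candidate pair is promoted when (count(a, b) - min_count) / (count(a) * count(b)) * vocab_size exceeds the threshold; the toy implementation below assumes that formula and is not the library code:

```python
from collections import Counter

def find_bigrams(sentences, min_count=1, threshold=2):
    """Score adjacent word pairs with the default gensim Phrases formula
    (hedged sketch, not the library implementation)."""
    words = Counter(w for s in sentences for w in s)
    pairs = Counter((s[i], s[i + 1]) for s in sentences
                    for i in range(len(s) - 1))
    vocab_size = len(words)
    found = []
    for (a, b), n in pairs.items():
        score = (n - min_count) / (words[a] * words[b]) * vocab_size
        if n >= min_count and score > threshold:
            found.append((a + " " + b, score))
    return found

documents = ["the mayor of new york was there",
             "machine learning can be useful sometimes",
             "new york mayor was present"]
stream = [d.split(" ") for d in documents]
print(find_bigrams(stream))  # [('new york', 3.5)]
```

On this corpus only "new york" clears the threshold: it occurs twice while every other adjacent pair occurs once, and with min_count=1 a single occurrence scores zero.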

How to fix 'C extension not loaded, training will be slow. Install a C compiler and reinstall gensim for fast training.'

家住魔仙堡 · Submitted on 2019-12-20 01:06:46
Question: I'm using the node2vec library, which is based on gensim's word2vec model, to encode nodes in an embedding space, but when I fit the word2vec object I get this warning: C:\Users\lenovo\Anaconda3\lib\site-packages\gensim\models\base_any2vec.py:743: UserWarning: C extension not loaded, training will be slow. Install a C compiler and reinstall gensim for fast training. Can anyone help me fix this issue, please?

Answer 1: gensim relies on extension modules that need to be compiled. Both …
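The truncated answer points at the root cause: gensim's optimized training routines are compiled C extensions, and the slow pure-Python fallback is used when they failed to build, typically because no C compiler was available at install time. Before reinstalling gensim, a quick stdlib check can confirm whether a compiler is on the PATH at all (the list of compiler names here is just an illustrative guess):

```python
import shutil

def find_c_compiler():
    """Return (name, path) for the first C compiler found on PATH,
    or None if none of the common names resolve (sketch)."""
    for name in ("cc", "gcc", "clang", "cl"):
        path = shutil.which(name)
        if path:
            return name, path
    return None

print(find_c_compiler())
```

If this prints None, install a compiler first (e.g. a system toolchain on Windows or build-essential on Debian-like systems), then reinstall gensim so its extensions get compiled.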

Word2vec in Java: hands-on

早过忘川 · Submitted on 2019-12-20 00:44:27
Preface

After seeing how powerful word2vec is, it's time to put it into practice and solve real problems.

Hands-on

Add the dependency:

<dependency>
    <groupId>com.medallia.word2vec</groupId>
    <artifactId>word2vecjava_2.11</artifactId>
    <version>1.0-ALLENAI-4</version>
</dependency>

Train the model. Since the corpus is small, all the parameters have been scaled down.

@Service
@Slf4j
public class Word2vecService {
    public Word2VecModel train() {
        try {
            List<String> data = List.of("anarchism originated as a term of abuse first used against early working class radicals including the diggers of the english anarchism originated as a term of abuse first");
            List list = Lists.transform(data, var11 -> Arrays.asList(var11 …