word-embedding

How can I count word frequencies in Word2Vec's training model?

怎甘沉沦 submitted on 2020-06-01 07:04:05
Question: I need to count the frequency of each word in word2vec's training model. I want output that looks like this:

    term      count
    apple     123004
    country   4432180
    runs      620102
    ...

Is it possible to do that? How would I get that data out of word2vec?

Answer 1: Which word2vec implementation are you using? In the popular gensim library, after a Word2Vec model has its vocabulary established (either by completing its full training, or after build_vocab() has been called), the model's wv property contains a…
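A minimal sketch of how those counts could be read out of a trained gensim model, assuming gensim 4.x (in gensim 3.x the same information lives in model.wv.vocab[word].count) and an illustrative model path:

    from gensim.models import Word2Vec

    model = Word2Vec.load("my_word2vec.model")  # illustrative path to an already-trained model

    # gensim 4.x stores each vocabulary word's raw corpus frequency as its "count" attribute
    for term in model.wv.index_to_key:
        print(term, model.wv.get_vecattr(term, "count"))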

word2vec - what is best? add, concatenate or average word vectors?

空扰寡人 submitted on 2020-05-25 06:41:22
Question: I am working on a recurrent language model. To learn word embeddings that can be used to initialize my language model, I am using gensim's word2vec model. After training, the word2vec model holds two vectors for each word in the vocabulary: the word embedding (rows of the input/hidden matrix) and the context embedding (columns of the hidden/output matrix). As outlined in this post, there are at least three common ways to combine these two embedding vectors: summing the context and word vector for each…
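A minimal sketch of the three combinations in gensim, assuming a model trained with negative sampling (the context/output weights then live in model.syn1neg) and an illustrative model path:

    import numpy as np
    from gensim.models import Word2Vec

    model = Word2Vec.load("my_word2vec.model")  # illustrative path

    word_vecs = model.wv.vectors   # input/hidden weights: the word embeddings
    context_vecs = model.syn1neg   # hidden/output weights: the context embeddings

    # rows of both matrices are aligned by vocabulary index
    summed = word_vecs + context_vecs
    averaged = (word_vecs + context_vecs) / 2.0
    concatenated = np.hstack([word_vecs, context_vecs])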

Glove6b50d parsing: could not convert string to float: '-'

左心房为你撑大大i submitted on 2020-05-17 06:04:23
Question: I am trying to parse the Glove6b50d data from Kaggle via Google Colab, then run it through the word2vec process (apologies for the huge URL - it's the fastest link I've found). However, I'm hitting a bug where '-' tokens are not parsed correctly, resulting in the above error. I have attempted to handle this in a few ways. I've also looked into the load_word2vec_format method itself and tried to ignore errors, but it doesn't seem to make a difference. I've tried a map method on line two…
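A minimal sketch of two ways this is commonly handled, assuming gensim 4.x and an illustrative file name; the key point is that the raw GloVe file has no word2vec header, so it either needs no_header=True or manual parsing that always treats the first whitespace-separated token (even a bare '-') as the word:

    import numpy as np
    from gensim.models import KeyedVectors

    # option 1: let gensim read the header-less GloVe text format directly (gensim >= 4.0)
    vectors = KeyedVectors.load_word2vec_format(
        "glove.6B.50d.txt", binary=False, no_header=True)

    # option 2: parse manually, keeping parts[0] as the word and the rest as 50 floats
    embeddings = {}
    with open("glove.6B.50d.txt", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")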

Keras embedding layer with variable length in functional API

丶灬走出姿态 submitted on 2020-05-13 17:57:37
Question: I have the following sequential model that works with variable-length inputs:

    m = Sequential()
    m.add(Embedding(len(chars), 4, name="embedding"))
    m.add(Bidirectional(LSTM(16, unit_forget_bias=True, name="lstm")))
    m.add(Dense(len(chars), name="dense"))
    m.add(Activation("softmax"))
    m.summary()

This gives the following summary:

    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    embedding …
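A minimal sketch of the same model rebuilt with the functional API, assuming chars is defined as in the snippet above; shape=(None,) is what keeps the sequence length variable:

    from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense, Activation
    from tensorflow.keras.models import Model

    inp = Input(shape=(None,))  # None leaves the sequence length unspecified
    x = Embedding(len(chars), 4, name="embedding")(inp)
    x = Bidirectional(LSTM(16, unit_forget_bias=True, name="lstm"))(x)
    x = Dense(len(chars), name="dense")(x)
    out = Activation("softmax")(x)

    m = Model(inputs=inp, outputs=out)
    m.summary()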

How does Fine-tuning Word Embeddings work?

前提是你 submitted on 2020-05-11 06:28:09
Question: I've been reading some NLP with Deep Learning papers and found that fine-tuning seems to be a simple yet confusing concept. The same question has been asked here, but it is still not quite clear. Fine-tuning pre-trained word embeddings into task-specific word embeddings, as mentioned in papers like Y. Kim, "Convolutional Neural Networks for Sentence Classification," and K. S. Tai, R. Socher, and C. D. Manning, "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks,"…
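A minimal Keras sketch of the idea, assuming a pretrained embedding matrix is already available (the random matrix below is only a placeholder): fine-tuning means the pretrained vectors are used as the layer's initial weights and trainable=True lets back-propagation keep adjusting them for the downstream task, while trainable=False would freeze them.

    import numpy as np
    from tensorflow.keras.layers import Embedding

    # placeholder for a vocab_size x embedding_dim matrix of pretrained vectors
    pretrained = np.random.rand(10000, 300).astype("float32")

    embedding = Embedding(
        input_dim=pretrained.shape[0],
        output_dim=pretrained.shape[1],
        weights=[pretrained],   # initialize from the pretrained vectors
        trainable=True,         # True = fine-tune; False = keep embeddings frozen
    )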

No module named 'gensim' even though it is already installed

你。 submitted on 2020-05-09 16:57:06
Question: I'm having this error. I ran this script in a Jupyter notebook in the base (root) environment; the log said that the gensim library has been installed, and I ran the command !pip install gensim before importing it, but it still cannot be imported and the error says ModuleNotFoundError: No module named 'gensim'.

    !pip install gensim
    import gensim
    from gensim.models import KeyedVectors

    model = KeyedVectors.load('model_fasttext2.vec')
    model.vector_size
    -------------------------------------
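A minimal sketch of a common fix, assuming the mismatch is that the notebook kernel runs a different Python environment than the pip that performed the install; installing through the kernel's own interpreter avoids that:

    # run inside a notebook cell, before the import
    import sys
    !{sys.executable} -m pip install gensim

    import gensim
    from gensim.models import KeyedVectors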

word2vec - KeyError: “word X not in vocabulary”

大城市里の小女人 submitted on 2020-01-25 08:22:27
Question: I am using the Word2Vec implementation of the gensim module to construct word embeddings for the sentences I have in a plain text file. Although the word happy is defined in the vocabulary, I get the error KeyError: "word 'happy' not in vocabulary". I tried to apply the answers given to a similar question, but that did not work, hence I posted my own question. Here is the code:

    try:
        data = []
        with open(TXT_PATH, 'r', encoding='utf-8') as txt_file:
            for line in txt_file:
                for part in line…
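A minimal sketch of the usual checks, assuming gensim 4.x and that the default min_count=5 pruned the word (tokens seen fewer than min_count times are silently dropped from the vocabulary); the sentences variable stands for the tokenized corpus:

    from gensim.models import Word2Vec

    # min_count=1 keeps rare words such as 'happy' that occur only a few times
    model = Word2Vec(sentences, vector_size=100, min_count=1)

    # test membership before looking the word up
    if 'happy' in model.wv:
        print(model.wv['happy'])
    else:
        print("'happy' was pruned from the vocabulary")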

add LSTM/GRU to BERT embeddings in keras tensorflow

℡╲_俬逩灬. submitted on 2020-01-24 11:34:10
Question: I am experimenting with BERT embeddings following this code: https://github.com/strongio/keras-bert/blob/master/keras-bert.py. These are the important bits of the code (lines 265-267):

    bert_output = BertLayer(n_fine_tune_layers=3)(bert_inputs)
    dense = tf.keras.layers.Dense(256, activation="relu")(bert_output)
    pred = tf.keras.layers.Dense(1, activation="sigmoid")(dense)

I want to add a GRU between BertLayer and the Dense layer:

    bert_output = BertLayer(n_fine_tune_layers=3)(bert_inputs)
    gru_out =
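A minimal sketch of where the GRU would go, assuming bert_output is a 3D per-token tensor of shape (batch, seq_len, hidden); the pooled 2D vector that BertLayer returns by default would first need to be swapped for per-token outputs, since a GRU expects a sequence:

    bert_output = BertLayer(n_fine_tune_layers=3)(bert_inputs)      # assumed shape: (batch, seq_len, 768)
    gru_out = tf.keras.layers.GRU(128)(bert_output)                 # collapses the sequence to (batch, 128)
    dense = tf.keras.layers.Dense(256, activation="relu")(gru_out)
    pred = tf.keras.layers.Dense(1, activation="sigmoid")(dense)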