word-embedding

Google BERT and antonym detection

Submitted by 旧巷老猫 on 2021-02-11 15:10:55
Question: I recently learned about the following phenomenon: word embeddings from Google BERT and other well-known state-of-the-art models seem to ignore the degree of semantic contrast between antonyms, as measured by the natural distance (L2 norm or cosine distance) between the corresponding embeddings. For example, the measure is the "cosine distance" (as opposed to the "cosine similarity"), which means closer vectors are supposed to have a smaller distance between them. As one can see, BERT states "weak" and "powerful" …
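To see the effect concretely, one can compare the embeddings directly. Below is a minimal sketch (not from the question) using the Hugging Face transformers library and bert-base-uncased; mean-pooling the subword vectors is just one reasonable choice for getting a single vector per word:

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    def embed(word):
        # Tokenize the word on its own and run it through BERT.
        inputs = tokenizer(word, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)
        # Drop [CLS]/[SEP] and mean-pool any remaining subword vectors.
        return outputs.last_hidden_state[0, 1:-1].mean(dim=0)

    v1, v2 = embed("weak"), embed("powerful")
    cosine_distance = 1 - torch.nn.functional.cosine_similarity(v1, v2, dim=0)
    print(cosine_distance.item())  # often surprisingly small for antonyms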

BERT sentence embeddings: how to obtain sentence embeddings vector

Submitted by 天大地大妈咪最大 on 2021-02-11 13:41:14
Question: I'm using the module bert-for-tf2 to wrap a BERT model as a Keras layer in TensorFlow 2.0, and I've followed your guide for implementing the BERT model as a Keras layer. I'm trying to extract embeddings from a sentence; in my case, the sentence is "Hello". I have a question about the output of the model prediction; I've written this model:

    model_word_embedding = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,), dtype='int32', name='input_ids'),
        bert_layer
    ])
    model_word_embedding.build(input…
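One common way to turn per-token output into a single sentence vector is to mean-pool across the sequence dimension. A minimal sketch, assuming bert_layer is the bert-for-tf2 Keras layer from the question and that it returns per-token embeddings of shape (batch, seq_len, hidden_size):

    import tensorflow as tf

    seq_len = 4  # matches Input(shape=(4,)) in the question
    input_ids = tf.keras.layers.Input(shape=(seq_len,), dtype='int32', name='input_ids')
    token_embeddings = bert_layer(input_ids)  # (batch, seq_len, hidden_size)
    # Mean-pool over the token axis to obtain one vector per sentence.
    sentence_embedding = tf.keras.layers.GlobalAveragePooling1D()(token_embeddings)
    model = tf.keras.Model(inputs=input_ids, outputs=sentence_embedding)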

Export gensim doc2vec embeddings into separate file to use with keras Embedding layer later

Submitted by |▌冷眼眸甩不掉的悲伤 on 2021-02-10 07:08:07
Question: I am a bit new to gensim, and right now I am trying to solve a problem that involves using doc2vec embeddings in Keras. I wasn't able to find an existing implementation of doc2vec in Keras; as far as I can see, in all the examples I have found so far, everyone just uses gensim to get the document embeddings. Once I have trained my doc2vec model in gensim, I need to somehow export the embedding weights from gensim into Keras, and it is not really clear how to do that. I see that model.syn0 supposedly gives …
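A minimal sketch of the export, assuming gensim 4.x (where the document vectors live in model.dv; older releases expose them via model.docvecs or model.syn0) and a hypothetical saved model path:

    import numpy as np
    import tensorflow as tf
    from gensim.models import Doc2Vec

    model = Doc2Vec.load("my_doc2vec.model")     # hypothetical path
    doc_vectors = np.asarray(model.dv.vectors)   # shape: (num_docs, vector_size)

    # Seed a frozen Keras Embedding layer with the pretrained document vectors.
    embedding = tf.keras.layers.Embedding(
        input_dim=doc_vectors.shape[0],
        output_dim=doc_vectors.shape[1],
        weights=[doc_vectors],
        trainable=False,
    )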

Extracting a NumPy value from a TensorFlow object during transformation

Submitted by 旧时模样 on 2021-02-10 05:13:14
Question: I am trying to get word embeddings using TensorFlow, and I have created adjacent word lists from my corpus. The number of unique words in my vocabulary is 8,000, and the number of adjacent word lists is around 1.6 million. [word lists sample photo] Since the data is very large, I am trying to write the word lists in batches to a TFRecords file:

    def save_tfrecords_wordlist(toprocess_word_lists, path):
        writer = tf.io.TFRecordWriter(path)
        for word_list in toprocess_word_lists:
            features = tf.train.Features(feature…
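The truncated function appears to be serializing each word list into a tf.train.Example. A minimal sketch of one way the writer might continue (the feature key "word_ids" is a hypothetical name):

    import tensorflow as tf

    def save_tfrecords_wordlist(toprocess_word_lists, path):
        with tf.io.TFRecordWriter(path) as writer:
            for word_list in toprocess_word_lists:
                # Serialize each list of word ids as an int64 feature.
                features = tf.train.Features(feature={
                    "word_ids": tf.train.Feature(
                        int64_list=tf.train.Int64List(value=word_list)),
                })
                example = tf.train.Example(features=features)
                writer.write(example.SerializeToString())

Note that inside a tf.data transformation the tensors are symbolic, so .numpy() is not available there; wrapping the conversion in tf.py_function is the usual way to reach a NumPy value mid-pipeline.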

How to get both the word embedding vector and the context vector of a given word using word2vec?

Submitted by 天涯浪子 on 2021-02-07 03:52:50
Question:

    from gensim.models import word2vec
    sentences = word2vec.Text8Corpus('TextFile')
    model = word2vec.Word2Vec(sentences, size=200, min_count=2, workers=4)
    print(model['king'])

Is the output vector the context vector of 'king' or the word embedding vector of 'king'? How can I get both the context vector and the word embedding vector of 'king'? Thanks!

Answer 1: It is the embedding vector for 'king'. If you use hierarchical softmax, the context vectors are in model.syn1, and if you use negative …
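For completeness, a minimal sketch (assuming gensim 4.x naming and a hypothetical saved model) of reading both tables:

    from gensim.models import Word2Vec

    model = Word2Vec.load("my_word2vec.model")   # hypothetical path
    idx = model.wv.key_to_index['king']

    input_vec = model.wv.vectors[idx]            # the word ("input") embedding
    # With negative sampling (the default) the output/context weights are in
    # syn1neg; with hierarchical softmax (hs=1) they are in syn1 instead.
    context_vec = model.syn1neg[idx]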

Download pre-trained BERT model locally

Submitted by 假如想象 on 2021-01-28 03:36:02
Question: I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) to create embeddings of sentences with the pretrained model bert-base-nli-mean-tokens. I have an application that will be deployed to a device that does not have internet access. How can I save this model locally so that when I call it, it loads the model locally rather than attempting to download it from the internet? As the library maintainers make clear, the method …
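A minimal sketch of the usual workaround: download the model once on a connected machine, persist it with model.save(), and load from the local path on the offline device:

    from sentence_transformers import SentenceTransformer

    # On a machine with internet access: download and persist to disk.
    model = SentenceTransformer('bert-base-nli-mean-tokens')
    model.save('./bert-base-nli-mean-tokens')    # local directory of your choice

    # On the offline device: load from the saved directory instead of the hub.
    model = SentenceTransformer('./bert-base-nli-mean-tokens')
    embeddings = model.encode(["Hello world"])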

Using Gensim Fasttext model with LSTM nn in keras

Submitted by 只谈情不闲聊 on 2020-12-31 14:52:51
Question: I have trained a fasttext model with Gensim over a corpus of very short sentences (up to 10 words). I know that my test set includes words that are not in my training corpus, i.e. some of the words in my corpus are like "Oxytocin", "Lexitocin", "Ematrophin", "Betaxitocin". Given a new word in the test set, fasttext knows pretty well how to generate a vector with high cosine similarity to the other similar words in the training set by using character-level n-grams. How do I incorporate the fasttext model …
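A minimal sketch of one workaround (not the asker's code): since a fixed Keras Embedding layer cannot synthesize vectors for unseen words, let gensim's FastText compute a vector for every token at preprocessing time (it backs off to character n-grams for OOV words) and feed the resulting vector sequences directly into the LSTM:

    import numpy as np
    import tensorflow as tf
    from gensim.models import FastText

    ft = FastText.load("my_fasttext.model")      # hypothetical path
    max_len, dim = 10, ft.vector_size

    def sentence_to_matrix(tokens):
        # ft.wv[token] also works for out-of-vocabulary tokens via subword n-grams.
        vecs = [ft.wv[t] for t in tokens[:max_len]]
        vecs += [np.zeros(dim)] * (max_len - len(vecs))  # pad to a fixed length
        return np.stack(vecs)

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(max_len, dim)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(1, activation='sigmoid'),  # example binary head
    ])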