sentence-similarity

How to find the similarity between two questions even though the words are different

Submitted by 青春壹個敷衍的年華 on 2019-12-10 12:35:00
Question: Is there any way to determine whether two strings have a similar meaning, even when the words in the strings differ? So far I have tried fuzzywuzzy, Levenshtein distance, and cosine similarity to match the strings, but all of them match the words rather than the meaning of the words.

Str1 = "what are types of negotiation"
Str2 = "what are advantages of negotiation"
Str3 = "what are categories of negotiation"
Ratio = fuzz.ratio(Str1.lower(), Str2.lower())
Partial_Ratio = fuzz.partial_ratio(Str1.lower(), Str2.lower())
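Surface metrics like fuzz.ratio compare characters, not meaning. The usual alternative is to encode each sentence into a vector with a sentence encoder and compare vectors with cosine similarity. Below is a minimal sketch of the cosine step only; the toy 4-d vectors are made-up stand-ins for what a real encoder (e.g. the sentence-transformers library's SentenceTransformer(...).encode(...), not shown here) would produce:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d "embeddings" (real encoders produce e.g. 384- or 768-d vectors):
emb_types      = [0.9, 0.1, 0.2, 0.0]   # "what are types of negotiation"
emb_categories = [0.8, 0.2, 0.3, 0.1]   # "what are categories of negotiation"
emb_advantages = [0.1, 0.9, 0.0, 0.4]   # "what are advantages of negotiation"

# "types" and "categories" mean roughly the same thing, so a good encoder
# places them closer together than "types" and "advantages":
print(cosine_similarity(emb_types, emb_categories) >
      cosine_similarity(emb_types, emb_advantages))  # True
```

With a real sentence encoder, the same comparison would rank "types"/"categories" as near-duplicates while keeping "advantages" apart, which is exactly what character-level fuzzy matching cannot do.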

Integrating BERT sentence embedding into a siamese LSTM network

Submitted by 老子叫甜甜 on 2019-12-06 16:35:11
Question: I am working on a text similarity project and I wanted to experiment with a siamese LSTM network. I am modifying this implementation: https://amitojdeep.github.io/amitoj-blogs/2017/12/31/semantic-similarity.html . The code is based on Word2Vec word embeddings, and I wanted to replace those with BERT sentence embeddings ( https://github.com/imgarylai/bert-embedding ). The resulting matrix has column 1 with the input sentence strings and column 2 with each cell containing the corresponding embedding matrix of shape (num_words, 768). My understanding is that using this embedding matrix I can simply …
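One practical wrinkle here: BERT returns a variable-length (num_words, 768) matrix per sentence, while an LSTM input layer expects a fixed (timesteps, features) shape. A common preprocessing step is to zero-pad (or truncate) each matrix to a fixed length and stack the results into a batch. A NumPy sketch, where MAX_SEQ_LEN and the random matrices are assumed stand-ins for the real data:

```python
import numpy as np

# Hypothetical per-sentence embedding matrices, mimicking what bert-embedding
# returns: one (num_words, 768) array per sentence, with num_words varying.
rng = np.random.default_rng(0)
sentence_matrices = [rng.normal(size=(n, 768)) for n in (5, 9, 3)]

MAX_SEQ_LEN = 10  # assumed fixed sequence length the LSTM expects

def pad_to_fixed_length(mat, max_len=MAX_SEQ_LEN):
    """Zero-pad (or truncate) a (num_words, 768) matrix to (max_len, 768)."""
    out = np.zeros((max_len, mat.shape[1]), dtype=mat.dtype)
    n = min(len(mat), max_len)
    out[:n] = mat[:n]
    return out

batch = np.stack([pad_to_fixed_length(m) for m in sentence_matrices])
print(batch.shape)  # (3, 10, 768): a (batch, timesteps, features) LSTM input
```

If masking is used (as with mask_zero in a Keras Embedding layer), the zero rows can be flagged so the LSTM ignores the padding timesteps.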

Keras throws `'Tensor' object has no attribute '_keras_shape'` when splitting a layer output

Submitted by 巧了我就是萌 on 2019-12-02 06:57:17
Question: I have a sentence-embedding output x for a sentence pair, of dimension 2*1*300. I want to split this output into two vectors of shape 1*300 to calculate their absolute difference and product.

x = MaxPooling2D(pool_size=(1, MAX_SEQUENCE_LENGTH), strides=(1, 1))(x)
x_A = Reshape((1, EMBEDDING_DIM))(x[:, 0])
x_B = Reshape((1, EMBEDDING_DIM))(x[:, 1])
diff = keras.layers.Subtract()([x_A, x_B])
prod = keras.layers.Multiply()([x_A, x_B])
nn = keras.layers.Concatenate()([diff, prod])

Currently, when I do x[:, 0] it throws an error saying AttributeError: 'Tensor' object has no attribute '_keras_shape'. I assume …
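The computation those layers are meant to perform can be sketched in plain NumPy on a stand-in (2, 1, 300) array, which makes the intended shapes explicit:

```python
import numpy as np

EMBEDDING_DIM = 300

# Stand-in for the pooled pair output: shape (2, 1, 300),
# row 0 = sentence A, row 1 = sentence B (random values for illustration).
rng = np.random.default_rng(1)
x = rng.normal(size=(2, 1, EMBEDDING_DIM))

x_A = x[0].reshape(1, EMBEDDING_DIM)   # first sentence vector, shape (1, 300)
x_B = x[1].reshape(1, EMBEDDING_DIM)   # second sentence vector, shape (1, 300)

diff = np.abs(x_A - x_B)               # element-wise absolute difference
prod = x_A * x_B                       # element-wise product
nn = np.concatenate([diff, prod], axis=-1)

print(nn.shape)  # (1, 600): the concatenated similarity features
```

As for the error itself: slicing a Keras tensor directly with x[:, 0] happens outside any layer, so Keras loses its shape metadata; the usual workaround is to wrap the slice in a Lambda layer (e.g. keras.layers.Lambda(lambda t: t[:, 0])) so it stays part of the layer graph. Note also that Subtract alone gives a signed difference; an absolute difference would additionally need an abs inside a Lambda.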

Sentence similarity using keras

Submitted by 两盒软妹~` on 2019-11-30 06:55:12
Question: I'm trying to implement a sentence similarity architecture based on this work, using the STS dataset. Labels are normalized similarity scores from 0 to 1, so it is assumed to be a regression model. My problem is that the loss goes directly to NaN starting from the first epoch. What am I doing wrong? I have already tried updating to the latest Keras and Theano versions. The code for my model is:

def create_lstm_nn(input_dim):
    seq = Sequential()
    # embed using a pretrained 300d embedding
    seq.add(Embedding(vocab_size, emb_dim, mask_zero=True, weights=[embedding_weights]))
    # encode via LSTM
    seq.add(LSTM …
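A loss that is NaN from the very first epoch usually points to the data or the optimizer settings rather than the architecture: common culprits are NaN/infinite labels, labels outside the expected range, or a learning rate high enough to overflow (gradient clipping, e.g. the clipnorm argument on Keras optimizers, can help with the latter). A label-sanity check, sketched independently of Keras (the function name is ours, for illustration):

```python
import numpy as np

def check_regression_labels(y):
    """Flag common NaN-loss culprits for a 0-1 regression target:
    NaN/infinite labels, or labels outside the expected range."""
    y = np.asarray(y, dtype=float)
    problems = []
    if np.isnan(y).any():
        problems.append("NaN labels")
    if np.isinf(y).any():
        problems.append("infinite labels")
    if ((y < 0.0) | (y > 1.0)).any():
        problems.append("labels outside [0, 1]")
    return problems

print(check_regression_labels([0.2, 0.8, 1.0]))           # []
print(check_regression_labels([0.2, float("nan"), 5.0]))  # ['NaN labels', 'labels outside [0, 1]']
```

If the labels pass this check, the next cheap experiment is to drop the learning rate by an order of magnitude and see whether the loss becomes finite.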
