sentence-similarity

How to find the similarity between two questions even though the words are different

Submitted by 青春壹個敷衍的年華 on 2019-12-10 12:35:00
Question: Is there any way to determine whether two strings have a similar meaning, even when the words in the strings differ? So far I have tried fuzzywuzzy, Levenshtein distance, and cosine similarity to match the strings, but all of them match the words rather than the meaning of the words.

Str1 = "what are types of negotiation"
Str2 = "what are advantages of negotiation"
Str3 = "what are categories of negotiation"
Ratio = fuzz.ratio(Str1.lower(), Str2.lower())
Partial_Ratio = fuzz.partial_ratio(Str1.lower(), Str2.lower())
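Surface metrics like fuzz.ratio compare characters, not meaning. The usual alternative is to encode each sentence into a vector with a sentence encoder and compare vectors with cosine similarity. Below is a minimal sketch of the cosine step only; the toy 4-d vectors are made-up stand-ins for what a real encoder (e.g. the sentence-transformers library's SentenceTransformer(...).encode(...), not shown here) would produce:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d "embeddings" (real encoders produce e.g. 384- or 768-d vectors):
emb_types      = [0.9, 0.1, 0.2, 0.0]   # "what are types of negotiation"
emb_categories = [0.8, 0.2, 0.3, 0.1]   # "what are categories of negotiation"
emb_advantages = [0.1, 0.9, 0.0, 0.4]   # "what are advantages of negotiation"

# "types" and "categories" mean roughly the same thing, so a good encoder
# places them closer together than "types" and "advantages":
print(cosine_similarity(emb_types, emb_categories) >
      cosine_similarity(emb_types, emb_advantages))  # True
```

With a real sentence encoder, the same comparison would rank "types"/"categories" as near-duplicates while keeping "advantages" apart, which is exactly what character-level fuzzy matching cannot do.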

Integrating BERT sentence embedding into a siamese LSTM network

Submitted by 老子叫甜甜 on 2019-12-06 16:35:11
Question: I am working on a text similarity project and I wanted to experiment with a siamese LSTM network. I am modifying this implementation: https://amitojdeep.github.io/amitoj-blogs/2017/12/31/semantic-similarity.html . The code is based on Word2Vec word embeddings, and I wanted to replace those with BERT sentence embeddings ( https://github.com/imgarylai/bert-embedding ). The resulting matrix has column 1 with the input sentence strings and column 2 with each cell containing the corresponding embedding matrix of shape (num_words, 768). My understanding is that using this embedding matrix I can simply …
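One practical wrinkle here: BERT returns a variable-length (num_words, 768) matrix per sentence, while an LSTM input layer expects a fixed (timesteps, features) shape. A common preprocessing step is to zero-pad (or truncate) each matrix to a fixed length and stack the results into a batch. A NumPy sketch, where MAX_SEQ_LEN and the random matrices are assumed stand-ins for the real data:

```python
import numpy as np

# Hypothetical per-sentence embedding matrices, mimicking what bert-embedding
# returns: one (num_words, 768) array per sentence, with num_words varying.
rng = np.random.default_rng(0)
sentence_matrices = [rng.normal(size=(n, 768)) for n in (5, 9, 3)]

MAX_SEQ_LEN = 10  # assumed fixed sequence length the LSTM expects

def pad_to_fixed_length(mat, max_len=MAX_SEQ_LEN):
    """Zero-pad (or truncate) a (num_words, 768) matrix to (max_len, 768)."""
    out = np.zeros((max_len, mat.shape[1]), dtype=mat.dtype)
    n = min(len(mat), max_len)
    out[:n] = mat[:n]
    return out

batch = np.stack([pad_to_fixed_length(m) for m in sentence_matrices])
print(batch.shape)  # (3, 10, 768): a (batch, timesteps, features) LSTM input
```

If masking is used (as with mask_zero in a Keras Embedding layer), the zero rows can be flagged so the LSTM ignores the padding timesteps.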

Keras throws `'Tensor' object has no attribute '_keras_shape'` when splitting a layer output

Submitted by 巧了我就是萌 on 2019-12-02 06:57:17
Question: I have a sentence-embedding output x for a sentence pair, of dimension 2*1*300. I want to split this output into two vectors of shape 1*300 to calculate their absolute difference and product.

x = MaxPooling2D(pool_size=(1, MAX_SEQUENCE_LENGTH), strides=(1, 1))(x)
x_A = Reshape((1, EMBEDDING_DIM))(x[:, 0])
x_B = Reshape((1, EMBEDDING_DIM))(x[:, 1])
diff = keras.layers.Subtract()([x_A, x_B])
prod = keras.layers.Multiply()([x_A, x_B])
nn = keras.layers.Concatenate()([diff, prod])

Currently, when I do x[:, 0] it throws an error saying AttributeError: 'Tensor' object has no attribute '_keras_shape'. I assume …
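The computation those layers are meant to perform can be sketched in plain NumPy on a stand-in (2, 1, 300) array, which makes the intended shapes explicit:

```python
import numpy as np

EMBEDDING_DIM = 300

# Stand-in for the pooled pair output: shape (2, 1, 300),
# row 0 = sentence A, row 1 = sentence B (random values for illustration).
rng = np.random.default_rng(1)
x = rng.normal(size=(2, 1, EMBEDDING_DIM))

x_A = x[0].reshape(1, EMBEDDING_DIM)   # first sentence vector, shape (1, 300)
x_B = x[1].reshape(1, EMBEDDING_DIM)   # second sentence vector, shape (1, 300)

diff = np.abs(x_A - x_B)               # element-wise absolute difference
prod = x_A * x_B                       # element-wise product
nn = np.concatenate([diff, prod], axis=-1)

print(nn.shape)  # (1, 600): the concatenated similarity features
```

As for the error itself: slicing a Keras tensor directly with x[:, 0] happens outside any layer, so Keras loses its shape metadata; the usual workaround is to wrap the slice in a Lambda layer (e.g. keras.layers.Lambda(lambda t: t[:, 0])) so it stays part of the layer graph. Note also that Subtract alone gives a signed difference; an absolute difference would additionally need an abs inside a Lambda.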

Sentence similarity using keras

Submitted by 两盒软妹~` on 2019-11-30 06:55:12
Question: I'm trying to implement a sentence similarity architecture based on this work, using the STS dataset. Labels are normalized similarity scores from 0 to 1, so it is assumed to be a regression model. My problem is that the loss goes directly to NaN starting from the first epoch. What am I doing wrong? I have already tried updating to the latest Keras and Theano versions. The code for my model is:

def create_lstm_nn(input_dim):
    seq = Sequential()
    # embed using a pretrained 300d embedding
    seq.add(Embedding(vocab_size, emb_dim, mask_zero=True, weights=[embedding_weights]))
    # encode via LSTM
    seq.add(LSTM …
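A loss that is NaN from the very first epoch usually points to the data or the optimizer settings rather than the architecture: common culprits are NaN/infinite labels, labels outside the expected range, or a learning rate high enough to overflow (gradient clipping, e.g. the clipnorm argument on Keras optimizers, can help with the latter). A label-sanity check, sketched independently of Keras (the function name is ours, for illustration):

```python
import numpy as np

def check_regression_labels(y):
    """Flag common NaN-loss culprits for a 0-1 regression target:
    NaN/infinite labels, or labels outside the expected range."""
    y = np.asarray(y, dtype=float)
    problems = []
    if np.isnan(y).any():
        problems.append("NaN labels")
    if np.isinf(y).any():
        problems.append("infinite labels")
    if ((y < 0.0) | (y > 1.0)).any():
        problems.append("labels outside [0, 1]")
    return problems

print(check_regression_labels([0.2, 0.8, 1.0]))           # []
print(check_regression_labels([0.2, float("nan"), 5.0]))  # ['NaN labels', 'labels outside [0, 1]']
```

If the labels pass this check, the next cheap experiment is to drop the learning rate by an order of magnitude and see whether the loss becomes finite.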
