sentence-similarity | 易学教程

Similarity between 2 dataframe columns

Question: I have two dataframes, and each has a column called Song. However, the songs are sometimes spelled differently. How can I use difflib (or something similar) to get the Song spelling from one dataframe into a new column of the other dataframe? Example:

Dataframe1
Song          Artist
like a virgi  madonna

Dataframe2
Song           Rank
like a virgin  2

Result
Song           Artist   SongAlt
like a virgin  Madonna  like a virgi

Answer 1: Step 1: Merge whatever can be merged

In [67]: df1
Out[67]:
           Song    Artist
0        mysong  myartist
1  like a virgi
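The fuzzy-matching idea raised in the question can be sketched with the standard-library `difflib` module. This is only an illustration, not the answer's actual method (which is truncated above): the sample lists, the helper name `closest_spelling`, and the `cutoff` of 0.8 are all assumptions chosen for the example.

```python
import difflib

# Hypothetical sample data mirroring the question's example.
songs_df1 = ["mysong", "like a virgi"]        # Song spellings in Dataframe1
songs_df2 = ["like a virgin", "other song"]   # Song spellings in Dataframe2


def closest_spelling(song, candidates, cutoff=0.8):
    """Return the candidate most similar to `song`, or None if none reaches `cutoff`."""
    matches = difflib.get_close_matches(song, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# For each Dataframe2 song, look up the alternative spelling used in Dataframe1.
song_alt = {song: closest_spelling(song, songs_df1) for song in songs_df2}
print(song_alt)
```

With pandas, the same helper could plausibly be applied to a column, for example `df2["SongAlt"] = df2["Song"].map(lambda s: closest_spelling(s, df1["Song"].tolist()))`. The cutoff would need tuning on real data, since a low value produces false matches and a high value misses heavier misspellings.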

tensorflow 1 Session.run is taking too much time to embed sentence using universal sentence encoder

Question: Using TensorFlow with a Flask REST API. How can I reduce the time taken by session.run? I am using TF 1/2 in a REST API; instead of serving the model, I run it on my own server. I have tried TensorFlow 1 and 2. TensorFlow 1 takes too much time, and TensorFlow 2 does not even return the vectors for the text. In TensorFlow 1, initialising takes 2-4 seconds and session.run takes 5-8 seconds, and the time increases as I keep sending requests. TensorFlow 1: import tensorflow.compat.v1 as tfo

How to train a model that will result in the similarity score between two news titles?

Question: I am trying to build a fake news classifier and I am quite new to this field. I have a column "title_1_en", which holds the title of a fake news item, and another column called "title_2_en". There are 3 target labels, "agreed", "disagreed", and "unrelated", depending on whether the news title in column "title_2_en" agrees with, disagrees with, or is unrelated to the title in the first column. I have tried calculating basic cosine similarity between the two titles after converting the words of the sentences into vectors. This has
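The baseline described in the question, cosine similarity between word vectors of two titles, might look roughly like the sketch below. It assumes simple word-count vectors; the question does not say which vectorisation was used, and the function name `bow_cosine` and the example titles are invented for illustration.

```python
import math
from collections import Counter


def bow_cosine(title_a, title_b):
    """Cosine similarity between the word-count vectors of two titles."""
    counts_a = Counter(title_a.lower().split())
    counts_b = Counter(title_b.lower().split())
    # Only words present in both titles contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty title has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical titles, not taken from the asker's dataset.
score_same = bow_cosine("city council approves new park", "city council approves new park")
score_unrelated = bow_cosine("city council approves new park", "stock markets fall sharply")
score_negated = bow_cosine("the vaccine is safe", "the vaccine is not safe")
print(score_same, score_unrelated, score_negated)
```

Note that the negated pair still scores highly because the two titles share almost every word. A word-overlap score of this kind can help separate "unrelated" from the other two labels, but on its own it is unlikely to distinguish "agreed" from "disagreed".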

Bert fine-tuned for semantic similarity

Question: I would like to fine-tune BERT to calculate semantic similarity between sentences. I searched a lot of websites, but found almost nothing about this downstream task. I only found the STS benchmark. I wonder whether I can use the STS benchmark dataset to fine-tune a BERT model and apply it to my task. Is that reasonable? As far as I know, there are many methods to calculate similarity, including cosine similarity, Pearson correlation, Manhattan distance, etc. How should I choose one for semantic similarity? Answer 1: As a
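To make the three measures named in the question concrete, here is a minimal pure-Python sketch. The vectors `u` and `v` are hypothetical stand-ins for sentence embeddings, and nothing here is taken from the truncated answer.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; ignores vector length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_correlation(u, v):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences; sensitive to vector length."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical 4-dimensional embeddings: v points the same way as u but is twice as long.
u = [1.0, 2.0, 3.0, 4.0]
v = [2.0, 4.0, 6.0, 8.0]
print(cosine_similarity(u, v), pearson_correlation(u, v), manhattan_distance(u, v))
```

Cosine similarity and Pearson correlation treat these two vectors as identical in direction, while Manhattan distance reports them as far apart. Whether vector length carries meaning for the embeddings in use is therefore one consideration when choosing between a similarity and a distance.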

Similarity between two lists of documents

Question: I need to find the similarity between two lists of short texts in Python. Each text can be 1-4 words long, and each list can contain 10K texts. I could not find how to do this efficiently in spaCy. Maybe other packages can do this? I assume the words are represented by vectors (300d), but any other options are also OK. This task can be done in a loop, but there should surely be a more efficient way. This task fits TensorFlow, PyTorch, and similar packages, but I'm not familiar with
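One common way to avoid the explicit loop is to normalise each embedding to unit length and compute all pairwise cosine similarities with a single matrix product. The sketch below assumes NumPy is available and that each text has already been turned into a 300-dimensional vector by some embedding model; the random arrays are placeholders for those embeddings, and `pairwise_cosine` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder embeddings: in practice each row would be the vector of one short text.
vecs_a = rng.normal(size=(1000, 300))
vecs_b = rng.normal(size=(1200, 300))


def pairwise_cosine(a, b):
    """Return a (len(a), len(b)) matrix of cosine similarities between rows."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    # The dot product of unit vectors equals their cosine similarity.
    return a_unit @ b_unit.T


sims = pairwise_cosine(vecs_a, vecs_b)
best_match = sims.argmax(axis=1)  # index in list B of the closest text for each text in A
print(sims.shape, best_match[:5])
```

For two lists of 10K texts, the full similarity matrix has 100 million entries (about 800 MB in float64), so it may be preferable to process list A in chunks or to use float32.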

How to perform efficient queries with Gensim doc2vec?

Question: I’m working on a sentence similarity algorithm with the following use case: given a new sentence, I want to retrieve its n most similar sentences from a given set. I am using Gensim v.3.7.1, and I have trained both word2vec and doc2vec models. The results of the latter outperform word2vec’s, but I’m having trouble performing efficient queries with my Doc2Vec model. This model uses the distributed bag of words implementation (dm = 0). I used to infer similarity using the built-in method model