sentence-similarity

Similarity between 2 dataframe columns

对着背影说爱祢 submitted on 2021-02-08 07:56:34
Question: I have two dataframes, and each has a column called Song. However, sometimes the songs are spelled differently. How can I use difflib (or something similar) to get the Song spelling of one dataframe in a new column of the other dataframe? Example:

Dataframe1
Song            Artist
like a virgi    madonna

Dataframe2
Song            Rank
like a virgin   2

Result
Song            Artist    SongAlt
like a virgin   Madonna   like a virgi

Answer 1: Step 1: Merge whatever can be merged.
In [67]: df1
Out[67]:
   Song          Artist
0  mysong        myartist
1  like a virgi
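The fuzzy lookup the question asks about can be sketched with the standard library alone. This is a minimal sketch, not the approach of the truncated answer: `closest_song` is a hypothetical helper, the plain lists stand in for the two Song columns, and the 0.8 cutoff is an assumed threshold that would need tuning on real data.

```python
import difflib

def closest_song(title, candidates, cutoff=0.8):
    """Return the candidate most similar to `title`, or None if none reaches `cutoff`."""
    matches = difflib.get_close_matches(title, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Plain lists stand in for the two Song columns (illustrative sample values).
songs_df1 = ["mysong", "like a virgi"]
songs_df2 = ["like a virgin", "mysong"]

# For each title in the second frame, look up the nearest spelling in the first.
song_alt = {title: closest_song(title, songs_df1) for title in songs_df2}
```

With pandas, the same helper could be applied to a column, for example by passing a function that calls `closest_song` to `Series.map`, to fill a SongAlt column.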

TensorFlow 1 Session.run is taking too much time to embed a sentence using Universal Sentence Encoder

Deadly submitted on 2020-07-23 06:35:37
Question: I am using TensorFlow with a Flask REST API. How can I reduce the time taken by session.run? I am using TF 1/2 in a REST API; instead of serving the model, I am running it on my own server. I have tried TensorFlow 1 and 2. TensorFlow 1 takes too much time. TensorFlow 2 does not even return the vectors for the text. In TensorFlow 1, initialising takes 2-4 seconds and session.run takes 5-8 seconds, and the time keeps increasing as I send more requests. TensorFlow 1: import tensorflow.compat.v1 as tfo

How to train a model that will result in the similarity score between two news titles?

自作多情 submitted on 2020-07-22 21:40:04
Question: I am trying to build a fake news classifier and I am quite new to this field. I have a column "title_1_en", which holds the title of a fake news item, and another column called "title_2_en". There are 3 target labels: "agreed", "disagreed", and "unrelated", depending on whether the news title in column "title_2_en" agrees with, disagrees with, or is unrelated to the one in the first column. I have tried calculating basic cosine similarity between the two titles after converting the words of the sentences into vectors. This has
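The baseline the question describes, cosine similarity between word vectors of two titles, might look like the sketch below. It assumes a simple bag-of-words count vector (the question does not say which vectorisation was used), and the two titles are hypothetical examples.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical titles, for illustration only.
title_1 = "Scientists confirm the moon is made of cheese"
title_2 = "Scientists deny the moon is made of cheese"
score = cosine_similarity(bag_of_words(title_1), bag_of_words(title_2))
```

The two example titles share seven of their eight words, so they score high even though they contradict each other. A lexical overlap score of this kind cannot separate "agreed" from "disagreed" on its own, which is one reason a trained classifier is needed for the three labels.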

Bert fine-tuned for semantic similarity

夙愿已清 submitted on 2020-06-08 12:31:33
Question: I would like to fine-tune BERT to calculate semantic similarity between sentences. I searched a lot of websites, but found almost no downstream work on this. I only found the STS benchmark. I wonder whether I can use the STS benchmark dataset to train a fine-tuned BERT model and apply it to my task. Is that reasonable? As far as I know, there are many methods to calculate similarity, including cosine similarity, Pearson correlation, Manhattan distance, etc. How do I choose one for semantic similarity? Answer 1: As a
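To make the three measures named in the question concrete, here is a minimal sketch that computes each of them on two small vectors. The vectors `u` and `v` are made-up stand-ins for sentence embeddings, and the sketch does not attempt to say which measure the truncated answer recommends.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pearson_correlation(u, v):
    """Pearson correlation, computed as the cosine similarity of the mean-centred vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])

def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (0 means identical vectors)."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Made-up vectors standing in for two sentence embeddings.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 7.0]
results = {
    "cosine": cosine_similarity(u, v),
    "pearson": pearson_correlation(u, v),
    "manhattan": manhattan_distance(u, v),
}
```

Cosine similarity and Pearson correlation are similarities bounded in [-1, 1], where larger means more alike; Manhattan distance is an unbounded distance, where smaller means more alike. The two correlation-style measures ignore vector length, while Manhattan distance does not.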

Similarity between two lists of documents

老子叫甜甜 submitted on 2020-01-25 08:57:06
Question: I need to find the similarity between two lists of short texts in Python. Each text can be 1-4 words long, and each list can hold 10K texts. I couldn't find how to do this efficiently in spaCy. Maybe other packages can do this? I assume the words are represented by a vector (300d), but any other options are also OK. This task can be done in a loop, but there should surely be a more efficient way. This task fits TensorFlow, PyTorch, and similar packages, but I'm not familiar with
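One way to avoid the explicit loop is to stack the text vectors into two matrices and compute all pairwise cosine similarities with a single matrix product. The sketch below assumes NumPy is available and that each text has already been turned into a 300-dimensional vector with no all-zero rows; the random matrices only stand in for real embeddings, and `pairwise_cosine` is a hypothetical helper name.

```python
import numpy as np

def pairwise_cosine(a, b):
    """Cosine similarity between every row of `a` and every row of `b`."""
    # Scale each row to unit length; assumes no row is all zeros.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    # For unit vectors the dot product equals the cosine similarity.
    return a_unit @ b_unit.T

# Random stand-ins for the embedded texts (5 and 4 texts, 300 dimensions each).
rng = np.random.default_rng(0)
list_a = rng.normal(size=(5, 300))
list_b = rng.normal(size=(4, 300))

sims = pairwise_cosine(list_a, list_b)  # one row per text in list_a
best_match = sims.argmax(axis=1)        # index of the closest list_b text for each list_a text
```

For two lists of 10K texts the result has 10,000 x 10,000 entries, which is about 800 MB in float64, so processing `list_a` in chunks of rows may be needed when memory is tight.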

How to perform efficient queries with Gensim doc2vec?

て烟熏妆下的殇ゞ submitted on 2019-12-12 16:33:15
Question: I'm working on a sentence similarity algorithm with the following use case: given a new sentence, I want to retrieve its n most similar sentences from a given set. I am using Gensim v3.7.1, and I have trained both word2vec and doc2vec models. The results of the latter outperform word2vec's, but I'm having trouble performing efficient queries with my Doc2Vec model. This model uses the distributed bag of words implementation (dm = 0). I used to infer similarity using the built-in method model