similarity

Finding text similarities between row values in Excel

时光总嘲笑我的痴心妄想 submitted on 2020-07-15 06:09:11
Question: Let's say I have 9 rows of records, and every three rows share the same value, for instance: Mike, Mike, Mike, John, John, John, Ryan, Ryan, Ryan. Is there a way to search for similarities among these records, such as spelling mistakes, additional characters, or missing characters? For example, the correct version is Mike, but there might be a record further down the list with the value Mke, which is incorrect (a spelling mistake). How can I find it and replace it with the correct one? The above example is
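Outside Excel, one way to catch near-duplicate spellings is fuzzy string matching. The following is a minimal sketch using Python's standard `difflib` module. It assumes the column has been exported to a Python list. It also assumes a rule that is not from the question: a spelling that occurs at least `min_count` times is treated as correct, and rarer spellings are mapped onto the closest such value. `correct_typos`, `min_count` and `cutoff` are hypothetical names chosen for illustration.

```python
from collections import Counter
from difflib import get_close_matches


def correct_typos(values, min_count=2, cutoff=0.6):
    """Map rare spellings onto the closest frequent spelling.

    Assumption (hypothetical rule): a value seen at least `min_count` times
    is correct; anything rarer is a candidate typo.
    """
    counts = Counter(values)
    canonical = [v for v, c in counts.items() if c >= min_count]
    corrected = []
    for v in values:
        if v in canonical:
            corrected.append(v)
            continue
        # Best canonical spelling whose SequenceMatcher ratio >= cutoff, if any.
        match = get_close_matches(v, canonical, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else v)
    return corrected


rows = ["Mike"] * 3 + ["John"] * 3 + ["Ryan"] * 3 + ["Mke"]
print(correct_typos(rows))  # the final 'Mke' becomes 'Mike'
```

Raising `cutoff` makes the matching stricter. Values with no close frequent match are left unchanged rather than guessed.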

how to merge strings that have a certain number of substrings in common to produce some groups in a data frame in Python

不想你离开。 submitted on 2020-07-07 11:12:27
Question: I asked a similar question before, but it was a simpler one and it has been resolved: "how to merge strings that have substrings in common to produce some groups in a data frame in Python". Here I have a more advanced version of that question. I have sample data:

a=pd.DataFrame({'ACTIVITY':['b,c','a','a,c,d,e','f,g,h,i','j,k,l','k,l,m']})

What I want to do is merge strings if they have substrings in common. So, in this example, the strings 'b,c', 'a', 'a,c,d,e' should be merged

how to merge strings that have substrings in common to produce some groups in a data frame in Python

与世无争的帅哥 submitted on 2020-07-07 06:59:49
Question: I have sample data:

a=pd.DataFrame({'ACTIVITY':['b,c','a','a,c,d,e','f,g,h,i','j,k,l','k,l,m']})

What I want to do is merge strings if they have substrings in common. So, in this example, the strings 'b,c', 'a', 'a,c,d,e' should be merged together because they can be linked to each other, and 'j,k,l' and 'k,l,m' should be in one group. In the end, I hope to have something like:

    ACTIVITY     group
    'b,c'        0
    'a'          0
    'a,c,d,e'    0
    'f,g,h,i'    1
    'j,k,l'      2
    'k,l,m'      2

So, I can have three groups and there is
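Grouping strings that can be linked through shared items is a connected-components problem, and a union-find (disjoint-set) structure fits it well. Below is a minimal sketch, assuming items are separated by commas with no surrounding spaces. `group_by_shared_items` and its `min_shared` parameter are hypothetical names. `min_shared` is an illustrative threshold for the "certain number of substrings in common" variant asked earlier; the default of 1 matches the grouping rule in this question.

```python
def group_by_shared_items(strings, min_shared=1):
    """Label strings so that any two sharing >= min_shared comma-separated
    items end up in the same group, transitively (union-find)."""
    items = [set(s.split(",")) for s in strings]
    parent = list(range(len(items)))

    def find(i):
        # Follow parents to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison: O(N^2), fine for modest row counts.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if len(items[i] & items[j]) >= min_shared:
                parent[find(j)] = find(i)

    # Renumber roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(items))]


activities = ['b,c', 'a', 'a,c,d,e', 'f,g,h,i', 'j,k,l', 'k,l,m']
print(group_by_shared_items(activities))                # [0, 0, 0, 1, 2, 2]
print(group_by_shared_items(activities, min_shared=2))  # [0, 1, 2, 3, 4, 4]
```

With the question's DataFrame, `a['group'] = group_by_shared_items(a['ACTIVITY'])` attaches the labels as a column.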

Efficient computation of similarity matrix in Python (NumPy)

寵の児 submitted on 2020-06-11 04:00:46
Question: Let X be a B×n NumPy matrix, i.e.,

```python
import numpy as np
B = 10
n = 2
X = np.random.random((B, n))
```

Now, I'm interested in computing the so-called kernel (or similarity) matrix K, which has shape B×B and whose {i,j}-th element is given by K(i,j) = fun(x_i, x_j), where x_t denotes the t-th row of matrix X and fun is some function of x_i and x_j. For instance, this function could be the so-called RBF function, i.e., K(i,j) = exp(-||x_i - x_j||^2). To do so, a naive way would be
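For the RBF case, the double loop can be replaced by broadcasting, using the identity ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i·x_j. The sketch below assumes the unit-bandwidth RBF kernel exactly as written in the question. The naive double loop is included only as a correctness check. The trick relies on the RBF kernel depending only on squared distances; it is not a general shortcut for an arbitrary `fun`.

```python
import numpy as np


def rbf_kernel_matrix(X):
    """K[i, j] = exp(-||x_i - x_j||^2) for the rows of X, without Python loops."""
    sq_norms = np.sum(X ** 2, axis=1)  # ||x_i||^2, shape (B,)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j, via broadcasting
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from round-off
    return np.exp(-sq_dists)


def rbf_kernel_matrix_naive(X):
    """Reference double loop, for checking the vectorised version."""
    B = X.shape[0]
    K = np.empty((B, B))
    for i in range(B):
        for j in range(B):
            K[i, j] = np.exp(-np.sum((X[i] - X[j]) ** 2))
    return K


B, n = 10, 2
X = np.random.random((B, n))
print(np.allclose(rbf_kernel_matrix(X), rbf_kernel_matrix_naive(X)))
```

If SciPy is available, `scipy.spatial.distance.cdist(X, X, 'sqeuclidean')` is another way to obtain the squared-distance matrix before applying `np.exp`.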

Machine Learning Basics | Measures of Similarity and Distance

我的梦境 submitted on 2020-03-21 18:03:21
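Contents: Minkowski Distance, Pearson Correlation Coefficient, Cosine Similarity, Mahalanobis Distance, References

In clustering and classification tasks in machine learning, we need to measure the distance or similarity between samples. This article summarizes how to compute the common distance (similarity) measures. It focuses on measures for numeric data. For similarity measures on Boolean, text, and image data, see the following resources:

https://reference.wolfram.com/language/guide/DistanceAndSimilarityMeasures.html
A Survey of Binary Similarity and Distance Measures, Seung-Seok Choi, Sung-Hyuk Cha & Charles C. Tappert
A Survey of Text Similarity Approaches, Wael H. Gomaa & Aly A. Fahmy
Encyclopedia of Distances, Michel Marie Deza & Elena Deza (a book devoted entirely to distance measures, and the top recommendation)

Minkowski Distance

Given a sample set \(X\) in the m-dimensional real vector space \(R^{m}\)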
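The four measures in the contents list can be sketched in NumPy as follows. These are the standard textbook definitions, not code taken from the article. The Mahalanobis helper takes the covariance matrix S as an argument; in practice it would be estimated from the sample set, e.g. with `np.cov(X, rowvar=False)`.

```python
import numpy as np


def minkowski(x, y, p=2):
    # (sum_k |x_k - y_k|^p)^(1/p); p=1 is Manhattan, p=2 is Euclidean
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)


def pearson(x, y):
    # Covariance over the product of standard deviations (the 1/n factors cancel)
    xc, yc = x - x.mean(), y - y.mean()
    return np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))


def cosine(x, y):
    # Cosine of the angle between x and y
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))


def mahalanobis(x, y, cov):
    # sqrt((x - y)^T S^{-1} (x - y)); solve() avoids forming S^{-1} explicitly
    d = x - y
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))


x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.5])
print(minkowski(x, y, p=1), minkowski(x, y, p=2))
print(pearson(x, y), cosine(x, y))
print(mahalanobis(x, y, np.eye(3)))  # identity covariance reduces to Euclidean
```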

Is it possible to compare an image with a list of images? [google-cloud-vision]

那年仲夏 submitted on 2020-03-05 05:09:35
Question: I'm trying to compare one dog image with a bucket full of dog images and get their similarity. Does anybody have a clue how to do that?

Answer 1: You could try out the Vision API's ProductSearch: https://cloud.google.com/vision/product-search/docs/ You build a ProductSet of Products and add reference images to each Product. Later, you send in a query image, and it returns the most visually similar results in your ProductSet.

Answer 2: You can use my Ruby gem that implements two perceptual image hashing
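Answer 2 refers to perceptual image hashing. As a rough illustration of that idea (not the gem's implementation, and unrelated to the Vision API), the sketch below computes a 64-bit average hash (aHash) for each image. It then ranks a list of images by Hamming distance to the query. It assumes the images are already loaded as 2-D grayscale NumPy arrays, for example with an image library (not shown). `average_hash` and `rank_by_similarity` are hypothetical names.

```python
import numpy as np


def average_hash(gray, size=8):
    """64-bit average hash (aHash) of a 2-D grayscale array."""
    h = (gray.shape[0] // size) * size
    w = (gray.shape[1] // size) * size
    # Crop to a multiple of `size`, then average each block down to size x size.
    small = gray[:h, :w].reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    # One bit per block: brighter than the overall mean or not.
    return (small > small.mean()).flatten()


def rank_by_similarity(query, candidates):
    """(index, Hamming distance) pairs, most similar candidate first."""
    q = average_hash(query)
    dists = [int(np.count_nonzero(q != average_hash(c))) for c in candidates]
    return sorted(enumerate(dists), key=lambda t: t[1])


rng = np.random.default_rng(0)
bucket = [rng.random((64, 64)) for _ in range(3)]  # stand-ins for stored images
query = bucket[1] * 2  # a brighter copy of image 1
print(rank_by_similarity(query, bucket)[0])  # → (1, 0)
```

A Hamming distance of 0 means identical hashes. The hash is unaffected by uniform brightness scaling, which is the kind of robustness perceptual hashes aim for.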

Methods for Measuring Image Similarity

不羁岁月 submitted on 2020-03-04 19:31:18
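Reference material for these measures is linked in the original post. It uses Python code and covers three methods in total. Below is a C++ version based on one of them, the difference hash (dHash):

```cpp
#include <iostream>
#include <vector>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;

// Difference hash: shrink the image to w x h, convert it to grayscale, and
// emit one bit per horizontally adjacent pixel pair (1 if the left pixel is
// brighter than its right neighbour). With w = 9, h = 8 this gives 64 bits.
vector<int> dhash(const Mat& imgSrc, int w, int h)
{
    Mat imgResize;
    resize(imgSrc, imgResize, Size(w, h));  // Size(width, height): w columns, h rows
    Mat img;
    cvtColor(imgResize, img, COLOR_BGR2GRAY);

    vector<int> hash;
    for (int i = 0; i < h; i++) {          // rows
        for (int j = 0; j < w - 1; j++) {  // stop one short so j + 1 stays in range
            hash.push_back(img.at<uchar>(i, j) > img.at<uchar>(i, j + 1) ? 1 : 0);
        }
    }
    return hash;
}

// float ... (the excerpt is cut off here)
```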

How can I calculate the Jaccard Similarity of two lists containing strings in Python?

☆樱花仙子☆ submitted on 2020-02-19 09:53:18
Question: I have two lists of usernames and I want to calculate their Jaccard similarity. Is that possible? This thread shows how to calculate the Jaccard similarity between two strings, but I want to apply it to two lists where each element is one word (e.g., a username).

Answer 1: I ended up writing my own solution after all:

```python
def jaccard_similarity(list1, list2):
    # Number of distinct elements present in both lists
    intersection = len(set(list1).intersection(list2))
    # Number of distinct elements in either list; len(list1) + len(list2) - intersection
    # only equals this when neither list contains duplicates
    union = len(set(list1).union(list2))
    return float(intersection) / union
```

Scream detection

我的梦境 submitted on 2020-02-08 09:49:47
Question: I'm working on a project that needs to detect certain voice patterns, for example "someone is screaming". I don't know who the person is (a child, a man, a woman, ...), and each has their own voice. So I'm looking for a way to detect "screaming" by, for example, saving as many fingerprints of "screaming" as possible. Then, when I need to check whether a voice is a "screaming" voice, I could create a fingerprint for it, search, and see if I can find a similar one in the list of "screaming"
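The store-fingerprints-then-search idea from the question can be sketched as nearest-neighbour matching on a spectral fingerprint. This is only an illustration of that structure under strong assumptions, not a working scream detector. The fingerprint here is a frame-averaged, band-pooled magnitude spectrum compared by cosine similarity. `spectral_fingerprint`, `best_match`, `frame=1024` and `n_bands=32` are arbitrary, hypothetical choices, and the reference "recordings" are synthetic tones.

```python
import numpy as np


def spectral_fingerprint(signal, frame=1024, n_bands=32):
    """Crude fingerprint: frame-averaged magnitude spectrum pooled into bands.

    Assumes a mono float NumPy signal at least `frame` samples long.
    """
    n_frames = len(signal) // frame
    frames = signal[: n_frames * frame].reshape(n_frames, frame)
    # Hann-windowed magnitude spectrum of each frame, averaged over frames.
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1)).mean(axis=0)
    # Pool neighbouring frequency bins into coarse bands, then L2-normalise.
    bands = np.array([b.mean() for b in np.array_split(spectrum, n_bands)])
    return bands / np.linalg.norm(bands)


def best_match(query, references):
    """Index and cosine similarity of the most similar stored fingerprint."""
    sims = np.array([np.dot(query, r) for r in references])  # unit vectors: dot = cosine
    return int(np.argmax(sims)), float(sims.max())


# Synthetic stand-ins: a 200 Hz tone and a 3 kHz tone, one second at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
references = [
    spectral_fingerprint(np.sin(2 * np.pi * 200 * t)),
    spectral_fingerprint(np.sin(2 * np.pi * 3000 * t)),
]
query = spectral_fingerprint(0.8 * np.sin(2 * np.pi * 3000 * t + 0.3))
print(best_match(query, references))
```

Normalising the fingerprint makes the comparison insensitive to loudness. Different speakers would still need far richer features than this to match reliably.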