similarity

A better similarity ranking algorithm for variable length strings

邮差的信, submitted on 2019-11-26 03:24:15
Question: I'm looking for a string similarity algorithm that yields better results on variable-length strings than the ones that are usually suggested (Levenshtein distance, Soundex, etc.). For example, given string A: "Robert", string B: "Amy Robertson" would be a better match than string C: "Richard". Also, preferably, this algorithm should be language agnostic (it should also work in languages other than English).
Answer 1: Simon White of Catalysoft wrote an article about a very clever algorithm that

How to find similar results and sort by similarity?

我是研究僧i, submitted on 2019-11-26 03:17:26
Question: How do I query for records ordered by similarity? E.g., searching for "Stock Overflow" would return: Stack Overflow, SharePoint Overflow, Math Overflow, Politic Overflow, VFX Overflow. E.g., searching for "LO" would return: pabLO picasso, michelangeLO, jackson polLOck. What I need help with: using a search engine to index and search a MySQL table, for better results; using the Sphinx search engine with PHP; using the Lucene engine with PHP; using full-text indexing to find similar/containing strings. What

How to compute similarity between two strings in MySQL

蓝咒, submitted on 2019-11-26 03:09:55
Question: If I have two strings in MySQL: @a="Welcome to Stack Overflow" and @b=" Hello to stack overflow"; is there a way to get the similarity percentage between those two strings using MySQL? Here, for example, 3 words are shared, so the similarity should be something like: count(similar words between @a and @b) / (count(@a) + count(@b) - count(intersection)), and thus the result is 3 / (4 + 4 - 3) = 0.6. Any idea is highly appreciated!
Answer 1: You can use this function (cop^H^H^Hadapted from http://www
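The formula in the question is the Jaccard index over words. The answer's MySQL function is cut off in this excerpt, so the following is only a minimal Python sketch of the same arithmetic, not the answer's method. The function name `word_jaccard` is hypothetical, and lowercasing, whitespace splitting, and counting each distinct word once are assumptions made for illustration.

```python
def word_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over case-insensitive, whitespace-separated words."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    union = words_a | words_b
    if not union:
        # Both strings are empty or whitespace only: define similarity as 0.
        return 0.0
    # |A ∩ B| / |A ∪ B|, where |A ∪ B| = |A| + |B| - |A ∩ B|
    return len(words_a & words_b) / len(union)


score = word_jaccard("Welcome to Stack Overflow", " Hello to stack overflow")
print(score)  # 3 shared words out of 5 distinct words
```

For the two strings in the question this gives 3 / (4 + 4 - 3) = 0.6, matching the hand calculation.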

Similarity measures in machine learning and their Python implementations

一个人想着一个人, submitted on 2019-11-26 02:11:49
Reposted from: https://blog.csdn.net/u010412858/article/details/60467382
Many research problems require estimating a similarity measurement between samples, and the usual approach is to compute a "distance" between them. The choice of distance measure matters a great deal and can even determine whether a classification comes out correct.

1. Euclidean distance

    import math

    # 1) Given two data points, calculate the Euclidean distance between them
    def get_distance(data1, data2):
        points = zip(data1, data2)
        diffs_squared_distance = [pow(a - b, 2) for (a, b) in points]
        return math.sqrt(sum(diffs_squared_distance))

2. Cosine similarity

    def cosin_distance(vector1, vector2):
        dot_product = 0.0
        normA = 0.0
        normB = 0.0
        for a, b in zip(vector1, vector2):
            dot_product += a * b
            normA += a ** 2
            normB += b ** 2
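The excerpt cuts off before the cosine function returns a value. As a hedged illustration only, here is a self-contained sketch that follows the standard definition, dot(a, b) / (|a| * |b|). The name `cosine_similarity` and the choice to return 0.0 for a zero vector are assumptions, not taken from the original post.

```python
import math


def cosine_similarity(vector1, vector2):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot_product = 0.0
    norm_a = 0.0
    norm_b = 0.0
    for a, b in zip(vector1, vector2):
        dot_product += a * b
        norm_a += a ** 2
        norm_b += b ** 2
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as having no similarity.
        return 0.0
    return dot_product / (math.sqrt(norm_a) * math.sqrt(norm_b))


print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # parallel vectors
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors
```

Parallel vectors score close to 1 and orthogonal vectors score 0, which is why the measure ignores magnitude and compares direction only.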

Find the similarity metric between two strings

青春壹個敷衍的年華, submitted on 2019-11-26 01:37:12
Question: How do I get the probability of a string being similar to another string in Python? I want to get a decimal value like 0.9 (meaning 90%), preferably using only standard Python and its library. E.g.:

    similar("Apple","Appel") # would have a high prob.
    similar("Apple","Mango") # would have a lower prob.

Answer 1: There is a built-in.

    from difflib import SequenceMatcher

    def similar(a, b):
        return SequenceMatcher(None, a, b).ratio()

Using it:

    >>> similar("Apple","Appel")
    0.8
    >>> similar("Apple","Mango")

Checking images for similarity with OpenCV

孤者浪人, submitted on 2019-11-26 00:35:20
Question: Does OpenCV support the comparison of two images, returning some value (maybe a percentage) that indicates how similar these images are? E.g., 100% would be returned if the same image was passed twice, and 0% would be returned if the images were totally different. I have already read a lot of similar topics here on StackOverflow and did quite a bit of Googling. Sadly, I couldn't come up with a satisfying answer.
Answer 1: This is a huge topic, with answers from 3 lines of code to entire research magazines.

Calculate cosine similarity given 2 sentence strings

大城市里の小女人, submitted on 2019-11-26 00:21:41
Question: From "Python: tf-idf-cosine: to find document similarity", it is possible to calculate document similarity using tf-idf cosine. Without importing external libraries, is there any way to calculate cosine similarity between 2 strings?

    s1 = "This is a foo bar sentence ."
    s2 = "This sentence is similar to a foo bar sentence ."
    s3 = "What is this string ? Totally not related to the other two lines ."

    cosine_sim(s1, s2) # Should give high cosine similarity
    cosine_sim(s1, s3) # Shouldn't give
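One way to approach the question with the standard library alone is to turn each sentence into a bag-of-words count vector and take the cosine of the two vectors. The sketch below is an illustration under stated assumptions, not the accepted answer: the tokenizer (lowercased `\w+` matches), the helper name `text_to_vector`, and returning 0.0 for an empty sentence are all choices made here. It uses raw word counts rather than tf-idf weights.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")


def text_to_vector(text):
    """Hypothetical helper: map a sentence to word -> count."""
    return Counter(WORD.findall(text.lower()))


def cosine_sim(text1, text2):
    """Cosine similarity between the word-count vectors of two strings."""
    vec1, vec2 = text_to_vector(text1), text_to_vector(text2)
    # Only words present in both sentences contribute to the dot product.
    dot = sum(vec1[w] * vec2[w] for w in vec1.keys() & vec2.keys())
    norm1 = math.sqrt(sum(c * c for c in vec1.values()))
    norm2 = math.sqrt(sum(c * c for c in vec2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


s1 = "This is a foo bar sentence ."
s2 = "This sentence is similar to a foo bar sentence ."
s3 = "What is this string ? Totally not related to the other two lines ."

print(cosine_sim(s1, s2))  # higher: most words are shared
print(cosine_sim(s1, s3))  # lower: only a couple of common words
```

Because common words such as "is" and "this" are counted like any other, unrelated sentences still get a nonzero score; tf-idf weighting is the usual way to dampen that effect.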

A better similarity ranking algorithm for variable length strings

泪湿孤枕, submitted on 2019-11-25 23:12:26
Question: I'm looking for a string similarity algorithm that yields better results on variable-length strings than the ones that are usually suggested (Levenshtein distance, Soundex, etc.). For example, given string A: "Robert", string B: "Amy Robertson" would be a better match than string C: "Richard". Also, preferably, this algorithm should be language agnostic (it should also work in languages other than English).
Answer: Simon White of Catalysoft wrote an article about a very clever algorithm that compares adjacent character pairs and works really well for my purposes: http://www.catalysoft.com/articles
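The linked article is truncated here, so its exact procedure is not reproduced. As a rough sketch of what comparing adjacent character pairs can look like, the code below scores two strings with a Dice-style coefficient over per-word letter pairs: twice the number of shared pairs divided by the total number of pairs. The function names, the uppercasing, and the whitespace word split are assumptions for illustration and may differ from the article's details.

```python
from collections import Counter


def letter_pairs(text):
    """Count adjacent character pairs within each whitespace-separated word."""
    pairs = Counter()
    for word in text.upper().split():
        for i in range(len(word) - 1):
            pairs[word[i:i + 2]] += 1
    return pairs


def pair_similarity(a, b):
    """2 * |shared pairs| / (|pairs in a| + |pairs in b|), in [0, 1]."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared pair.
    shared = sum((pairs_a & pairs_b).values())
    return 2.0 * shared / total


print(pair_similarity("Robert", "Amy Robertson"))
print(pair_similarity("Robert", "Richard"))
```

With the strings from the question, "Amy Robertson" shares all five pairs of "Robert" (RO, OB, BE, ER, RT) while "Richard" shares none, so the ranking comes out as the asker wanted. The measure works on characters only, which is what makes it language agnostic.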