similarity

How do I create a similarity matrix in MATLAB?

时光怂恿深爱的人放手 submitted on 2019-11-28 14:15:20
I am working towards comparing multiple images. I have these image data as column vectors of a matrix called "images." I want to assess the similarity of images by first computing their Euclidean distance. I then want to create a matrix over which I can execute multiple random walks. Right now, my code is as follows: % clear % clc % close all % % load tea.mat; images = Input.X; M = zeros(size(images, 2), size(images, 2)); for i = 1:size(images, 2) for j = 1:size(images, 2) normImageTemp = sqrt((sum((images(:, i) - images(:, j))./256).^2)); %Need to accurately select the value of gamma_i gamma
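A minimal Python sketch of the pipeline the question describes (pairwise Euclidean distances, then a matrix suitable for random walks). The Gaussian kernel and the `gamma` bandwidth are assumptions for illustration; the original snippet is cut off before it shows how `gamma` is used. Note that the distance here squares each difference before summing, whereas the MATLAB line as written appears to square after summing.

```python
import math


def euclidean_distance_matrix(columns):
    """Return the symmetric matrix of pairwise Euclidean distances.

    `columns` is a list of equal-length vectors, one per image
    (the counterpart of the columns of `images` in the question).
    """
    n = len(columns)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Square each difference first, then sum, then take the root.
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(columns[i], columns[j])))
            dist[i][j] = dist[j][i] = d
    return dist


def gaussian_similarity(dist, gamma):
    """Map distances to similarities in (0, 1]. `gamma` is an assumed bandwidth."""
    return [[math.exp(-(d ** 2) / (2 * gamma ** 2)) for d in row] for row in dist]


def row_normalize(matrix):
    """Scale each row to sum to 1, giving random-walk transition probabilities."""
    return [[v / sum(row) for v in row] for row in matrix]


if __name__ == "__main__":
    images = [[0, 0], [3, 4], [6, 8]]  # three tiny "images" as vectors
    distances = euclidean_distance_matrix(images)
    transitions = row_normalize(gaussian_similarity(distances, gamma=5.0))
    print(distances)
    print(transitions)
```

Row-normalizing the similarity matrix is one common way to obtain a transition matrix for random walks; other normalizations are possible.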

Similar UTF-8 strings for autocomplete field

为君一笑 submitted on 2019-11-28 11:46:40
Background Users can type in a name and the system should match the text, even if either the user input or the database field contains accented (UTF-8) characters. This is using the pg_trgm module. Problem The code resembles the following: SELECT t.label FROM the_table t WHERE label % 'fil' ORDER BY similarity( t.label, 'fil' ) DESC When the user types fil, the query matches filbert but not filé powder. (Because of the accented character?) Failed Solution #1 I tried to implement an unaccent function and rewrite the query as: SELECT t.label FROM the_table t WHERE unaccent( label ) %
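To make the idea of accent-insensitive trigram matching concrete, here is a rough Python sketch. It is not pg_trgm itself: the `unaccent` helper below is a stand-in built on Unicode decomposition, and the trigram routine only approximates pg_trgm's documented padding (two spaces before each word, one after) while splitting on whitespace for simplicity.

```python
import unicodedata


def unaccent(text):
    """Strip combining marks, e.g. 'filé' -> 'file' (stand-in for a database unaccent function)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trigrams(text):
    """Set of 3-character windows per word, padded with 2 leading and 1 trailing space."""
    grams = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, after removing accents."""
    ta, tb = trigrams(unaccent(a)), trigrams(unaccent(b))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    for label in ["filbert", "filé powder"]:
        print(label, trigram_similarity("fil", label))
```

Because both sides are unaccented before comparison, "filé powder" and "file powder" score identically against the user's input in this sketch.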

Hamming Distance / Similarity searches in a database

╄→гoц情女王★ submitted on 2019-11-28 09:30:17
I have a process, similar to TinEye, that generates perceptual hashes; these are 32-bit ints. I intend to store these in a SQL database (maybe a NoSQL db) in the future. However, I'm stumped at how I would be able to retrieve records based on the similarity of hashes. Any ideas? A common approach (at least common to me) is to divide your hash bit string into several chunks and query on these chunks for an exact match. This is a "pre-filter" step. You can then perform a bitwise Hamming distance computation on the returned results, which should be only a smaller subset of your overall dataset. This
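The chunk pre-filter described in the answer can be sketched in Python as follows. The in-memory `HashIndex` class and the choice of four 8-bit chunks are assumptions for illustration; in a database the same idea would presumably map to indexed chunk columns queried for exact matches.

```python
from collections import defaultdict

CHUNK_BITS = 8
NUM_CHUNKS = 4  # 4 x 8 bits = one 32-bit hash


def chunks(h):
    """Split a 32-bit integer into (position, 8-bit value) pairs."""
    mask = (1 << CHUNK_BITS) - 1
    return [(i, (h >> (i * CHUNK_BITS)) & mask) for i in range(NUM_CHUNKS)]


def hamming(a, b):
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


class HashIndex:
    """Hypothetical in-memory index keyed by (chunk position, chunk value)."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, h):
        for key in chunks(h):
            self._buckets[key].add(h)

    def query(self, h, max_distance):
        # Pre-filter: collect every stored hash sharing at least one exact chunk.
        candidates = set()
        for key in chunks(h):
            candidates |= self._buckets.get(key, set())
        # Verification: full Hamming distance on the reduced candidate set.
        return sorted(c for c in candidates if hamming(c, h) <= max_distance)


if __name__ == "__main__":
    index = HashIndex()
    for value in (0xDEADBEEF, 0xDEADBEEE, 0x12345678):
        index.add(value)
    print([hex(v) for v in index.query(0xDEADBEEF, max_distance=2)])
```

By the pigeonhole principle, the pre-filter misses nothing as long as `max_distance` is smaller than the number of chunks: two hashes differing in at most three bits must agree exactly on at least one of the four chunks.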

String similarity in PHP: levenshtein like function for long strings

筅森魡賤 submitted on 2019-11-28 08:41:28
The function levenshtein in PHP works on strings with maximum length 255. What are good alternatives to compute a similarity score of sentences in PHP? Basically I have a database of sentences, and I want to find approximate duplicates. The similar_text function is not giving me the expected results. What is the easiest way for me to detect similar sentences like below: $ss="Jack is a very nice boy, isn't he?"; $pp="jack is a very nice boy is he"; $ss=strtolower($ss); // convert to lower case as we don't care about case $pp=strtolower($pp); $score=similar_text($ss, $pp); echo "$score %\n"; // Outputs
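One way to sidestep a per-string length limit is to compare sentences as lists of words rather than characters. The sketch below is in Python rather than PHP and uses the standard library's `difflib.SequenceMatcher`, which is a matching-blocks ratio and not Levenshtein distance; the tokenization rule is an assumption chosen for the example sentences.

```python
import difflib
import re


def normalize(sentence):
    """Lowercase and keep only word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def sentence_similarity(a, b):
    """Ratio in [0, 1] based on matching runs of words in the two sentences."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


if __name__ == "__main__":
    ss = "Jack is a very nice boy, isn't he?"
    pp = "jack is a very nice boy is he"
    print(sentence_similarity(ss, pp))
```

Working at word level keeps sequences short even for long sentences, and a threshold on the ratio can then flag approximate duplicates. The right threshold depends on the data.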

Javascript text similarity algorithm

不羁的心 submitted on 2019-11-28 08:31:48
I'm building a website that should collect various news feeds and would like the texts to be compared for similarity. What I need is some sort of news text similarity algorithm. I know that PHP has the similar_text function, but I am not sure how good it is, and I need it for JavaScript. Could anyone point me to an example, a plugin, or any instructions on how this is possible, or at least where to start investigating? There's a JavaScript implementation of the Levenshtein distance metric, which is often used for text comparisons. If you want to compare whole articles or headlines
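For reference, the Levenshtein distance mentioned in the answer can be sketched as below. This is written in Python for illustration (the answer refers to a JavaScript implementation that is not shown here); the normalization into a 0 to 1 similarity score is an added assumption, not part of the metric itself.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance into [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(similarity("news headline", "news headlines"))
```

The two-row formulation keeps memory proportional to the shorter string, but the running time is still proportional to the product of the two lengths, which matters when comparing whole articles.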

Compute mean squared, absolute deviation and custom similarity measure - Python/NumPy

家住魔仙堡 submitted on 2019-11-28 04:10:14
Question I have a large image as a 2D array (let's assume that it is a 500 by 1000 pixel grayscale image). And I have one small image (let's say it is 15 by 15 pixels). I would like to slide the small image over the large one, and for a given position of the small image I would like to calculate a measure of similarity between the small image and the underlying part of the big image. I would like to be flexible in choosing a measure of similarity. For example I might want to calculate mean squared
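The sliding comparison described here can be sketched in plain Python with the similarity measure passed in as a function, which gives the flexibility the question asks for. The function names are illustrative, and nested lists stand in for the 2D arrays; for a 500 by 1000 image a vectorized NumPy version would likely be much faster.

```python
def mean_squared(patch, template):
    """Mean of squared pixel differences between two equally sized 2D lists."""
    diffs = [(p - t) ** 2 for prow, trow in zip(patch, template) for p, t in zip(prow, trow)]
    return sum(diffs) / len(diffs)


def mean_absolute(patch, template):
    """Mean of absolute pixel differences between two equally sized 2D lists."""
    diffs = [abs(p - t) for prow, trow in zip(patch, template) for p, t in zip(prow, trow)]
    return sum(diffs) / len(diffs)


def slide(image, template, measure):
    """Apply `measure` at every position where the template fits inside the image."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    scores = []
    for r in range(ih - th + 1):
        row_scores = []
        for c in range(iw - tw + 1):
            # Cut out the part of the image currently under the template.
            patch = [row[c:c + tw] for row in image[r:r + th]]
            row_scores.append(measure(patch, template))
        scores.append(row_scores)
    return scores


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    template = [[5, 6], [8, 9]]
    print(slide(image, template, mean_squared))
    print(slide(image, template, mean_absolute))
```

Any custom measure with the same `(patch, template)` signature can be passed to `slide` without changing the loop.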

String similarity score/hash

雨燕双飞 submitted on 2019-11-28 03:04:22
Is there a method to calculate something like general "similarity score" of a string? In a way that I am not comparing two strings together but rather I get some number (hash) for each string that can later tell me that two strings are or are not similar. Two similar strings should have similar (close) hashes. Let's consider these strings and scores as an example: Hello world 1000 Hello world! 1010 Hello earth 1125 Foo bar 3250 FooBarbar 3750 Foo Bar! 3300 Foo world! 2350 You can see that Hello world! and Hello world are similar and their scores are close to each other. This way, finding the
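One family of techniques that assigns a per-string fingerprint is locality-sensitive hashing. The Python sketch below uses a SimHash-style scheme over character trigrams; this is a swapped-in illustration rather than anything proposed in the question, and the trigram features and 64-bit width are assumptions. Unlike the numeric scores in the example above, closeness is measured by the number of differing bits (Hamming distance), not by the difference between the integers.

```python
import hashlib

BITS = 64


def features(text, n=3):
    """Overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]


def simhash(text):
    """Fingerprint whose bits are a per-bit majority vote over the feature hashes."""
    counts = [0] * BITS
    for feat in features(text):
        digest = hashlib.sha256(feat.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        for bit in range(BITS):
            counts[bit] += 1 if (value >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if counts[bit] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    base = simhash("Hello world")
    for other in ["Hello world!", "Hello earth", "Foo bar"]:
        print(other, hamming(base, simhash(other)))
```

Strings that share most of their trigrams tend to produce fingerprints that differ in few bits, so each fingerprint can be computed once per string and compared later without the original text.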

Check if two NSStrings are similar

偶尔善良 submitted on 2019-11-28 02:04:05
I present a tricky question that I am not sure how to approach. I have put together a plist containing dictionaries, each of which contains two objects: the country name and the plug size of the country. There are only 210 countries/facts, though. I have also enabled searching through a list of many, many countries, for which there might or might not be a fact. But here is my problem: I am using a web service called Geonames, and the user can use a search bar display controller to search for countries, while these plist country names paired with plug sizes are actually from a Wikipedia article. Now, the country naming

To use iSPARQL to compare values using similarity measures

空扰寡人 submitted on 2019-11-28 01:39:10
Question I have a question for you. I would like to write a query that retrieves the values that are similar (given a similarity function, such as Levenshtein) to a given string "Londn", comparing against the predicate "RDFS:label" of DBpedia. As output, for example, I would like to get the value "London". I have read that a usable approach might be to use iSPARQL ("Imprecise SPARQL"), although it is not very widely used in the literature. Can I use iSPARQL or is there some SPARQL approach to

Mahalanobis distance in R, error: system is computationally singular

爷,独闯天下 submitted on 2019-11-27 23:04:55
I'd like to calculate multivariate distance from a set of points to the centroid of those points. Mahalanobis distance seems to be suited for this. However, I get an error (see below). Can anyone tell me why I am getting this error, and if there is a way to work around it? If you download the coordinate data and the associated environmental data, you can run the following code. require(maptools) occ <- readShapeSpatial('occurrences.shp') load('envDat.Rdata') #standardize the data to scale the variables dat <- as.matrix(scale(dat)) centroid <- dat[1547,] #let's assume this is the centroid in
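A "computationally singular" error of this kind typically means the covariance matrix could not be inverted, which happens for instance when some variables are linear combinations of others. Whether that is the cause for this particular dataset cannot be told from the truncated question. The Python sketch below illustrates the general situation in two dimensions with hand-made data and hypothetical function names; like R's `mahalanobis`, it returns the squared distance.

```python
def covariance_2d(points):
    """Sample covariance matrix of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def mahalanobis_squared_2d(point, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular; variables are linearly dependent")
    dx, dy = point[0] - center[0], point[1] - center[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


if __name__ == "__main__":
    independent = [(0, 0), (2, 0), (0, 2), (2, 2)]
    print(mahalanobis_squared_2d((2, 2), (1, 1), covariance_2d(independent)))

    dependent = [(1, 2), (2, 4), (3, 6)]  # y is exactly 2 * x
    try:
        mahalanobis_squared_2d((3, 6), (2, 4), covariance_2d(dependent))
    except ValueError as exc:
        print("error:", exc)
```

Common workarounds in this situation include dropping redundant variables or using a generalized (pseudo) inverse of the covariance matrix; which one is appropriate depends on the data.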