similarity

Visual similarity search algorithm

怎甘沉沦 submitted on 2019-12-04 20:33:13
Question: I'm trying to build a utility like this http://labs.ideeinc.com/multicolr, but I don't know which algorithm they are using. Does anyone know?
Answer 1: All they are doing is matching histograms. So build a histogram for each of your images and normalize it by the size of the image. A histogram is a vector with as many elements as colors. You don't need 32, 24, and maybe not even 16 bits of accuracy; that will just slow you down. For performance reasons, I would map the histograms down to 4, 8, and 10
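
The histogram matching described in this answer can be sketched roughly as follows. This is a minimal illustration, not the method the linked site is confirmed to use: the function names, the 2-bits-per-channel quantization, and histogram intersection as the comparison measure are all assumptions made for the example.

```python
def color_histogram(pixels, bits_per_channel=2):
    """Build a coarsely quantized RGB histogram, normalized by pixel count.

    pixels: iterable of (r, g, b) tuples with 0-255 channel values.
    bits_per_channel is an assumed setting; fewer bits means fewer bins.
    """
    levels = 1 << bits_per_channel          # bins per channel
    shift = 8 - bits_per_channel            # drop the low-order bits
    hist = [0.0] * (levels ** 3)
    count = 0
    for r, g, b in pixels:
        index = ((r >> shift) * levels + (g >> shift)) * levels + (b >> shift)
        hist[index] += 1
        count += 1
    if count:
        # Normalizing by image size makes images of different sizes comparable.
        hist = [h / count for h in hist]
    return hist


def histogram_intersection(h1, h2):
    """Return a score in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

With normalized histograms, an all-red image scores 1.0 against another all-red image and 0.0 against an all-blue one, regardless of their sizes.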

How do I group similar strings in R? [closed]

孤人 submitted on 2019-12-04 18:47:16
I have a database with ~5,000 locality names, most of which are repetitions with typos, permutations, abbreviations, etc. I would like to group them by similarity, to speed up further processing. Ideally I would convert each variation into a "platonic form" and put two columns side by side, with the original and platonic forms. I've read about multiple sequence alignment, but this seems to be mostly used in bioinformatics, for sequences of DNA/RNA/peptides. I'm not sure it will work well with names of places. Does anyone know of a library that would help me do this in R? Or which of the many
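
One way to picture the "original next to platonic form" idea is a greedy grouping pass. The sketch below is in Python rather than R and is only an illustration: the function name `canonical_forms`, the 0.8 threshold, and the choice of `difflib.SequenceMatcher` as the similarity measure are assumptions, and taking the first spelling seen as the canonical form is a simplification.

```python
from difflib import SequenceMatcher


def canonical_forms(names, threshold=0.8):
    """Pair each name with the first earlier spelling it closely resembles.

    Returns a list of (original, canonical) tuples. The canonical form is
    simply the normalized first member of each group (an assumption).
    """
    canonicals = []   # normalized representative of each group found so far
    pairs = []
    for name in names:
        key = " ".join(name.lower().split())   # lowercase, collapse spaces
        for canonical in canonicals:
            if SequenceMatcher(None, key, canonical).ratio() >= threshold:
                pairs.append((name, canonical))
                break
        else:
            # No existing group is close enough, so start a new one.
            canonicals.append(key)
            pairs.append((name, key))
    return pairs
```

Each name is compared against every group representative, so the cost grows with the number of distinct groups; the threshold would need tuning on real locality names.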

Matching two series of MFCC coefficients

落花浮王杯 submitted on 2019-12-04 16:25:40
I have extracted two series of MFCC coefficients from two audio files of about 30 seconds each, containing the same speech content. The audio files were recorded at the same location from different sources. I need to estimate whether the audio contains the same conversation or a different one. So far I have tested a correlation calculation on the two MFCC series, but the result is not very reasonable. Are there best practices for this scenario?
I had the same problem, and the solution was to match the two arrays of MFCCs using the Dynamic Time Warping algorithm. After
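
For reference, the classic Dynamic Time Warping recurrence mentioned in this answer can be sketched as below. This is a textbook version under stated assumptions, not the answerer's code: each MFCC frame is assumed to be a sequence of floats, frames are compared with Euclidean distance, and no window constraint or path normalization is applied.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature frames.

    a, b: sequences of frames; each frame is a sequence of floats of equal
    length (for example, the MFCC vector of one audio window).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])   # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because the raw cost grows with sequence length, any decision threshold for "same conversation" would have to be calibrated on known matching and non-matching recordings.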

How do you measure similarity between 2 series of data?

谁说胖子不能爱 submitted on 2019-12-04 15:17:33
Question: I need to find a similarity measurement between two arrays of data. You can call the similarity measurement whatever you want: difference, correlation or whatever. For example:
1, 2, 3, 4, 5 < Series 1
2, 3, 4, 5, 6 < Series 2
should be far more similar to each other than these two series:
1, 2, 3, 4, 5 < Series 1
1, 1, 5, 8, 7 < Series 2
Any suggestions? Is there source code available for it?
Answer 1: You can calculate the sample Pearson product-moment correlation coefficient: "The above formula
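
The sample Pearson coefficient suggested in this answer can be computed directly from its standard definition. The sketch below is a plain-Python illustration (the function name `pearson` is chosen for the example), applied to the two pairs of series from the question.

```python
import math


def pearson(x, y):
    """Sample Pearson product-moment correlation of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


series_1 = [1, 2, 3, 4, 5]
print(pearson(series_1, [2, 3, 4, 5, 6]))   # perfectly linear, about 1.0
print(pearson(series_1, [1, 1, 5, 8, 7]))   # lower, about 0.914
```

Note that the coefficient measures linear association only: a series shifted or scaled by a constant still scores 1, so it may need to be combined with a distance measure if offsets matter.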

MySQL Query to find most similar numerical row

[亡魂溺海] submitted on 2019-12-04 15:02:19
In a MySQL database, I am attempting to find the most similar row across a number of numerical attributes. This problem is similar to this question but includes a flexible number of comparisons and a join table.

Database

The database consists of two tables. The first table, users, is what I'm trying to compare.

id | self_ranking
-----------------
1  | 9
2  | 3
3  | 2

The second table is a series of scores which the user gave to particular items.

id | user_id | item_id | score
------------------------------
1  | 1       | 1       | 4
2  | 1       | 2       | 5
3  | 1       | 3       | 8
4  | 1       | 4       | 3

Task

I want to

Is there an alternative to `difflib.get_close_matches()` that returns indexes (list positions) instead of a str list?

血红的双手。 submitted on 2019-12-04 13:19:18
I want to use something like difflib.get_close_matches, but instead of the most similar strings, I would like to obtain the indexes (i.e. positions in the list). Indexes are more flexible because one can relate the index to other data structures (related to the matched string). For example, instead of:
>>> words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo', 'question', 'format']
>>> difflib.get_close_matches('Hello', words)
['hello', 'hallo', 'Hallo']
I would like:
>>> difflib.get_close_matches('Hello', words)
[0, 1, 6]
There doesn't seem to exist a parameter to
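
A small wrapper around `difflib.SequenceMatcher` can return positions instead of strings. The sketch below is an illustration, not part of `difflib`: the function name `get_close_match_indexes` is hypothetical, and ties are broken by list position, which differs from how `get_close_matches` orders equal scores.

```python
import difflib


def get_close_match_indexes(word, possibilities, n=3, cutoff=0.6):
    """Return the list positions of the best matches for word.

    Mirrors the scoring of difflib.get_close_matches (SequenceMatcher.ratio
    with a cutoff), but yields indexes. Ties are ordered by position.
    """
    matcher = difflib.SequenceMatcher()
    matcher.set_seq2(word)
    scored = []
    for index, candidate in enumerate(possibilities):
        matcher.set_seq1(candidate)
        # The two quick ratios are cheap upper bounds on ratio().
        if (matcher.real_quick_ratio() >= cutoff
                and matcher.quick_ratio() >= cutoff
                and matcher.ratio() >= cutoff):
            scored.append((matcher.ratio(), index))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:n]]


words = ['hello', 'Hallo', 'hi', 'house', 'key', 'screen', 'hallo',
         'question', 'format']
print(get_close_match_indexes('Hello', words))  # [0, 1, 6]
```

The returned indexes can then be used to look up related records in any parallel list or table.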

Find the similarity between two string columns of a DataFrame

喜夏-厌秋 submitted on 2019-12-04 10:12:18
I am new to programming. I have a pandas data frame with two string columns. The data frame looks like this:

Col-1       | Col-2
--------------------------------
Update      | have a account
Account     | account summary
AccountDTH  | Cancel Balance
Balance     | Balance Summary
Credit Card | Update credit card

I need to check the similarity of each Col-2 element against every element of Col-1. That is, I have to compare "have a account" with all the elements of Col-1 and then find the top 3 most similar ones. Suppose the similarity scores are: Account(85), AccountDTH(80), Balance(60), Update(45), Credit Card(35). The expected output is:

Col-2          | Output
have a account | Account
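
The "compare one Col-2 value against all of Col-1 and keep the top 3" step can be sketched on plain lists, which a pandas column can be converted to with `Series.tolist()`. The function name `top_matches` and the use of `difflib.SequenceMatcher` as the scorer are assumptions; the scores it produces will not match the illustrative numbers in the question.

```python
from difflib import SequenceMatcher


def top_matches(query, candidates, k=3):
    """Return the k candidates most similar to query, best first.

    Similarity is the case-insensitive SequenceMatcher ratio (an assumed
    choice; any other string-similarity score could be swapped in).
    """
    scored = [
        (SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: -pair[0])   # stable: ties keep list order
    return [c for _, c in scored[:k]]


col_1 = ["Update", "Account", "AccountDTH", "Balance", "Credit Card"]
col_2 = ["have a account", "account summary", "Cancel Balance",
         "Balance Summary", "Update credit card"]

# One row of output per Col-2 value: the value and its top 3 Col-1 matches.
for text in col_2:
    print(text, "->", top_matches(text, col_1))
```

This compares every pair, which is fine for small columns; for large ones a dedicated fuzzy-matching library would likely be faster.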

Checking and preventing similar strings while insertion in MySQL

只谈情不闲聊 submitted on 2019-12-04 07:17:40
Brief info

I have 3 tables:
Set: id, name
SetItem: set_id, item_id, position
TempSet: id

I have a function that generates new random combinations from the Item table. After each successful generation, I create a new row in the Set table, get its id, and add all item ids to the SetItem table.

Problem

Every time, before generating a new combination, I truncate the TempSet table, fill it with the new item ids, and check the similarity percentage by comparing against previous combinations in the SetItem table. If the new combination's similarity is greater than or equal to 30%, I need to prevent this combination and re
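
The rejection rule described here can be illustrated outside the database. The sketch below is only a model of the logic: the excerpt does not define how the similarity percentage is computed, so the share of the new combination's items found in an earlier set is an assumed definition, and the function name `is_too_similar` is hypothetical.

```python
def is_too_similar(new_items, previous_sets, threshold=30.0):
    """Return True if the new combination overlaps any earlier set too much.

    Similarity is taken as: shared items / size of the new combination,
    in percent (an assumption; the source does not give the formula).
    """
    new = set(new_items)
    if not new:
        return False
    for old in previous_sets:
        shared = len(new & set(old))
        if 100.0 * shared / len(new) >= threshold:
            return True
    return False


previous = [[1, 2, 3, 50, 51], [7, 8, 60, 61, 62]]
print(is_too_similar(range(1, 11), previous))    # 3 of 10 shared -> True
print(is_too_similar(range(20, 30), previous))   # nothing shared -> False
```

In SQL the same count would typically come from joining the temporary item ids against SetItem and grouping by set_id.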

How to get pair-wise “sequence similarity score” for ~1000 proteins?

喜你入骨 submitted on 2019-12-04 06:23:41
I have a large number of protein sequences in FASTA format. I want to get the pair-wise sequence similarity score for each pair of proteins. Is there any package in R that could be used to get the BLAST similarity score for protein sequences?
As per Chase's suggestion, Bioconductor is indeed the way to go, and in particular the Biostrings package. To install the latter I would suggest installing the core Bioconductor library as such:
source("http://bioconductor.org/biocLite.R")
biocLite()
This way you will cover all dependencies. Now, to align 2 protein sequences or any two sequences for that matter you

How to perform trigram operations in Google BigQuery?

谁说胖子不能爱 submitted on 2019-12-04 04:44:18
Question: I use the pg_trgm module in PostgreSQL to calculate similarity between two strings using trigrams. In particular I use similarity(text, text), which returns a number that indicates how similar the two arguments are (between 0 and 1). How can I perform a similarity function (or equivalent) on Google BigQuery?
Answer 1: Try the query below, at least as a blueprint for enhancing:
SELECT text1, text2, similarity FROM JS(
// input table
(
  SELECT * FROM
  (SELECT 'mikhail' AS text1, 'mikhail' AS text2),
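
To make the target behaviour concrete, trigram similarity in the style of pg_trgm can be sketched in a few lines of Python. This is an approximation for illustration, not the BigQuery answer's code: it lowercases the text, treats runs of ASCII letters and digits as words (an assumption that ignores non-ASCII text), pads each word with two leading spaces and one trailing space, and divides the number of shared trigrams by the number of distinct trigrams in either string.

```python
import re


def trigrams(text):
    """Return the set of padded trigrams for every word in text."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "      # two spaces before, one after
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(similarity('mikhail', 'mikhail'))   # 1.0
print(similarity('word', 'two words'))    # 4 shared of 11, about 0.364
```

The same set logic is what a JavaScript UDF in BigQuery would need to reproduce for each pair of input strings.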