fuzzy-search

Super fuzzy name checking?

我的未来我决定 submitted on 2019-12-03 03:19:24
Question: I'm working on some stuff for an in-house CRM. The company's current frontend allows lots of duplicates. I'm trying to stop end users from entering the same person twice because they searched for 'Bill Johnson' and not 'William Johnson'. So the user will enter some information about their new customer, and we'll find similar names (including fuzzy matches) among what is already in our database and ask whether they meant one of those... Does such a database or technology exist?

Google fuzzy search (a.k.a “suggestions”): What technique(s) are in use?

自闭症网瘾萝莉.ら submitted on 2019-12-03 03:18:13
Question: I'm implementing search suggestion functionality in my web app, and have been looking at existing implementations to see which techniques are in use. It seems as though most of the major sites (Amazon, Bing, etc.) implement fuzzy search in the following way:

    Tokenize search string into terms
    processingSearchStringSet = {}
    For each term
        if exact term is NOT in index
            Get possible terms (fuzzyTerms) from levenshtein(term, 1 (or 2))
            For each term in fuzzyTerms
                if term is in index
                    processingSearchStringSet
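
The approach outlined in the question can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the pseudocode is cut off at its last step, so the sketch assumes that matching terms are added to the result set; it also scans the index and measures the distance to each indexed term instead of generating every string within the edit distance. The names `expand_query` and `INDEX` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def expand_query(query: str, index: set[str], max_distance: int = 1) -> set[str]:
    """Keep exact terms; swap unknown terms for indexed terms within max_distance."""
    result: set[str] = set()
    for term in query.lower().split():
        if term in index:
            result.add(term)
            continue
        # Assumption: the truncated last step adds each matching term to the set.
        result.update(t for t in index if levenshtein(term, t) <= max_distance)
    return result


# Hypothetical toy index, for illustration only.
INDEX = {"fuzzy", "search", "suggestion", "levenshtein"}
matches = expand_query("fuzy serch", INDEX)
```

Scanning every indexed term is linear in the index size, so a real system would likely need a smarter candidate lookup; the sketch only shows the control flow.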

Is it possible to perform T-SQL fuzzy lookup without SSIS?

半城伤御伤魂 submitted on 2019-12-03 03:01:15
SSIS 2005/2008 does fuzzy lookups and groupings. Is there a feature that does the same in T-SQL? Fuzzy lookup uses a q-gram approach: it breaks strings up into small substrings and indexes them. You can then search an input by breaking it up into substrings of the same size. You can inspect the format of their index and write a CLR function that uses the same style of index, but that may be a fair chunk of work. It is actually quite interesting how they did it: very simple, yet it provides very robust matching and is very configurable. From what I recall of the index when I last looked
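
To make the q-gram idea concrete, here is a minimal Python sketch of a q-gram inverted index. It is not the SSIS index format, which the excerpt does not describe; the `#` padding, the class name `QGramIndex`, and the `min_shared` parameter are assumptions made for illustration.

```python
from collections import defaultdict


def qgrams(text: str, q: int = 3) -> set[str]:
    """Split text into overlapping substrings of length q, padded with '#'."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


class QGramIndex:
    """Hypothetical inverted index: q-gram -> ids of the stored strings containing it."""

    def __init__(self, q: int = 3) -> None:
        self.q = q
        self.values: list[str] = []
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, value: str) -> None:
        row_id = len(self.values)
        self.values.append(value)
        for gram in qgrams(value, self.q):
            self.postings[gram].add(row_id)

    def search(self, text: str, min_shared: int = 2) -> list[tuple[str, int]]:
        """Return (value, shared q-gram count), best match first."""
        counts: dict[int, int] = defaultdict(int)
        for gram in qgrams(text, self.q):
            for row_id in self.postings.get(gram, ()):
                counts[row_id] += 1
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        return [(self.values[row_id], n) for row_id, n in ranked if n >= min_shared]


index = QGramIndex()
for name in ["Johnson", "Jonson", "Smith"]:
    index.add(name)
results = index.search("Jonhson")  # input with two letters transposed
```

Only strings that share at least one q-gram with the input are ever touched, which is what lets this style of index avoid comparing the input against every stored value.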

Fuzzy Text Matching C#

两盒软妹~` submitted on 2019-12-03 02:42:51
I'm writing a desktop UI (.NET WinForms) to help a photographer clean up his image metadata. There is a list of 66k+ phrases. Can anyone suggest a good open source/free .NET component that uses some sort of algorithm to identify potential candidates for consolidation? For example, there may be two or more entries that are actually the same word or phrase and differ only by whitespace, punctuation, or a slight misspelling. The application will ultimately rely on the user to action the consolidation of phrases but having an effective way to automatically find potential
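
One possible way to surface such candidates, sketched in Python rather than .NET purely for illustration: normalize away case, whitespace, and punctuation, then group normalized forms that are nearly equal. The function names and the `cutoff` value are assumptions, and the standard-library `difflib` ratio stands in for whichever similarity measure a real component would use.

```python
import difflib
import re
from collections import defaultdict


def normalize(phrase: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    without_punctuation = re.sub(r"[^\w\s]", "", phrase.lower())
    return re.sub(r"\s+", " ", without_punctuation).strip()


def consolidation_candidates(phrases: list[str], cutoff: float = 0.85) -> list[list[str]]:
    """Group phrases whose normalized forms are equal or nearly equal."""
    by_key: dict[str, list[str]] = defaultdict(list)
    for phrase in phrases:
        by_key[normalize(phrase)].append(phrase)

    remaining = sorted(by_key)
    groups: list[list[str]] = []
    while remaining:
        key = remaining.pop(0)
        # Other normalized keys close to this one (slight misspellings).
        close = difflib.get_close_matches(
            key, remaining, n=len(remaining) or 1, cutoff=cutoff)
        members = list(by_key[key])
        for other in close:
            members.extend(by_key[other])
            remaining.remove(other)
        if len(members) > 1:  # only groups with something to consolidate
            groups.append(sorted(members))
    return groups


candidates = consolidation_candidates(
    ["Golden Gate Bridge", "golden  gate bridge.", "Goldan Gate Bridge", "Sunset"])
```

The pairwise comparison here is quadratic, so for 66k+ phrases it would likely need a cheaper first pass (for example, bucketing by a shared prefix or shared q-grams) before the ratio is computed.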

q-gram approximate matching optimisations

99封情书 submitted on 2019-12-03 00:37:48
I have a table containing 3 million people records on which I want to perform fuzzy matching using q-grams (on surname, for instance). I have created a table of 2-grams linking to this, but search performance is not great at this data volume (around 5 minutes). I basically have two questions: (1) Can you suggest any ways to improve performance and avoid a table scan (i.e. having to count common q-grams between the search string and 3 million surnames)? (2) With q-grams, if A is similar to B and C is similar to B, does it imply C is similar to A? Kind regards, Peter. I've been looking into fuzzy
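
On question (2), a small worked example may help. The sketch below assumes similarity is defined as Jaccard overlap of 2-grams with a fixed threshold; the measure, the threshold of 0.4, and the synthetic strings are all assumptions, since the question does not say which definition is in use. Under that definition, similarity is not transitive.

```python
def bigrams(text: str) -> set[str]:
    """All overlapping 2-character substrings of text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: str, b: str) -> float:
    """Shared 2-grams divided by all distinct 2-grams of both strings."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    union = grams_a | grams_b
    if not union:  # both strings shorter than 2 characters
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(union)


def similar(a: str, b: str, threshold: float = 0.4) -> bool:
    return jaccard(a, b) >= threshold


# Synthetic strings chosen so that each neighbouring pair overlaps heavily.
A, B, C = "abcdef", "cdefgh", "efghij"

a_b = similar(A, B)  # 3 shared 2-grams out of 7 distinct
b_c = similar(B, C)  # 3 shared 2-grams out of 7 distinct
a_c = similar(A, C)  # 1 shared 2-gram out of 9 distinct
```

Here A and C are each similar to B but not to each other, so any grouping built on pairwise q-gram similarity has to decide how to handle such chains.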

ElasticSearch's Fuzzy Query

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-02 21:26:29
I am brand new to ElasticSearch, and am currently exploring its features. One that I am interested in is the Fuzzy Query, which I am testing and having trouble using. It is probably a basic question, so I guess someone who has already used this feature will quickly find the answer, at least I hope. :) BTW I have the feeling that it might not be related only to ElasticSearch but maybe directly to Lucene. Let's start with a new index named "first index" in which I store an object "label" with value "american football". This is the query I use. bash-3.2$ curl -XPOST 'http://localhost:9200

Solr Fuzzy Search for similar words

陌路散爱 submitted on 2019-12-02 18:25:29
I am trying to do a fuzzy search for "jahngir" ~ 0.2, which does not return any results. My index has records with data "JAHANGIR RAHMAN MD". If I search with the exact word "jahangir" ~ 0.2, it works. Can someone please help me see what I am doing wrong? I have spent a lot of time trying to figure out how Solr fuzzy search works. Any links that explain Solr fuzzy search would be helpful. Below is the text field that I am using for indexing. Thanks in advance. <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"> <analyzer type=

Super fuzzy name checking?

你离开我真会死。 submitted on 2019-12-02 17:46:28
I'm working on some stuff for an in-house CRM. The company's current frontend allows lots of duplicates. I'm trying to stop end users from entering the same person twice because they searched for 'Bill Johnson' and not 'William Johnson'. So the user will enter some information about their new customer, and we'll find similar names (including fuzzy matches) among what is already in our database and ask whether they meant one of those... Does such a database or technology exist? I implemented this kind of functionality on one website. I use double_metaphone() + levenshtein() in PHP. I

SQL Server Fuzzy Search with Percentage of match

旧街凉风 submitted on 2019-12-02 03:31:55
Question: I am using SQL Server 2008 R2 SP1. I have a table with about 36,034 customer records. I am trying to implement fuzzy search on the Customer Name field. Here is the function for fuzzy search:

    ALTER FUNCTION [Party].[FuzySearch]
    (
        @Reference VARCHAR(200),
        @Target VARCHAR(200)
    )
    RETURNS DECIMAL(5, 2)
    WITH SCHEMABINDING
    AS
    BEGIN
        DECLARE @score DECIMAL(5, 2)
        SELECT @score = CASE
            WHEN @Reference = @Target THEN CAST(100 AS NUMERIC(5, 2))
            WHEN @Reference IS NULL OR @Target IS NULL THEN CAST(0 AS NUMERIC(5,

SQL Server Fuzzy Search with Percentage of match

限于喜欢 submitted on 2019-12-02 01:14:43
I am using SQL Server 2008 R2 SP1. I have a table with about 36,034 customer records. I am trying to implement fuzzy search on the Customer Name field. Here is the function for fuzzy search:

    ALTER FUNCTION [Party].[FuzySearch]
    (
        @Reference VARCHAR(200),
        @Target VARCHAR(200)
    )
    RETURNS DECIMAL(5, 2)
    WITH SCHEMABINDING
    AS
    BEGIN
        DECLARE @score DECIMAL(5, 2)
        SELECT @score = CASE
            WHEN @Reference = @Target THEN CAST(100 AS NUMERIC(5, 2))
            WHEN @Reference IS NULL OR @Target IS NULL THEN CAST(0 AS NUMERIC(5, 2))
            ELSE ( SELECT [Score %] = CAST(SUM(LetterScore) * 100.0 / MAX(WordLength * WordLength) AS NUMERIC