fuzzy-comparison

How do I fuzzy match just adjacent cells?

喜夏-厌秋 submitted on 2020-06-11 10:00:06
Question: I have 10,000 names in each of two corresponding columns. Each cell in Column A corresponds to the adjacent cell in Column B. I want to do a fuzzy match and get a compatibility score for every pair, comparing each cell only with its adjacent cell. I do not want to search the entire column against the entire column, just adjacent cells, which I don't seem to be able to do with the Fuzzy Match Excel add-in. Any ideas? Example:

Column A   Column B   Value
Apple      Aplle      80%
Banana     Banana     100%
Orange     Ornge      85%

Answer 1
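As a rough illustration of row-wise scoring outside Excel, here is a minimal Python sketch. It assumes the two columns have already been exported to plain lists, and it uses the standard library's `difflib.SequenceMatcher` ratio as the similarity score; that is a stand-in metric and will not necessarily reproduce the percentages from the Fuzzy Match add-in. The function name `adjacent_scores` is hypothetical.

```python
from difflib import SequenceMatcher

def adjacent_scores(col_a, col_b):
    """Score each cell in col_a only against the cell in the same row of col_b."""
    return [
        SequenceMatcher(None, a.lower(), b.lower()).ratio()
        for a, b in zip(col_a, col_b)
    ]

# Sample rows taken from the question.
column_a = ["Apple", "Banana", "Orange"]
column_b = ["Aplle", "Banana", "Ornge"]

for a, b, score in zip(column_a, column_b, adjacent_scores(column_a, column_b)):
    print(f"{a:<8} {b:<8} {score:.0%}")
```

Because `zip` pairs the rows one to one, the work is linear in the number of rows and never compares a cell against the whole opposite column.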

How to merge two pandas DataFrames based on a similarity function?

陌路散爱 submitted on 2020-04-10 03:46:51
Question: Given dataset 1

    name,x,y
    st. peter,1,2
    big university portland,3,4

and dataset 2

    name,x,y
    saint peter,3,4
    uni portland,5,6

the goal is to merge with d1.merge(d2, on="name", how="left"). There are no exact matches on name, though, so I'm looking to do a kind of fuzzy matching. The technique does not matter in this case; what matters more is how to incorporate it efficiently into pandas. For example, st. peter might match saint peter in the other, but big university portland might be too much of a deviation that we
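One way to approach this is to build a mapping from each left-hand name to its closest right-hand name first, and only then join on the mapped key. The sketch below uses plain lists of dicts and the standard library's `difflib.get_close_matches`; the helper `build_key_map` and the cutoff of 0.75 are assumptions chosen for illustration, not values from the question.

```python
from difflib import get_close_matches

# The two datasets from the question, as plain records.
d1 = [
    {"name": "st. peter", "x": 1, "y": 2},
    {"name": "big university portland", "x": 3, "y": 4},
]
d2 = [
    {"name": "saint peter", "x": 3, "y": 4},
    {"name": "uni portland", "x": 5, "y": 6},
]

def build_key_map(left_names, right_names, cutoff=0.75):
    """Map each left name to its single best right name, or None if nothing clears the cutoff."""
    key_map = {}
    for name in left_names:
        hits = get_close_matches(name, right_names, n=1, cutoff=cutoff)
        key_map[name] = hits[0] if hits else None
    return key_map

key_map = build_key_map([r["name"] for r in d1], [r["name"] for r in d2])
print(key_map)
```

In pandas, a dict like this could be applied with `d1["name"].map(key_map)` to create a join-key column before calling `merge`, so the fuzzy step runs once per distinct name rather than inside the join.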

Fuzzy Text Search: Regex Wildcard Search Generator?

冷暖自知 submitted on 2020-01-31 18:13:07
Question: I'm wondering if there is some way to do fuzzy string matching in PHP: looking for a word in a long string and finding a potential match even if it's misspelled, something that would find it if it was off by one character due to an OCR error. I was thinking a regex generator might be able to do it. So given an input of "crazy" it would generate this regex: .*((crazy)|(.+razy)|(c.+azy)|(cr.+zy)|(cra.+y)|(craz.+)).* It would then return all matches for that word or variations of that word.
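The generator idea can be sketched in a few lines. The example below is in Python rather than PHP and deliberately differs from the regex in the question: it replaces each position with a single `.` (exactly one substituted character) instead of `.+`, and it anchors on word boundaries instead of wrapping the pattern in `.*`. The function name `one_off_pattern` is hypothetical.

```python
import re

def one_off_pattern(word):
    """Compile a regex matching `word` or any variant with one character substituted."""
    variants = [re.escape(word)]
    for i in range(len(word)):
        # Swap the character at position i for a single-character wildcard.
        variants.append(re.escape(word[:i]) + "." + re.escape(word[i + 1:]))
    return re.compile(r"\b(?:" + "|".join(variants) + r")\b")

pattern = one_off_pattern("crazy")
text = "He was crazy, or crasy as the OCR put it; not lazy."
print(pattern.findall(text))
```

This covers substitutions only. Inserted or dropped characters would need extra variants, and the alternation grows with the word length, so a dedicated edit-distance function may scale better for long words.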

Joining/matching data frames in R

天涯浪子 submitted on 2020-01-22 17:02:27
Question: I have two data frames. The first one has two columns: x is water depth, y is temperature at each depth. The second one has two columns too: x is also water depth, but at different depths than those in the first table, and the second column z is salinity. I want to join the two tables by x, adding z to the first table. I have learned how to join tables using a 'key' in tidyr, but that only works if the keys are identical. The x values in these two tables are not the same. What I want to do is to

Joining/matching data frames in R

此生再无相见时 submitted on 2020-01-22 17:02:10
Question: I have two data frames. The first one has two columns: x is water depth, y is temperature at each depth. The second one has two columns too: x is also water depth, but at different depths than those in the first table, and the second column z is salinity. I want to join the two tables by x, adding z to the first table. I have learned how to join tables using a 'key' in tidyr, but that only works if the keys are identical. The x values in these two tables are not the same. What I want to do is to

Lucene.net Fuzzy Phrase Search

感情迁移 submitted on 2020-01-13 13:45:11
Question: I have tried this myself for a considerable period and looked everywhere around the net, but have been unable to find ANY examples of fuzzy phrase searching via Lucene.NET 2.9.2 (C#). Is someone able to advise how to do this in detail and/or provide some example code? I would seriously appreciate any help, as I am totally stuck.

Answer 1: I assume that you have Lucene running and have created a search index with some fields in it. So let's assume further that: var fields = ... // a

SQL Fuzzy Join - MSSQL

天大地大妈咪最大 submitted on 2020-01-12 10:20:25
Question: I have two sets of data: existing customers and potential customers. My main objective is to figure out if any of the potential customers are already existing customers. However, the naming conventions of customers across data sets are inconsistent.

EXISTING CUSTOMERS
Customer / ID
Ed's Barbershop / 1002
GroceryTown / 1003
Candy Place / 1004
Handy Man / 1005

POTENTIAL CUSTOMERS
Customer
Eds Barbershop
Grocery Town
Candy Place
Handee Man
Beauty Salon
The Apple Farm
Igloo Ice Cream
Ride-a-Long
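The matching logic can be prototyped outside the database before committing to a SQL approach. The Python sketch below normalizes names (lowercase, punctuation and spaces removed) and then scores each potential customer against every existing one with `difflib.SequenceMatcher`. The helper names and the 0.8 threshold are assumptions for illustration; nothing here is a SQL Server feature.

```python
import re
from difflib import SequenceMatcher

# Sample rows from the question.
existing = {
    "Ed's Barbershop": 1002,
    "GroceryTown": 1003,
    "Candy Place": 1004,
    "Handy Man": 1005,
}
potential = ["Eds Barbershop", "Grocery Town", "Candy Place", "Handee Man", "Beauty Salon"]

def normalize(name):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def best_existing_id(candidate, existing, threshold=0.8):
    """Return the ID of the most similar existing customer, or None below the threshold."""
    best_id, best_score = None, 0.0
    for name, cust_id in existing.items():
        score = SequenceMatcher(None, normalize(candidate), normalize(name)).ratio()
        if score > best_score:
            best_id, best_score = cust_id, score
    return best_id if best_score >= threshold else None

for name in potential:
    print(name, "->", best_existing_id(name, existing))
```

Normalization alone resolves differences in apostrophes and spacing; the similarity score is only needed for genuine spelling variants such as "Handee Man".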

Python regex module fuzzy match: substitution count not as expected

喜你入骨 submitted on 2020-01-04 18:42:32
Question:

Background: The Python module regex allows fuzzy matching. You can specify the allowable number of substitutions (s), insertions (i), deletions (d), and total errors (e). The fuzzy_counts property of a match result returns a tuple such as (0,0,0), where:

match.fuzzy_counts[0] = count for 's'
match.fuzzy_counts[1] = count for 'i'
match.fuzzy_counts[2] = count for 'd'

Problem: The deletions and insertions are counted as expected, but not the substitutions. In the example below, the only change is a

How to group / compare similar news articles

时间秒杀一切 submitted on 2019-12-18 11:14:14
Question: In an app that I'm creating, I want to add functionality that groups news stories together. I want to group news stories about the same topic from different sources into the same group. For example, an article on XYZ from CNN and one from MSNBC would be in the same group. I am guessing it's some sort of fuzzy logic comparison. How would I go about doing this from a technical standpoint? What are my options? We haven't even started the app yet, so we aren't limited in the technologies we can use. Thanks
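One simple baseline, offered only as a sketch, is to compare headlines by the overlap of their word sets (Jaccard similarity) and greedily assign each story to the first group whose first member is similar enough. The sample headlines, the 0.5 threshold, and the function names are all made up for illustration; a real system would more likely use article bodies with TF-IDF or embeddings.

```python
import re

def tokens(text):
    """Lowercased set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def group_similar(headlines, threshold=0.5):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []  # each entry: (token set of the group's first headline, member indices)
    for idx, headline in enumerate(headlines):
        toks = tokens(headline)
        for rep, members in groups:
            if jaccard(toks, rep) >= threshold:
                members.append(idx)
                break
        else:
            groups.append((toks, [idx]))
    return [members for _, members in groups]

headlines = [
    "Senate passes budget bill after long debate",
    "Budget bill passes Senate after debate",
    "Local team wins championship game",
]
print(group_similar(headlines))
```

The greedy pass depends on input order and compares only against each group's first member, which keeps it short but makes it a starting point rather than a proper clustering method.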

Fuzzy regular expressions

三世轮回 submitted on 2019-12-18 10:27:30
Question: I am looking for a way to do a fuzzy match using regular expressions. I'd like to use Perl, but if someone can recommend any way to do this, that would be helpful. As an example, I want to match a string on the words "New York" preceded by a 2-digit number. The difficulty comes because the text is from OCR of a PDF, so I want to do a fuzzy match. I'd like to match:

12 New York
24 Hew York
33 New Yobk

and other "close" matches (in the sense of the Levenshtein distance), but not:

aa New York
11
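A possible two-step approach, sketched here in Python rather than Perl, is to let an ordinary regex enforce the strict part (exactly two digits and a space) and then check the remainder against "New York" with a Levenshtein distance. The function names and the limit of one edit are assumptions for illustration.

```python
import re

def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping only one previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if the characters agree
            ))
        prev = cur
    return prev[-1]

def fuzzy_city_match(line, target="New York", max_distance=1):
    """True if the line is two digits, a space, then text within max_distance edits of target."""
    m = re.fullmatch(r"(\d{2}) (.+)", line)
    return bool(m) and levenshtein(m.group(2), target) <= max_distance

for sample in ["12 New York", "24 Hew York", "33 New Yobk", "aa New York"]:
    print(sample, fuzzy_city_match(sample))
```

Keeping the digit check exact and the name check fuzzy means an OCR error in the city name is tolerated while a non-numeric prefix such as "aa" is still rejected.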