soundex

Phonetic search for Indian languages

丶灬走出姿态 submitted on 2020-12-02 03:18:34
Question: I want to compare strings phonetically in my Android app. The special case here is that I want to compare Indian-language words written in English script. For example, I want to check whether "Edhu", "Adhu" and "Yethu" are phonetically equal; they all mean the same thing in Tamil. But people who write Indian languages in English script spell the same word in different ways. How do I compare words in this case? I tried Levenshtein, but I am not sure how to convert the number it returns to the
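One common way to turn a raw Levenshtein distance into a 0-to-1 score is to divide it by the length of the longer string. Normalising spelling variants before comparing also helps with transliterations. The sketch below is in Python rather than the app's language. The `RULES` list is entirely hypothetical and tuned only to the three example words; real rules would have to come from actual transliteration data.

```python
import re

# Hypothetical spelling-normalisation rules for romanised Tamil; an assumption
# for illustration only, not a standard transliteration scheme.
RULES = [
    (r"^y(?=[aeiou])", ""),  # initial "ye"/"ya" -> bare vowel ("yethu" -> "ethu")
    (r"^a", "e"),            # treat an initial a and e as the same vowel
    (r"dh|th", "d"),         # the same dental sound written as dh or th
    (r"(.)\1+", r"\1"),      # collapse doubled letters ("eddu" -> "edu")
]


def normalise(word: str) -> str:
    w = word.lower()
    for pattern, replacement in RULES:
        w = re.sub(pattern, replacement, w)
    return w


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical after normalisation."""
    a, b = normalise(a), normalise(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(similarity("Edhu", "Yethu"))  # 1.0
```

Under these assumed rules, all three example words normalise to "edu". Any cutoff for "same word" (say 0.8) is a tuning choice, not a given.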

Fuzzy matching a string in PySpark or SQL using the Soundex function or Levenshtein distance

ぐ巨炮叔叔 submitted on 2020-03-23 12:03:25
Question: I had to apply the Levenshtein function to the last column when passport and country are the same.

    matrix = passport_heck.select(\
        f.col('name_id').alias('name_id_1'),
        f.col('last').alias('last_1'),
        f.col('country').alias('country_1'),
        f.col('passport').alias('passport_1')) \
        .crossJoin(passport_heck.select(\
            f.col('name_id').alias('name_id_2'),
            f.col('last').alias('last_2'),
            f.col('country').alias('country_2'),
            f.col('passport').alias('passport_2')))\
        .filter((f.col('passport_1') == f.col('passport_2'))
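If you write this as a join on passport and country, instead of a cross join followed by a filter, the equi-join becomes explicit. PySpark also provides `pyspark.sql.functions.levenshtein` and `pyspark.sql.functions.soundex` column functions. In Spark, this would typically be a join on both keys plus a `levenshtein` column. The plain-Python sketch below models only that logic, without Spark. The row dictionaries, the `fuzzy_last_name_pairs` helper and the `max_distance=2` cutoff are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_last_name_pairs(rows, max_distance=2):
    """Pair rows that share passport and country and whose last names are close."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["passport"], row["country"])].append(row)  # the "join keys"
    pairs = []
    for members in groups.values():
        # combinations() yields each unordered pair once and never pairs a row with itself
        for r1, r2 in combinations(members, 2):
            distance = levenshtein(r1["last"], r2["last"])
            if distance <= max_distance:
                pairs.append((r1["name_id"], r2["name_id"], distance))
    return pairs


rows = [
    {"name_id": 1, "last": "Smith", "country": "US", "passport": "X1"},
    {"name_id": 2, "last": "Smyth", "country": "US", "passport": "X1"},
    {"name_id": 3, "last": "Jones", "country": "US", "passport": "X1"},
    {"name_id": 4, "last": "Smith", "country": "UK", "passport": "X1"},
]
print(fuzzy_last_name_pairs(rows))  # [(1, 2, 1)]
```

The cross join in the question produces every ordered pair, including each row paired with itself. `combinations` keeps one copy of each distinct pair instead.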

Soundex with numbers as a string parameter

萝らか妹 submitted on 2020-01-06 08:46:17
Question: Is there an explanation for why SOUNDEX does not work with numbers passed as strings? These queries work fine:

    select 1 from dual where soundex('for you') = soundex('for u');
    select 1 from dual where soundex('for you') = soundex('for you');

But these don't:

    select 1 from dual where soundex('6000') = soundex('6000');
    select 1 from dual where soundex('5') = soundex('5');

I was reading the documentation http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions167.htm#SQLRF06109 but it does not
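Soundex encodes only letters, and digits have no code. Suppose Oracle's SOUNDEX returns NULL for an argument with no letters, which would match the behaviour described. Then the comparison becomes NULL = NULL. SQL evaluates that as unknown rather than true, so the WHERE clause drops the row. The Python sketch below models this three-valued logic. `soundex_or_null`, `sql_equals` and `where_keeps_row` are hypothetical helpers, not Oracle APIs.

```python
from typing import Optional

# Letter -> digit table from the classic American Soundex rules.
CODES = {
    letter: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for letter in letters
}


def soundex_or_null(text: str) -> Optional[str]:
    """Soundex code, or None (standing in for SQL NULL) when text has no letters."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return None  # assumption: nothing to encode -> NULL
    result, prev = letters[0], CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code
    return result.ljust(4, "0")


def sql_equals(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """SQL '=': any comparison involving NULL is UNKNOWN (modelled as None)."""
    if a is None or b is None:
        return None
    return a == b


def where_keeps_row(left: str, right: str) -> bool:
    # WHERE keeps a row only when the condition is TRUE, never when it is UNKNOWN
    return sql_equals(soundex_or_null(left), soundex_or_null(right)) is True


print(where_keeps_row("for you", "for u"))  # True
print(where_keeps_row("6000", "6000"))      # False
```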

MySQL equivalent of PHP metaphone and soundex

血红的双手。 submitted on 2019-12-23 06:09:12
Question: I am working on an app that fetches the title of the song the user is currently playing and looks in the MySQL database to see who else is playing a similar song. Since the same song might have many different titles on different people's phones, we need an effective way to find the closest possible matches. Our current process fetches all the songs from the table, then loops over them with foreach and compares each entry in the result set with the user's song. Here is part of the function we use: $all_results
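Fetching every row and comparing in PHP does not scale well. A common alternative is to compute a phonetic key once, when the title is stored, keep it in an indexed column, and look up matches by key. MySQL provides a SOUNDEX() function and the SOUNDS LIKE operator. MySQL's SOUNDEX can return strings longer than the standard four characters, so compute keys consistently in one place. The Python sketch below models the idea with an in-memory dictionary. The per-word key scheme and the `TitleIndex` class are hypothetical.

```python
from collections import defaultdict

# Letter -> digit table from the classic American Soundex rules.
CODES = {
    letter: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for letter in letters
}


def soundex(word: str) -> str:
    """4-character American Soundex code, or "" if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result, prev = letters[0], CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code
    return result.ljust(4, "0")


def title_key(title: str) -> str:
    """Hypothetical key: the Soundex code of each word, in order."""
    codes = (soundex(word) for word in title.split())
    return " ".join(code for code in codes if code)


class TitleIndex:
    def __init__(self) -> None:
        self._by_key = defaultdict(set)  # key -> titles, like an indexed key column

    def add(self, title: str) -> None:
        self._by_key[title_key(title)].add(title)

    def similar(self, title: str) -> list:
        # one dictionary lookup, analogous to WHERE title_key = ? on an indexed column
        return sorted(self._by_key.get(title_key(title), set()))


index = TitleIndex()
for t in ["Smooth Criminal", "Smoth Criminal", "Thriller"]:
    index.add(t)
print(index.similar("Smooth Criminel"))  # ['Smooth Criminal', 'Smoth Criminal']
```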

PHP/MySQL: Highlight “SOUNDS LIKE” query results

爱⌒轻易说出口 submitted on 2019-12-21 02:40:12
Question: Quick MySQL/PHP question. I'm using a "not-so-strict" search query as a fallback when a normal search query finds no results, along the lines of:

    foreach ($find_array as $word) {
        $clauses[] = "(firstname SOUNDS LIKE '$word%' OR lastname SOUNDS LIKE '$word%')";
    }
    if (!empty($clauses)) $filter = '(' . implode(' AND ', $clauses) . ')';
    $query = "SELECT * FROM table WHERE $filter";

Now I'm using PHP to highlight the results, like:

    foreach ($find_array as $term_to_highlight){ foreach ($result as $key
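SOUNDS LIKE compares SOUNDEX() of both sides. MySQL's SOUNDEX ignores non-alphabetic characters, so the trailing % in '$word%' does not act as a wildcard there. For highlighting, one option is to apply the same test word by word: highlight any word whose Soundex code equals the code of a search term. The Python sketch below uses the classic four-character Soundex, which can disagree with MySQL's longer codes in edge cases. `highlight` and the `<b>` tag are illustrative choices.

```python
import re

# Letter -> digit table from the classic American Soundex rules.
CODES = {
    letter: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for letter in letters
}


def soundex(word: str) -> str:
    """4-character American Soundex code, or "" if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result, prev = letters[0], CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code
    return result.ljust(4, "0")


def highlight(text: str, terms: list, tag: str = "b") -> str:
    """Wrap every word that sounds like one of the search terms in <tag>...</tag>."""
    keys = {soundex(t) for t in terms} - {""}

    def wrap(match):
        word = match.group(0)
        return f"<{tag}>{word}</{tag}>" if soundex(word) in keys else word

    return re.sub(r"[A-Za-z]+", wrap, text)


print(highlight("Jon Smyth", ["john", "smith"]))  # <b>Jon</b> <b>Smyth</b>
```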

Could use some help with this soundex coding

a 夏天 submitted on 2019-12-19 10:13:18
Question: The US Census Bureau uses a special encoding called “soundex” to locate information about a person. Soundex encodes surnames (last names) based on how they sound rather than how they are spelled. Surnames that sound the same but are spelled differently, like SMITH and SMYTH, have the same code and are filed together. The soundex coding system was developed so that you can find a surname even if it was recorded under various spellings. In this lab you
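Here is a minimal Python sketch of the filing idea. The lab may require a different language. Each surname is encoded with the classic Soundex rules, and names that share a code are grouped together. The `file_by_soundex` helper is hypothetical.

```python
from collections import defaultdict

# Letter -> digit table from the classic American Soundex rules.
CODES = {
    letter: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for letter in letters
}


def soundex(word: str) -> str:
    """4-character American Soundex code, or "" if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result, prev = letters[0], CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return result.ljust(4, "0")


def file_by_soundex(surnames):
    """Group surnames that share a Soundex code, like census index cards."""
    drawers = defaultdict(list)
    for name in surnames:
        drawers[soundex(name)].append(name)
    return dict(drawers)


print(file_by_soundex(["SMITH", "SMYTH", "SCHMIDT", "JONES"]))
# {'S530': ['SMITH', 'SMYTH', 'SCHMIDT'], 'J520': ['JONES']}
```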

Levenshtein distance-based methods vs. Soundex

放肆的年华 submitted on 2019-12-17 10:46:29
Question: Following this comment in a related thread, I'd like to know why Levenshtein distance-based methods are better than Soundex.
Answer 1: Soundex is rather primitive; it was originally developed to be calculated by hand. It produces a key that can be compared. Soundex works well with Western names, as it was originally developed for US census data. It's intended for phonetic comparison. Levenshtein distance looks at two values and produces a value based on their similarity. It's looking for missing or
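The difference is easy to see side by side. Soundex gives a yes/no answer from fixed keys and keeps the first letter verbatim. Levenshtein distance gives a graded score for any pair of strings. The Python sketch below uses the classic Soundex rules and the standard dynamic-programming edit distance. The `compare` helper is hypothetical.

```python
# Letter -> digit table from the classic American Soundex rules.
CODES = {
    letter: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for letter in letters
}


def soundex(word: str) -> str:
    """4-character American Soundex code, or "" if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result, prev = letters[0], CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code
    return result.ljust(4, "0")


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def compare(a: str, b: str):
    """(same Soundex key?, case-insensitive edit distance)"""
    return soundex(a) == soundex(b), levenshtein(a.lower(), b.lower())


print(compare("Robert", "Rupert"))      # (True, 2)
print(compare("Catherine", "Kathryn"))  # (False, 4)
```

Catherine and Kathryn sound alike, but Soundex separates them (C365 vs K365) because it keeps the first letter unchanged. The edit distance still gives a number you can threshold.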

Is there a soundex function for Python?

六眼飞鱼酱① submitted on 2019-12-12 16:15:19
Question: Is there a soundex function for Python, and if not, how would you go about making a soundex code?

    Code  Letters
    1     B, F, P, V
    2     C, G, J, K, Q, S, X, Z
    3     D, T
    4     L
    5     M, N
    6     R
    Skip  A, E, I, O, U, H, W, Y

For example:

    Jackson    = J250
    Washington = W252
    Clement    = C455
    Ashcraft   = A261
    Wu         = W000

Answer 1: You can use jellyfish:

    pip install jellyfish

    import jellyfish
    print("Soundex\t\t=", jellyfish.soundex("Ala ma kaca"))
    # ...

    > Soundex = A452
    > Metaphone = AL M KK
    > NYSIIS = AL
    > Match rating codex = ALMKC
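If you would rather not add a dependency, the table above translates directly into a few lines of Python. This sketch follows the classic American Soundex rules. That includes the rule that H and W do not separate two letters with the same code, which is why Ashcraft is A261 rather than A226. It is an illustration only, not jellyfish's implementation.

```python
# Letter -> digit table, taken from the table in the question.
CODES = {
    letter: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for letter in letters
}


def soundex(word: str) -> str:
    """4-character American Soundex code, or "" if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # ...but its code still blocks an identical neighbour
    for c in letters[1:]:
        if c in "HW":
            continue                    # H and W are skipped without resetting prev
        code = CODES.get(c, "")         # vowels and Y map to "" (no digit)
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code                     # a vowel resets prev, so a repeat after it counts
    return result.ljust(4, "0")         # pad short codes with zeros


examples = {"Jackson": "J250", "Washington": "W252", "Clement": "C455",
            "Ashcraft": "A261", "Wu": "W000"}
for name, expected in examples.items():
    print(name, soundex(name), soundex(name) == expected)
```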

Solr search using contains and sounds-like

荒凉一梦 submitted on 2019-12-10 17:06:29
Question: Problem: I have movie information in Solr. Two string fields define the movie title and the director name. A copy field defines another field that Solr searches by default. I would like to have Google-like search with a limited scope, as follows. How can I achieve it?
1) How to search Solr for "contains". E.g.
a) If the movie director name is "John Cream", searching for joh won't return anything. However, searching for John returns the correct result.
b) If there is a movie title called aaabbb and another
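In Solr, this is normally handled in the field type's analyzer chain rather than in query code. An edge n-gram filter (for example `solr.EdgeNGramFilterFactory`) indexes prefixes, so that joh can match John. A phonetic filter (for example `solr.PhoneticFilterFactory` or `solr.DoubleMetaphoneFilterFactory`) indexes sound-alike codes. The Python below only models those two ideas in memory, so the behaviour is easy to see. `MiniIndex` is hypothetical and not a Solr API. True infix matching, as in the aaabbb case, would need full n-grams rather than edge n-grams.

```python
from collections import defaultdict

# Letter -> digit table from the classic American Soundex rules.
CODES = {
    letter: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for letter in letters
}


def soundex(word: str) -> str:
    """4-character American Soundex code, or "" if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result, prev = letters[0], CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code
    return result.ljust(4, "0")


class MiniIndex:
    """Toy model of two analysis ideas: edge n-grams and a phonetic key per token."""

    def __init__(self, min_gram: int = 2) -> None:
        self.min_gram = min_gram
        self.prefixes = defaultdict(set)  # edge n-gram -> doc ids
        self.phonetic = defaultdict(set)  # Soundex code -> doc ids

    def add(self, doc_id: int, text: str) -> None:
        for token in text.lower().split():
            for n in range(self.min_gram, len(token) + 1):
                self.prefixes[token[:n]].add(doc_id)
            code = soundex(token)
            if code:
                self.phonetic[code].add(doc_id)

    def search(self, query: str) -> set:
        hits = set()
        for token in query.lower().split():
            hits |= self.prefixes.get(token, set())      # "contains" as prefix match
            code = soundex(token)
            if code:
                hits |= self.phonetic.get(code, set())   # "sounds like" match
        return hits


idx = MiniIndex()
idx.add(1, "John Cream")
idx.add(2, "Jane Doe")
print(idx.search("joh"))    # {1}
print(idx.search("Creem"))  # {1}
```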