问题
What is the most efficient way to implement a phonetic search in C++ and/or Java? By phonetic search I mean substituting vowels or consonants that sound similar. This would be especially useful for names because sometimes people's names have sort of strange spellings.
I am thinking it might be effective to substitue vowels and some consonants. It may also be good to include some special cases like silent E's at the end or F and PH. Would it be best to use cstrings or strings in C++? Would it be better to store a copy in memory with the substituted values or call a function every time we look for something?
回答1:
Soundex along with its variants is the standard algorithm for this. It uses phonetic rules to transform the name into an alphanumeric code. Names with the same code are grouped together.
As far as implementing the search, I'd use a data structure that maps each soundex code to the list of names that have that code. Depending on the data structure used (a hash table or a tree), the lookup could be done in time that is either constant on logarithmic in the number of distinct soundex codes.
I am not sure what exactly you mean by cstring
(Microsoft's CString
?) but the standard std::string
class will be perfectly fine for this problem and would be my preferred choice.
回答2:
Besides Soundex you'll find also the Metaphone or Double Metaphone phonetic algorithm, which seem to be an improvement for the English pronunciation and is a quite new algorithm.
For the german pronunciation I use the "Kölner Phonetik".
Apache Commons Codec gives you a very simple Java implementation of those basic algorithms (Soundex, Metaphone, ...) http://commons.apache.org/codec/ For example see the javadoc for the soundex: http://commons.apache.org/codec/apidocs/org/apache/commons/codec/language/Soundex.html
Just by typing following code you the the phonetic value of your String:
Soundex soundex = new Soundex();
String phoneticValue = soundex.encode("YourString");
And then you can simply do it for two strings and compare the phonetic values. Hava a look at the following post if you're comparing two strings, because the equals() methods is just black and white, and maybe you'd like to know how many % it is matching:
How to compare almost similar Strings in Java? (String distance measure)
来源:https://stackoverflow.com/questions/8359595/the-most-efficient-way-to-implement-a-phonetic-search