SQL word root matching

夙愿已清 submitted on 2021-01-27 07:41:50

Question


I'm wondering whether the major SQL engines (MS SQL, Oracle, MySQL) can recognize that two words are related because they share the same root.

We know it's easy to match "networking" when searching for "network" because the latter is a substring of the former.

But do SQL engines have functions that can match "network" when searching for "networking"?

Thanks a lot.


Answer 1:


This functionality is called a stemmer: an algorithm that can deduce a stem from any form of the word.

This can be quite complex: for instance, the Russian words шёл and иду are different forms of the same verb, though they do not share a single letter (the same is true of English: went and go).

Word breaking can also be quite a complex task for some languages that use no spaces between words.

SQL Server allows using pluggable stemmers and word breakers for its fulltext search engine:

http://msdn.microsoft.com/en-us/library/ms142509.aspx
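
As a rough illustration of querying through that stemmer, SQL Server's CONTAINS predicate accepts a FORMSOF(INFLECTIONAL, ...) term. The sketch below assumes a hypothetical table dbo.Articles(Id, Body) that already has a full-text index on Body; which word forms actually match depends on the stemmer installed for the column's language.

```sql
-- SQL Server (T-SQL). dbo.Articles, Id and Body are hypothetical names;
-- a full-text index on Body is assumed to exist already.
SELECT Id, Body
FROM dbo.Articles
-- Ask the full-text engine for inflectional forms of the search word,
-- rather than only the literal string.
WHERE CONTAINS(Body, 'FORMSOF(INFLECTIONAL, networking)');
```

The stemming happens inside the full-text engine, so a plain LIKE or = comparison on the same column would not benefit from it.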




Answer 2:


I think the topic is 'Semantic Similarity'. There are several efforts trying to find optimal solutions to this problem.




Answer 3:


You can try using soundex, though it might not be exactly what you want. See http://www.codeproject.com/KB/database/Phonetic_Search_MSSQL.aspx.
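
A minimal sketch of the soundex approach, written for SQL Server and assuming a hypothetical table dbo.Terms(Word). SOUNDEX encodes the first letter plus the consonants that follow, so "network" and "networking" should end up with the same four-character code, but so will unrelated words that merely sound alike.

```sql
-- SQL Server (T-SQL). dbo.Terms and Word are hypothetical names.
SELECT Word
FROM dbo.Terms
-- Compare phonetic codes instead of the raw strings.
WHERE SOUNDEX(Word) = SOUNDEX('networking');
```

MySQL and Oracle also provide a SOUNDEX function. MySQL's version can return codes longer than four characters, so there the comparison would probably need LEFT(SOUNDEX(...), 4) on both sides to behave the same way.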




Answer 4:


As Quassnoi pointed out, this can be done with stemming. PostgreSQL implements it for full-text search if you turn it on.

ALTER TEXT SEARCH CONFIGURATION blah_en ADD MAPPING FOR asciiword, word WITH english_stem;

This uses the Snowball english_stem dictionary, which is based on the Porter stemmer. The Porter stemmer is probably one of the most widely used stemmers, so it will give decent results. Keep in mind, though, that stemming is purely rule-based: it can conflate unrelated words that happen to reduce to the same stem, and it misses irregular forms such as went and go.
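
To check the behaviour from the question directly, the following sketch uses PostgreSQL's built-in 'english' text search configuration (which maps words to english_stem) instead of a custom one. Both sides of the match are stemmed, so the direction of the search should not matter.

```sql
-- PostgreSQL. Uses the built-in 'english' configuration; no tables needed.

-- Inspect the stem the dictionary produces for a single word.
SELECT ts_lexize('english_stem', 'networking');

-- Search for "networking" in a document that only contains "network".
SELECT to_tsvector('english', 'network')
       @@ to_tsquery('english', 'networking') AS matches;
```

If the stemmer reduces both words to the same lexeme, as the Porter rules for the -ing suffix suggest, the second query returns true.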



Source: https://stackoverflow.com/questions/4051572/sql-word-root-matching
