Oracle: searching text with non-English characters

Submitted by 陌路散爱 on 2019-12-10 14:59:52

Question


Our Oracle DB is UTF8. We are storing addresses that need to be searchable. Some of the street names contain non-English characters (e.g. Peña Báináõ), and they need to be searchable either as "Peña Báináõ" or with English equivalent characters like "Pena Bainao". What we did was convert the text in the query, something like:

SELECT CONVERT('Peña Báináõ','US7ASCII') as converted FROM dual;

The issue is that not all of the characters are converted to an English equivalent (not even some pretty obvious ones like ñ or õ), so we end up with the text converted to:

Pe?a Baina?

So if a user tries to find that address by typing "Pena Bainao", they can't find it, because "Pena Bainao" is different from "Pe?a Baina?".

We have figured out some dirty workarounds for this, but I wanted to check first whether someone has found a more elegant solution.

Here is a list of some characters that are not converted to US7ASCII:

Character   Unicode code point   Possible equivalent
æ           U+00E6               ae
å           U+00E5               a
ã           U+00E3               a
ñ           U+00F1               n
õ           U+00F5               o

Answer 1:


1) Use nlssort with BINARY_AI (both case and accent insensitive):

SQL> select nlssort('Peña Báináõ', 'NLS_SORT = BINARY_AI') C from dual;

C
------------------------
70656E61206261696E616F00

SQL> select nlssort('Pena Bainao', 'NLS_SORT = BINARY_AI') C from dual;

C
------------------------
70656E61206261696E616F00

SQL> select nlssort('pena bainao', 'NLS_SORT = BINARY_AI') C from dual;

C
------------------------
70656E61206261696E616F00

SQL> select 'true' T from dual where nlssort('pena bainao', 'NLS_SORT = BINARY_AI') = nlssort('Peña Báináõ', 'NLS_SORT = BINARY_AI') ;

T
----
true

2) You could also set the NLS_SORT session parameter to binary_ai, and then you would not have to specify NLS_SORT every time:

SQL> select 'true' T from dual where nlssort('pena bainao') = nlssort('Peña Báináõ') ;

no rows selected

SQL> alter session set nls_sort = binary_ai;

Session altered.

SQL> select 'true' T from dual where nlssort('pena bainao') = nlssort('Peña Báináõ') ;

T
----
true

3) To drop the nlssort function entirely and change the semantics of every comparison, also set the nls_comp session parameter:

SQL> select 'true' T from dual where 'pena bainao' = 'Peña Báináõ';

no rows selected

SQL> alter session set nls_comp = linguistic;

Session altered.

SQL> select 'true' T from dual where 'pena bainao' = 'Peña Báináõ';

T
----
true

Option 1 changes behavior only locally, in the query where you want different results. Options 2 and 3 change the behavior of other queries in the session as well, which may not be what you want. See Table 5-2 of the Oracle® Database Globalization Support Guide. Also see the section "Using Linguistic Indexes" to find out how to make these comparisons use indexes.
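
As a rough sketch of how option 1 might be combined with a linguistic index: the table name addresses, the column street_name, the index name, and the bind variable :search_text below are all made up for illustration and do not come from the question. The idea is to create a function-based index on the same NLSSORT expression that the query uses in its WHERE clause.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE addresses (
  id          NUMBER PRIMARY KEY,
  street_name VARCHAR2(200)
);

-- Function-based (linguistic) index on the case- and accent-insensitive
-- sort key of the column.
CREATE INDEX addresses_street_ai_idx
  ON addresses (NLSSORT(street_name, 'NLS_SORT = BINARY_AI'));

-- Option 1 applied to a column: both sides are reduced to the same sort key,
-- so a search for 'Pena Bainao' can match a stored 'Peña Báináõ'.
-- :search_text is a hypothetical bind variable holding the user's input.
SELECT id, street_name
  FROM addresses
 WHERE NLSSORT(street_name, 'NLS_SORT = BINARY_AI')
     = NLSSORT(:search_text, 'NLS_SORT = BINARY_AI');
```

Because the indexed expression is written exactly as it appears in the WHERE clause, the optimizer has the chance to use the index for this equality search. Whether it actually does depends on the data and statistics, so it is worth checking the execution plan.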



Source: https://stackoverflow.com/questions/6682173/oracle-search-text-with-non-english-characters
