Oracle Text Contains and technical content

无人久伴 提交于 2019-12-24 10:08:15

问题


I'am searching for the technical word "AN-XYZ99". So I use

SELECT *
FROM foo
WHERE CONTAINS(bar, 'AN{-}XYZ99') > 0

but I get also results like "FO-XYZ99" or "BAR-XYZ99". What can I do to ensure the expected result?

I used

BEGIN
    CTX_DDL.CREATE_PREFERENCE('FOO','BASIC_LEXER');
    CTX_DDL.SET_ATTRIBUTE('FOO', 'ALTERNATE_SPELLING', 'GERMAN');
    CTX_DDL.SET_ATTRIBUTE('FOO', 'COMPOSITE', 'GERMAN');
    CTX_DDL.SET_ATTRIBUTE('FOO', 'MIXED_CASE', 'NO');
END;

Sample data from column "bar" (VARCHAR2(4000)):

"unbekannt Stadt Text: AN-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 foo"
"unbekannt Stadt Text: FO-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 bar"
"unbekannt Stadt Text: BAR-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 bla"

With the Statement above I would like the first row as output but I get the second and third row as well.

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production


回答1:


First you must define hyphen as a printjoin in your lexer.

check it with

select IXV_ATTRIBUTE, IXV_VALUE from CTXSYS.CTX_INDEX_VALUES where IXV_CLASS =  'LEXER';

IXV_ATTRIBUTE                  IXV_VALUE     
-----------------------------------------
PRINTJOINS                     _$%&-         
NUMJOIN                        .              
NUMGROUP                       .              
WHITESPACE                     ,= 

Then you may (after re-creating index with this lexer) validate that the tokens are as expected: (your table would vary based on the index name; check all tables like 'DR$%$I')

select TOKEN_TEXT from DR$TEXTIDX_IDX$I where TOKEN_TEXT like '%-XYZ99';
TOKEN_TEXT                                                     
----------------------------------------------------------------
AN-XYZ99                                                         
BAR-XYZ99                                                        
FO-XYZ99

Now you may query for the search string.

Aparently you must escape the hyphen as BAR-XYZ99 will find rows with BAR not containing XYZ99 ; although the documentation of hyphen with no space is a bit different.

SELECT SCORE(1),txt
FROM textidx
WHERE  CONTAINS(txt, 'BAR-XYZ99',1) > 0; 

  SCORE(1) TXT                                                                                
---------- ------------------------------------------------------------------------------------
         4 unbekannt Stadt Text: FO-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 bar

For some reason (I'm on 11.2.0.2.0) the escaping with curly braces doesn't work (returns no match), but using backslash is fine.

SELECT SCORE(1),txt
FROM textidx
WHERE  CONTAINS(txt, 'BAR\-XYZ99',1) > 0;  

  SCORE(1) TXT                                                                                
---------- ------------------------------------------------------------------------------------
         4 unbekannt Stadt Text: BAR-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 bla 


来源:https://stackoverflow.com/questions/48061865/oracle-text-contains-and-technical-content

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!