UTL_MATCH-like function to work with CLOB

泄露秘密 submitted on 2019-12-05 20:37:49
APC

UTL_MATCH is a package for checking how similar two strings are. Its functions evaluate strings and return scores. So all you're going to get is a number indicating (say) how many edits you need to turn ${variableName} into "Farmville" or "StackOverflow".

What you won't get is the actual differences: these two strings of text are identical except at offset 123 where it replaces ${variableName} with "Farmville".

Putting it like that suggests an alternative approach: use INSTR() and SUBSTR() to locate instances of ${variableName} in your Domo CenterView queries, then use those offsets to identify the differing text in the v$sql.sql_fulltext equivalents. You can do this with CLOBs in PL/SQL using the DBMS_LOB package.
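As a rough illustration of the first half of that idea, the anonymous block below lists each ${...} placeholder in a template together with its offset. It is only a sketch: the template text and the variable names (l_template, l_start, l_end) are made up for the example, and it assumes placeholders are never nested. Run it with SET SERVEROUTPUT ON to see the output.

```sql
DECLARE
  -- Hypothetical template; in practice this would be the stored query text.
  l_template CLOB :=
    'select d.dept_name from department d where d.id = ''${selectedDepartmentId}''';
  l_start    PLS_INTEGER := 1;  -- offset of the current "${"
  l_end      PLS_INTEGER;       -- offset of the closing "}"
BEGIN
  LOOP
    -- DBMS_LOB.INSTR(lob, pattern, offset) returns 0 when nothing is found
    l_start := DBMS_LOB.INSTR(l_template, '${', l_start);
    EXIT WHEN l_start IS NULL OR l_start = 0;

    l_end := DBMS_LOB.INSTR(l_template, '}', l_start);
    EXIT WHEN l_end IS NULL OR l_end = 0;  -- unterminated placeholder

    -- DBMS_LOB.SUBSTR takes the amount first, then the offset
    DBMS_OUTPUT.PUT_LINE(
      'offset ' || l_start || ': ' ||
      DBMS_LOB.SUBSTR(l_template, l_end - l_start + 1, l_start));

    l_start := l_end + 1;  -- continue searching after this placeholder
  END LOOP;
END;
/
```

With the offsets in hand, the text before the first placeholder can be compared against the same span of sql_fulltext, and whatever sits at that offset in the executed statement is the substituted value.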

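If the text you want to search is short enough to fit in a VARCHAR2 (at most 32767 bytes in PL/SQL; in a SQL statement, 4000 bytes unless MAX_STRING_SIZE is set to EXTENDED), then you can just convert the CLOB to VARCHAR2 using DBMS_LOB.SUBSTR: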

select v.sql_fulltext 
from v$sql v 
where utl_match.jaro_winkler_similarity(dbms_lob.substr(v.sql_fulltext), 'select department.dept_name from department where department.id = ''${selectedDepartmentId}''') > 90 ;
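A more defensive variant of the same query caps the converted text explicitly, so that an unusually long statement does not raise a "character string buffer too small" error. This is a sketch under assumptions: the cap of 4000 presumes standard string sizes and mostly single-byte SQL text, and sql_id is selected only to make the matching rows easier to identify.

```sql
-- Compare only the first 4000 characters of each cached statement.
-- DBMS_LOB.SUBSTR arguments are (lob, amount, offset).
select v.sql_id, v.sql_fulltext
from v$sql v
where utl_match.jaro_winkler_similarity(
        dbms_lob.substr(v.sql_fulltext, 4000, 1),
        'select department.dept_name from department where department.id = ''${selectedDepartmentId}'''
      ) > 90;
```

Statements longer than the cap are then scored on their leading portion only, which may or may not be acceptable depending on where the queries differ.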

I ended up creating a custom function for it. Here's the code:

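CREATE OR REPLACE function match_clob(clob_1 clob, clob_2 clob) return number as

-- Returns a Jaro-Winkler similarity score for two CLOBs. Inputs longer than
-- max_length are compared chunk by chunk and the chunk scores are averaged.
similar number := 0;        -- running total of chunk scores, then the result
sec_similar number := 0;    -- score of the current pair of chunks
sections number := 0;       -- number of chunk pairs compared
max_length number := 3949;  -- chunk size in characters
length_1 number;
length_2 number;
vchar_1 varchar2 (3950);
vchar_2 varchar2 (3950);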

begin
  length_1 := length(clob_1);
  length_2 := length(clob_2);
  --dbms_output.put_line('length_1: '||length_1);
  --dbms_output.put_line('length_2: '||length_2);
  IF length_1 > max_length or length_2 > max_length THEN

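    -- The chunk count comes from the longer input so the tail of either CLOB
    -- is compared; NVL guards against a NULL CLOB.
    FOR x IN 1 .. ceil(greatest(nvl(length_1, 0), nvl(length_2, 0)) / max_length) LOOP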

      --dbms_output.put_line('((x-1)*max_length) + 1'||(x-1)||' * '||max_length||' = '||(((x-1)*max_length) + 1));

      vchar_1 := substr(clob_1, ((x-1)*max_length) + 1, max_length);
      vchar_2 := substr(clob_2, ((x-1)*max_length) + 1, max_length);

--      dbms_output.put_line('Section '||sections||' vchar_1: '||vchar_1||' ==> vchar_2: '||vchar_2);

      sec_similar := UTL_MATCH.JARO_WINKLER_SIMILARITY(vchar_1, vchar_2);

      --dbms_output.put_line('sec_similar: '||sec_similar);

      similar := similar + sec_similar;
      sections := sections + 1;

    END LOOP;

    --dbms_output.put_line('Similar: '||similar||' ==> Sections: '||sections);
    similar := similar / sections;

  ELSE
    similar := UTL_MATCH.JARO_WINKLER_SIMILARITY(clob_1,clob_2);
  END IF;
  --dbms_output.put_line('Overall Similar: '||similar);
   return(similar);
end;
/
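Once compiled, the function can be called from SQL like any other. The queries below are a hypothetical usage sketch: the literal statements are invented for illustration, and the second query simply reuses the template from the earlier example.

```sql
-- Score two literal CLOBs against each other.
select match_clob(
         to_clob('select dept_name from department where id = ''${selectedDepartmentId}'''),
         to_clob('select dept_name from department where id = ''42''')
       ) as similarity
from dual;

-- Rank cached statements by similarity to a template.
select v.sql_id,
       match_clob(
         v.sql_fulltext,
         to_clob('select department.dept_name from department where department.id = ''${selectedDepartmentId}''')
       ) as similarity
from v$sql v
order by similarity desc;
```

One thing to keep in mind with this design: chunks are compared position by position, so text inserted early in one CLOB shifts everything after it, and later chunks can score low even when the two statements are otherwise nearly identical.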