Performance of regexp_replace vs translate in Oracle?

后端 未结 2 1470
一整个雨季
一整个雨季 2020-12-19 03:20

For simple things is it better to use the translate function on the premise that it is less CPU intensive or is regexp_replace the way to go?

相关标签:
2条回答
  • 2020-12-19 03:41

    For SQL, I tested this with the following script:

    set timing on
    
    select sum(length(x)) from (
      select translate('(<FIO>)', '()[]', '----') x
      from (
        select *
        from dual
        connect by level <= 2000000
      )
    );
    
    select sum(length(x)) from (
      select regexp_replace('[(<FIO>)]', '[\(\)\[]|\]', '-', 1, 0) x
      from (
        select *
        from dual
        connect by level <= 2000000
      )
    );
    

    and found that the performance of translate and regexp_replace were almost always the same, but it could be that the cost of the other operations is overwhelming the cost of the functions I'm trying to test.

    Next, I tried a PL/SQL version:

    set timing on
    
    declare
      x varchar2(100);
    begin
      for i in 1..2500000 loop
        x := translate('(<FIO>)', '()[]', '----');
      end loop;
    end;
    /
    
    declare
      x varchar2(100);
    begin
      for i in 1..2500000 loop
        x := regexp_replace('[(<FIO>)]', '[\(\)\[]|\]', '-', 1, 0);
      end loop;
    end;
    /
    

    Here the translate version takes just under 10 seconds, while the regexp_replace version around 0.2 seconds -- around 2 orders of magnitude faster(!)

    Based on this result, I will be using regular expressions much more often in my performance critical code -- both SQL and PL/SQL.

    0 讨论(0)
  • 2020-12-19 03:44

    I think you're running into simple optimization. The regexp expression is so expensive to compute that the result is cached in the hope that it will be used again in the future. If you actually use distinct strings to convert, you will see that the modest translate is naturally faster because it is its specialized function.

    Here's my example, running on 11.1.0.7.0:

    SQL> DECLARE
      2     TYPE t IS TABLE OF VARCHAR2(4000);
      3     l       t;
      4     l_level NUMBER := 1000;
      5     l_time  TIMESTAMP;
      6     l_char  VARCHAR2(4000);
      7  BEGIN
      8     -- init
      9     EXECUTE IMMEDIATE 'ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL=2';
     10     SELECT dbms_random.STRING('p', 2000)
     11       BULK COLLECT
     12       INTO l FROM dual
     13     CONNECT BY LEVEL <= l_level;
     14     -- regex
     15     l_time := systimestamp;
     16     FOR i IN 1 .. l.count LOOP
     17        l_char := regexp_replace(l(i), '[]()[]', '-', 1, 0);
     18     END LOOP;
     19     dbms_output.put_line('regex     :' || (systimestamp - l_time));
     20     -- tranlate
     21     l_time := systimestamp;
     22     FOR i IN 1 .. l.count LOOP
     23        l_char := translate(l(i), '()[]', '----');
     24     END LOOP;
     25     dbms_output.put_line('translate :' || (systimestamp - l_time));
     26  END;
     27  /
    
    regex     :+000000000 00:00:00.979305000
    translate :+000000000 00:00:00.238773000
    
    PL/SQL procedure successfully completed
    

    on 11.2.0.3.0 :

    regex     :+000000000 00:00:00.617290000
    translate :+000000000 00:00:00.138205000
    

    Conclusion: In general I suspect translate will win.

    0 讨论(0)
提交回复
热议问题