Oracle query to find all occurrences of a charcter in a string

后端 未结 3 1919
情书的邮戳
情书的邮戳 2020-12-19 07:51

I have to write an Oracle query in toad to find all the occurrences of a character in a string. For example if I\'m searching for R in the string SSSRNNSR

相关标签:
3条回答
  • 2020-12-19 08:28

    This is a solution:

    select
      pos
    from
      (select
        substr('SSSRNNSRSSR', level, 1) as character,
        level as pos
      from
        dual
      connect by
        level <= length(t.text))
    where
      character = 'R'
    

    dual is a built in table that just returns a single row. Very convenient!

    connect by lets you build recursive queries. This is often used to generate lists from tree-like data (parent/child relations). It allows you to more or less repeat the query in front of it. And you've got special fields, like level that allows you to check how deeply the recursion went.

    In this case, I use it to split the string to characters and return a row for each character. Using level, I can repeat the query and get a character until the end of the string is reached.

    Then it is just a matter of returning the pos for all rows containing the character 'R'

    0 讨论(0)
  • 2020-12-19 08:41

    Extending GolezTrol's answer you can use regular expressions to significantly reduce the number of recursive queries you do:

     select instr('SSSRNNSRSSR','R', 1, level)
       from dual
    connect by level <= regexp_count('SSSRNNSRSSR', 'R')
    

    REGEXP_COUNT() returns the number of times the pattern matches, in this case the number of times R exists in SSSRNNSRSSR. This limits the level of recursion to the exact number you need to.

    INSTR() simply searches for the index of R in your string. level is the depth of the recursion but in this case it's also the level th occurrence of the string as we restricted to the number of recurses required.

    If the string you're wanting to pick out is more complicated you could go for regular expressions ans REGEXP_INSTR() as opposed to INSTR() but it will be slower (not by much) and it's unnecessary unless required.


    Simple benchmark as requested:

    The two CONNECT BY solutions would indicate that using REGEXP_COUNT is 20% quicker on a string of this size.

    SQL> set timing on
    SQL>
    SQL> -- CONNECT BY with REGEX
    SQL> declare
      2     type t__num is table of number index by binary_integer;
      3     t_num t__num;
      4  begin
      5    for i in 1 .. 100000 loop
      6       select instr('SSSRNNSRSSR','R', 1, level)
      7         bulk collect into t_num
      8         from dual
      9      connect by level <= regexp_count('SSSRNNSRSSR', 'R')
     10              ;
     11     end loop;
     12  end;
     13  /
    
    PL/SQL procedure successfully completed.
    
    Elapsed: 00:00:03.94
    SQL>
    SQL> -- CONNECT BY with filter
    SQL> declare
      2     type t__num is table of number index by binary_integer;
      3     t_num t__num;
      4  begin
      5    for i in 1 .. 100000 loop
      6       select pos
      7         bulk collect into t_num
      8         from ( select substr('SSSRNNSRSSR', level, 1) as character
      9                     , level as pos
     10                  from dual t
     11               connect by level <= length('SSSRNNSRSSR') )
     12        where character = 'R'
     13              ;
     14     end loop;
     15  end;
     16  /
    
    PL/SQL procedure successfully completed.
    
    Elapsed: 00:00:04.80
    

    The pipelined table function is a fair bit slower, though it would be interesting to see how it performs over large strings with lots of matches.

    SQL> -- PIPELINED TABLE FUNCTION
    SQL> declare
      2     type t__num is table of number index by binary_integer;
      3     t_num t__num;
      4  begin
      5    for i in 1 .. 100000 loop
      6       select *
      7         bulk collect into t_num
      8         from table(string_indexes('SSSRNNSRSSR','R'))
      9              ;
     10     end loop;
     11  end;
     12  /
    
    PL/SQL procedure successfully completed.
    
    Elapsed: 00:00:06.54
    
    0 讨论(0)
  • 2020-12-19 08:49

    To take up a_horse_with_no_name's challenge here is another answer with a pipelined table function.

    A pipelined function returns an array, which you can query normally. I would expect that over strings with large numbers of matches this will perform better than the recursive query but as with everything test yourself first.

    create type num_array as table of number
    /
    
    
    create function string_indexes (
        PSource_String in varchar2
      , PSearch_String in varchar2
        ) return num_array pipelined is
    
    begin
    
       for i in 1 .. length(PSource_String) loop
    
          if substr(PSource_String, i, 1) = PSearch_String then
             pipe row(i);
          end if;
    
       end loop;
    
       return;
    
    end;
    /
    

    Then in order to access it:

    select *
      from table(string_indexes('SSSRNNSRSSR','R'))
    

    SQL Fiddle

    0 讨论(0)
提交回复
热议问题