How to convert comma-separated values to rows in Oracle?

死守一世寂寞 2020-11-22 07:55

Here is the DDL:

create table tbl1 (
   id number,
   value varchar2(50)
);

insert into tbl1 values (1, 'AA, UT, BT, SK, SX');
insert into tbl1 values (         


        
4 answers
  •  挽巷 (original poster)
    2020-11-22 08:06

    I agree that this is a really bad design. Try this if you can't change that design:

    select distinct id, trim(regexp_substr(value,'[^,]+', 1, level) ) value, level
      from tbl1
       connect by regexp_substr(value, '[^,]+', 1, level) is not null
       order by id, level;
    

    OUTPUT

    id value level
    1   AA  1
    1   UT  2
    1   BT  3
    1   SK  4
    1   SX  5
    2   AA  1
    2   UT  2
    2   SX  3
    3   UT  1
    3   SK  2
    3   SX  3
    3   ZF  4
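
    To see how the hierarchical query above splits one string, it can help to run the same pattern against a single literal. This is only a minimal sketch: the literal 'AA, UT, BT' and the column aliases lev and token are made up for illustration and are not part of the original question.

    ```sql
    -- REGEXP_SUBSTR(str, '[^,]+', 1, n) returns the n-th run of non-comma characters.
    -- CONNECT BY keeps generating rows (LEVEL = 1, 2, 3, ...) until that n-th token is NULL.
    select level as lev,
           trim(regexp_substr('AA, UT, BT', '[^,]+', 1, level)) as token
      from dual
    connect by regexp_substr('AA, UT, BT', '[^,]+', 1, level) is not null
     order by lev;
    -- Should return three rows: (1, AA), (2, UT), (3, BT).
    ```

    Against a single row from dual no duplicates appear. With several rows in tbl1, each level connects to every source row, which is why the query above needs DISTINCT.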
    

    Credits to this

    To remove duplicates in a more elegant and efficient way (credits to @mathguy)

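    select id, trim(regexp_substr(value,'[^,]+', 1, level) ) value, level
      from tbl1
       connect by regexp_substr(value, '[^,]+', 1, level) is not null
          and PRIOR id =  id                  -- stay within the same source row (assumes id is unique)
          and PRIOR SYS_GUID() is not null    -- non-deterministic call, keeps Oracle from reporting a CONNECT BY loop
       order by id, level;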
    

    If you want an "ANSIer" approach go with a CTE:

    with t (id,res,val,lev) as (
               select id, trim(regexp_substr(value,'[^,]+', 1, 1 )) res, value as val, 1 as lev
                 from tbl1
                where regexp_substr(value, '[^,]+', 1, 1) is not null
                union all           
                select id, trim(regexp_substr(val,'[^,]+', 1, lev+1) ) res, val, lev+1 as lev
                  from t
                  where regexp_substr(val, '[^,]+', 1, lev+1) is not null
                  )
    select id, res,lev
      from t
    order by id, lev;
    

    OUTPUT

    id  res lev
    1   AA  1
    1   UT  2
    1   BT  3
    1   SK  4
    1   SX  5
    2   AA  1
    2   UT  2
    2   SX  3
    3   UT  1
    3   SK  2
    3   SX  3
    3   ZF  4
    

    Another recursive approach by MT0 but without regex:

    WITH t ( id, value, start_pos, end_pos ) AS
      ( SELECT id, value, 1, INSTR( value, ',' ) FROM tbl1
      UNION ALL
      SELECT id,
        value,
        end_pos                    + 1,
        INSTR( value, ',', end_pos + 1 )
      FROM t
      WHERE end_pos > 0
      )
    SELECT id,
      SUBSTR( value, start_pos, DECODE( end_pos, 0, LENGTH( value ) + 1, end_pos ) - start_pos ) AS value
    FROM t
    ORDER BY id,
      start_pos;
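
    The position bookkeeping in this query is easier to follow on a single string. The sketch below applies the same INSTR/SUBSTR recursion to a literal; the literal 'AA,UT,BT', the column name str, and the alias token are assumptions chosen for illustration.

    ```sql
    -- start_pos: first character of the current token
    -- end_pos:   position of the next comma, or 0 when no comma is left
    WITH t ( str, start_pos, end_pos ) AS
      ( SELECT 'AA,UT,BT', 1, INSTR( 'AA,UT,BT', ',' ) FROM dual
      UNION ALL
      SELECT str,
        end_pos + 1,
        INSTR( str, ',', end_pos + 1 )
      FROM t
      WHERE end_pos > 0   -- stop once the last token has been reached
      )
    SELECT start_pos,
      end_pos,
      -- for the last token (end_pos = 0) read up to the end of the string
      SUBSTR( str, start_pos, DECODE( end_pos, 0, LENGTH( str ) + 1, end_pos ) - start_pos ) AS token
    FROM t
    ORDER BY start_pos;
    -- Should return: (1, 3, AA), (4, 6, UT), (7, 0, BT).
    ```

    Unlike the regex queries above, this version does not trim whitespace. With values such as 'AA, UT, BT' the tokens keep the space after each comma, so wrapping the SUBSTR call in TRIM is probably wanted for the sample data.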
    

    I tested these approaches on a 30,000-row dataset (118,104 rows returned) and got the following average results:

    • My recursive approach: 5 seconds
    • MT0 approach: 4 seconds
    • Mathguy approach: 16 seconds
    • MT0 recursive approach no-regex: 3.45 seconds

    @Mathguy has also tested with a bigger dataset:

    In all cases the recursive query (I only tested the one with regular substr and instr) does better, by a factor of 2 to 5. Here are the combinations of # of strings / tokens per string and CTAS execution times for hierarchical vs. recursive, hierarchical first. All times in seconds

    • 30,000 x 4: 5 / 1
    • 30,000 x 10: 15 / 3
    • 30,000 x 25: 56 / 37
    • 5,000 x 50: 33 / 14
    • 5,000 x 100: 160 / 81
    • 10,000 x 200: 1,924 / 772
