How to remove duplicates from a comma-separated list with regexp_replace in Oracle?

长发绾君心 2020-12-11 14:07

I have

 POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE

I want

POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE


        
2 Answers
  • 2020-12-11 14:52

    Two solutions use only SQL; a third uses a small PL/SQL helper function, which makes the final SQL query very short.

    Oracle Setup:

    CREATE TABLE data ( value ) AS
    SELECT 'POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE' FROM DUAL;
    
    CREATE TYPE stringlist AS TABLE OF VARCHAR2(4000);
    /
    

    Query 1:

    SELECT LISTAGG( t.COLUMN_VALUE, ',' ) WITHIN GROUP ( ORDER BY t.COLUMN_VALUE ) AS list
    FROM   data d,
           TABLE(
             SET(
               CAST(
                 MULTISET(
                  SELECT REGEXP_SUBSTR( d.value, '[^,]+', 1, LEVEL )
                  FROM   DUAL
                  CONNECT BY LEVEL <= REGEXP_COUNT( d.value, '[^,]+' )
                 ) AS stringlist
               )
             )
           ) t
    GROUP BY d.value;
    

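    Output (tokens are sorted alphabetically by the ORDER BY inside LISTAGG, which happens to match the input order here):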

    LIST
    ---------------------------------------
    POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE
    

    Query 2:

    SELECT ( SELECT LISTAGG(  COLUMN_VALUE, ',' ) WITHIN GROUP ( ORDER BY ROWNUM )
             FROM TABLE( d.uniques ) ) AS list
    FROM   (
      SELECT ( SELECT CAST(
                        COLLECT(
                          DISTINCT
                          REGEXP_SUBSTR( d.value, '[^,]+', 1, LEVEL )
                        )
                        AS stringlist
                      )
                FROM  DUAL
                CONNECT BY LEVEL <= REGEXP_COUNT( d.value, '[^,]+' )
             ) uniques
      FROM   data d
    ) d;
    

    Output:

    LIST
    ---------------------------------------
    POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE
    

    Oracle Setup:

    A small helper function:

    CREATE FUNCTION split_String(
      i_str    IN  VARCHAR2,
      i_delim  IN  VARCHAR2 DEFAULT ','
    ) RETURN stringlist DETERMINISTIC
    AS
      p_result       stringlist := stringlist();
      p_start        NUMBER(5) := 1;
      p_end          NUMBER(5);
      c_len CONSTANT NUMBER(5) := LENGTH( i_str );
      c_ld  CONSTANT NUMBER(5) := LENGTH( i_delim );
    BEGIN
      IF c_len > 0 THEN
        p_end := INSTR( i_str, i_delim, p_start );
        WHILE p_end > 0 LOOP
          p_result.EXTEND;
          p_result( p_result.COUNT ) := SUBSTR( i_str, p_start, p_end - p_start );
          p_start := p_end + c_ld;
          p_end := INSTR( i_str, i_delim, p_start );
        END LOOP;
        IF p_start <= c_len + 1 THEN
          p_result.EXTEND;
          p_result( p_result.COUNT ) := SUBSTR( i_str, p_start, c_len - p_start + 1 );
        END IF;
      END IF;
      RETURN p_result;
    END;
    /
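
    As a quick check of the helper before using it in Query 3, the sketch below calls split_String directly. The input literals 'a,b,,c' and 'a||b' are illustrative values, not part of the original answer, and the type stringlist from the setup above is assumed to exist. Tracing the function by hand, the first call should yield four elements (a, b, a NULL element for the two adjacent commas, and c), and the second should yield a and b.

    ```sql
    -- Hypothetical test inputs; only split_String and stringlist come from the answer.
    -- Default delimiter: adjacent commas produce a NULL (empty) element.
    SELECT COLUMN_VALUE AS token
    FROM   TABLE( split_String( 'a,b,,c' ) );

    -- Multi-character delimiter passed through the second parameter.
    SELECT COLUMN_VALUE AS token
    FROM   TABLE( split_String( 'a||b', '||' ) );
    ```

    Because LISTAGG ignores NULL values, any empty elements the splitter returns drop out of the final list in Query 3.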
    

    Query 3:

    SELECT ( SELECT LISTAGG(  COLUMN_VALUE, ',' ) WITHIN GROUP ( ORDER BY ROWNUM )
             FROM TABLE( SET( split_String( d.value ) ) ) ) AS list
    FROM   data d;
    

    Or, if you only want to pass a single value:

    SELECT LISTAGG(  COLUMN_VALUE, ',' ) WITHIN GROUP ( ORDER BY ROWNUM ) AS list
    FROM   TABLE( SET( split_String(
              'POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE'
           ) ) );
    

    Output:

    LIST
    ---------------------------------------
    POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE
    
  • 2020-12-11 15:11

    The solution below uses plain SQL (no PL/SQL). It works with any input string and removes duplicates in place: each token keeps the position of its first occurrence, whatever the input order is. It also collapses consecutive commas (it "deletes nulls" from the input string) and treats null inputs correctly. Note the output for an input string consisting only of commas, and the correct treatment of the "tokens" consisting of two spaces and one space, respectively.

    The query runs relatively slowly; if performance is an issue, it can be rewritten as a recursive query using the "traditional" substr and instr functions, which are quite a bit faster than regular expressions.

    with inputs (input_string) as (
           select 'POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE' from dual
           union all
           select null from dual
           union all
           select 'ab,ab,st,ab,st,  , ,  ,x,,,r' from dual
           union all
           select ',,,' from dual
         ),
         tokens (input_string, rk, token) as (
           select     input_string, level, 
                      regexp_substr(input_string, '([^,]+)', 1, level, null, 1)
           from       inputs 
           connect by level <= 1 + regexp_count(input_string, ',')
         ),
         distinct_tokens (input_string, rk, token) as (
           select     input_string, min(rk) as rk, token
           from       tokens
           group by   input_string, token
         )
    select   input_string, listagg(token, ',') within group (order by rk) output_string
    from     distinct_tokens
    group by input_string
    ;
    

    Results for the inputs I created:

    INPUT_STRING                                                       OUTPUT_STRING
    ------------------------------------------------------------------ ----------------------------------------
    ,,,                                                                (null)
    POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE  POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE
    ab,ab,st,ab,st,  , ,  ,x,,,r                                       ab,st,  , ,x,r
    (null)                                                             (null)
    
    
    4 rows selected.
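
    The recursive rewrite mentioned above might look like the following sketch. It is not the answer author's code, only one possible reading of that suggestion: it uses a recursive WITH clause (available from Oracle 11g Release 2) with instr and substr instead of regular expressions. The column names start_pos and end_pos are illustrative. A comma is appended inside instr so that every token, including the last one, ends at a delimiter.

    ```sql
    with inputs (input_string) as (
           select 'POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE' from dual
           union all
           select 'ab,ab,st,ab,st,  , ,  ,x,,,r' from dual
         ),
         tokens (input_string, rk, start_pos, end_pos) as (
           -- anchor member: the first token ends just before the first comma
           select input_string, 1, 1, instr(input_string || ',', ',', 1)
           from   inputs
           union all
           -- recursive member: continue while end_pos points at a real comma
           -- (not at the comma appended for the search)
           select input_string, rk + 1, end_pos + 1,
                  instr(input_string || ',', ',', end_pos + 1)
           from   tokens
           where  end_pos <= length(input_string)
         ),
         distinct_tokens (input_string, rk, token) as (
           -- keep the first position at which each token occurs;
           -- a zero-length substr yields null, which listagg skips
           select   input_string, min(rk),
                    substr(input_string, start_pos, end_pos - start_pos)
           from     tokens
           group by input_string,
                    substr(input_string, start_pos, end_pos - start_pos)
         )
    select   input_string,
             listagg(token, ',') within group (order by rk) as output_string
    from     distinct_tokens
    group by input_string;
    ```

    For these two inputs the sketch should return the same output_string values as the regular-expression query. Each input row is walked independently, one token per recursion step, so the number of intermediate rows grows only with the number of tokens.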
    