Split string by space and character as delimiter in Oracle with regexp_substr

太阳男子 2020-12-04 00:32

I'm trying to split a string with regexp_substr, but I can't make it work.

So, first, I have this query:

select regexp_substr('Helloworld - test!'


        
5 answers
  • 2020-12-04 00:44

    A slight improvement on MT0's answer: the row count is computed dynamically with regexp_count, and the example proves that the query handles NULL list elements, which a pattern of the form [^delimiter]+ does NOT. More info on that here: Split comma separated values to columns

    with tbl(str) as (
      select ' - Hello world - test-test! -  - test - ' from dual
    )
    SELECT LEVEL AS Occurrence,
           REGEXP_SUBSTR( str ,'(.*?)([[:space:]]-[[:space:]]|$)', 1, LEVEL, NULL, 1 ) AS split_value
    FROM   tbl
    CONNECT BY LEVEL <= regexp_count(str, '[[:space:]]-[[:space:]]')+1;

    OCCURRENCE SPLIT_VALUE
    ---------- ----------------------------------------
             1
             2 Hello world
             3 test-test!
             4
             5 test
             6

    6 rows selected.
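    To make the NULL-handling claim concrete, here is a small sketch on a hypothetical list 'a,,c' whose second element is empty (the sample string and the column aliases class_split and lazy_split are made up for illustration). Assuming standard regexp_substr behaviour, the [^,]+ column should come back as a, c, NULL, because the empty element is skipped and later elements shift up one position, while the (.*?)(,|$) column should keep the positions as a, NULL, c.

```sql
-- Hypothetical sample: a three-element comma list whose middle element is empty
WITH tbl (str) AS (
  SELECT 'a,,c' FROM dual
)
SELECT LEVEL AS occurrence,
       -- [^,]+ needs at least one non-comma character, so empty elements are skipped
       REGEXP_SUBSTR(str, '[^,]+', 1, LEVEL) AS class_split,
       -- (.*?)(,|$) lets subexpression 1 be empty, so an empty element comes back as NULL in its own position
       REGEXP_SUBSTR(str, '(.*?)(,|$)', 1, LEVEL, NULL, 1) AS lazy_split
FROM   tbl
-- one row per element: number of commas + 1
CONNECT BY LEVEL <= REGEXP_COUNT(str, ',') + 1;
```

    The only difference between the two columns is the pattern; the row generator is the same dynamic regexp_count bound used above.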
    
  • 2020-12-04 00:50

    SQL Fiddle

    Oracle 11g R2 Schema Setup:

    CREATE TABLE TEST( str ) AS
              SELECT 'Hello world - test-test! - test' FROM DUAL
    UNION ALL SELECT 'Hello world2 - test2 - test-test2' FROM DUAL;
    

    Query 1:

    SELECT Str,
           COLUMN_VALUE AS Occurrence,
           REGEXP_SUBSTR( str ,'(.*?)([[:space:]]-[[:space:]]|$)', 1, COLUMN_VALUE, NULL, 1 ) AS split_value
    FROM   TEST,
           TABLE(
             CAST(
               MULTISET(
                 SELECT LEVEL
                 FROM   DUAL
                 CONNECT BY LEVEL < REGEXP_COUNT( str ,'(.*?)([[:space:]]-[[:space:]]|$)' )
               )
               AS SYS.ODCINUMBERLIST
             )
           )
    

    Results:

    |                               STR | OCCURRENCE |  SPLIT_VALUE |
    |-----------------------------------|------------|--------------|
    |   Hello world - test-test! - test |          1 |  Hello world |
    |   Hello world - test-test! - test |          2 |   test-test! |
    |   Hello world - test-test! - test |          3 |         test |
    | Hello world2 - test2 - test-test2 |          1 | Hello world2 |
    | Hello world2 - test2 - test-test2 |          2 |        test2 |
    | Hello world2 - test2 - test-test2 |          3 |   test-test2 |
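    A variant of Query 1, sketched under the assumption that you prefer to bound each row's generator by the number of ' - ' separators plus one (the bound used in the first answer) rather than by a REGEXP_COUNT of the whole pattern. The samples CTE inlines the two test strings so the query runs without the TEST table; the names samples, t and l are illustrative only.

```sql
-- Inline copy of the two sample strings from the schema setup
WITH samples (str) AS (
  SELECT 'Hello world - test-test! - test' FROM dual
  UNION ALL
  SELECT 'Hello world2 - test2 - test-test2' FROM dual
)
SELECT t.str,
       l.COLUMN_VALUE AS occurrence,
       REGEXP_SUBSTR(t.str, '(.*?)([[:space:]]-[[:space:]]|$)', 1, l.COLUMN_VALUE, NULL, 1) AS split_value
FROM   samples t,
       TABLE(
         CAST(
           MULTISET(
             -- n separators give n + 1 pieces for this row
             SELECT LEVEL
             FROM   dual
             CONNECT BY LEVEL <= REGEXP_COUNT(t.str, '[[:space:]]-[[:space:]]') + 1
           ) AS SYS.ODCINUMBERLIST
         )
       ) l
ORDER BY t.str, occurrence;
```

    Counting separators makes the number of pieces explicit, so the bound does not depend on how many zero-length matches the engine reports at the end of the string.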
    
  • 2020-12-04 01:00

    If I understood correctly, this will help you. Currently you are getting Helloworld (with a space at the end) as output, so I assume you don't want the trailing space. If so, you can simply include the space in the delimiter as well:

    select regexp_substr('Helloworld - test!' ,'[^ - ]+',1,1) from dual;

    OUTPUT
    Helloworld (no space at the end)
    

    As you mentioned in your comment, if you want a two-column output with Helloworld and test!, you can do the following:

    select regexp_substr('Helloworld - test!' ,'[^ - ]+',1,1) col1,
           regexp_substr('Helloworld - test!' ,'[^ - ]+',1,3) col2 from dual;

    OUTPUT
    col1         col2
    Helloworld   test!
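    Why occurrence 3 rather than 2? Inside square brackets ' - ' is a set of single characters, not a delimiter, and under POSIX bracket rules a hyphen between two characters denotes a range. Assuming Oracle follows that rule here, [^ - ] is the range from space to space, i.e. effectively [^ ] (any non-space character), so the lone hyphen is itself occurrence 2. A sketch to list each occurrence:

```sql
-- List the first three matches of '[^ - ]+' to see what each occurrence holds
SELECT LEVEL AS occurrence,
       REGEXP_SUBSTR('Helloworld - test!', '[^ - ]+', 1, LEVEL) AS piece
FROM   dual
CONNECT BY LEVEL <= 3;
```

    If that reading is right, the pattern splits on every space, so an input such as 'Hello world - test!' would return Hello as occurrence 1 rather than Hello world.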
    
  • 2020-12-04 01:01
    CREATE OR REPLACE FUNCTION field(i_string            VARCHAR2
                                    ,i_delimiter         VARCHAR2
                                    ,i_occurance         NUMBER
                                    ,i_return_number     NUMBER DEFAULT 0
                                    ,i_replace_delimiter VARCHAR2 DEFAULT NULL) RETURN VARCHAR2 IS
      -----------------------------------------------------------------------
      -- Function Name.......: FIELD
      -- Author..............: Dan Simson
      -- Date................: 05/06/2016
      -- Description.........: This function is similar to one from Prime
      --                       Computer that I used long ago. You can easily
      --                       parse a delimited string.
      -- Example.............:
      --  String.............: This is a cool function
      --  Delimiter..........: ' '
      --  Occurrence.........: 2
      --  Return Number......: 3
      --  Replace Delimiter..: '/'
      --  Return Value.......: is/a/cool
      -----------------------------------------------------------------------
      v_return_string  VARCHAR2(32767);
      n_start          NUMBER := i_occurance;      -- first field to return
      v_delimiter      VARCHAR2(32767);            -- NULL before the first field; was VARCHAR2(1), which fails for longer replace delimiters
      n_return_number  NUMBER := i_return_number;  -- number of fields to return
      n_max_delimiters NUMBER := regexp_count(i_string, i_delimiter);
    BEGIN
      -- Cap the field count at the number of fields in the string
      IF i_return_number > n_max_delimiters THEN
        n_return_number := n_max_delimiters + 1;
      END IF;
      FOR a IN 1 .. n_return_number LOOP
        -- i_delimiter goes inside a character class, so it should be a single
        -- non-special character; consecutive delimiters are treated as one
        v_return_string := v_return_string || v_delimiter || regexp_substr(i_string, '[^' || i_delimiter || ']+', 1, n_start);
        n_start         := n_start + 1;
        v_delimiter     := nvl(i_replace_delimiter, i_delimiter);
      END LOOP;
      RETURN(v_return_string);
    END field;
    /
    
    
    SELECT field('This is a cool function',' ',2,3,'/') FROM dual;
    
    SELECT regexp_substr('This is a cool function', '[^ ]+', 1, 1) Word1
          ,regexp_substr('This is a cool function', '[^ ]+', 1, 2) Word2
          ,regexp_substr('This is a cool function', '[^ ]+', 1, 3) Word3
          ,regexp_substr('This is a cool function', '[^ ]+', 1, 4) Word4
          ,regexp_substr('This is a cool function', '[^ ]+', 1, 5) Word5
      FROM dual;
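    The five hard-coded WordN columns above can also be produced as rows without knowing the word count in advance. A minimal sketch using the CONNECT BY LEVEL generator from the other answers (the aliases word_no and word are illustrative):

```sql
-- One row per space-separated word; REGEXP_COUNT of the same pattern gives the word count
SELECT LEVEL AS word_no,
       REGEXP_SUBSTR('This is a cool function', '[^ ]+', 1, LEVEL) AS word
FROM   dual
CONNECT BY LEVEL <= REGEXP_COUNT('This is a cool function', '[^ ]+');
```

    Like the column version, this skips empty fields, since [^ ]+ never matches an empty string.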
    
  • 2020-12-04 01:09

    Trying to negate the match string '[[:space:]]-[[:space:]]' by putting it in a character class with a circumflex (^) will not work. Everything between a pair of square brackets is treated as a list of alternative single characters (named character classes such as [:space:] simply expand to a list of characters), so a bracket expression always matches exactly one character, never a multi-character sequence. In addition, because of the way bracket expressions nest, your outer brackets are most likely being interpreted as follows:

    • [^[[:space:]] a single character that is neither whitespace nor a left square bracket
    • - followed by a single hyphen
    • [[:space:]] followed by a single whitespace character
    • ]+ followed by one or more right square brackets

    It may be easier to convert your multi-character separator to a single character with regexp_replace, then use regexp_substr to find the individual pieces:

    select regexp_substr(regexp_replace('Helloworld - test!'
                                       ,'[[:space:]]-[[:space:]]'
                                       ,chr(11))
                        ,'([^'||chr(11)||']*)('||chr(11)||'|$)'
                        ,1 -- Start here
                        ,2 -- return 1st, 2nd, 3rd, etc. match
                        ,null
                        ,1 -- return 1st sub exp
                        )
      from dual;
    

    In this code I first replace each ' - ' separator (each match of [[:space:]]-[[:space:]]) with chr(11). That's the ASCII vertical tab (VT) character, which is unlikely to appear in most text strings. The match expression of regexp_substr then matches all non-VT characters followed by either a VT character or the end of the string. Only the non-VT characters are returned (the first subexpression).
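    The same chr(11) trick can be combined with the CONNECT BY LEVEL generator to return every piece as its own row instead of asking for one occurrence at a time. A sketch, assuming the sample string used elsewhere in this thread; the CTE names tbl and vt are illustrative:

```sql
WITH tbl (str) AS (
  SELECT 'Hello world - test-test! - test' FROM dual
), vt AS (
  -- replace each ' - ' separator with a vertical tab, chr(11); hyphens without surrounding spaces are untouched
  SELECT REGEXP_REPLACE(str, '[[:space:]]-[[:space:]]', CHR(11)) AS s FROM tbl
)
SELECT LEVEL AS occurrence,
       -- subexpression 1 = the run of non-VT characters before the next VT or the end of the string
       REGEXP_SUBSTR(s, '([^' || CHR(11) || ']*)(' || CHR(11) || '|$)', 1, LEVEL, NULL, 1) AS piece
FROM   vt
-- n vertical tabs give n + 1 pieces
CONNECT BY LEVEL <= REGEXP_COUNT(s, CHR(11)) + 1;
```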
