What's the most efficient way to normalize text from column into a table?

囚心锁ツ 2020-12-10 22:14

In T-SQL I have a column with some text in it with a format like the following:

[Key1:Value1:Value2:Value3:Value4:Value5]
[Key2:Value1:Value2:Value3:Value4:V         


        
2 Answers
  •  没有蜡笔的小新
    2020-12-10 23:04

    The fastest way to split a string when you know the maximum number of columns is to use the Cascading CROSS APPLY technique. Let's say you know that there will be no more than 10 items in your string (the key plus up to 9 values). You could do this:

    DECLARE @string varchar(1000) = '[Key1:Value1:Value2:Value3:Value4:Value5]'
    
    SELECT 
      [key] = SUBSTRING(t.string,1,d1.d-1),
      col1  = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),
      col2  = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),
      col3  = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1),
      col4  = SUBSTRING(t.string,d4.d+1,d5.d-d4.d-1),
      col5  = SUBSTRING(t.string,d5.d+1,d6.d-d5.d-1),
      col6  = SUBSTRING(t.string,d6.d+1,d7.d-d6.d-1),
      col7  = SUBSTRING(t.string,d7.d+1,d8.d-d7.d-1),
      col8  = SUBSTRING(t.string,d8.d+1,d9.d-d8.d-1),
      col9  = SUBSTRING(t.string,d9.d+1,d10.d-d9.d-1)
    FROM (VALUES (REPLACE(REPLACE(@string,']',':'),'[',''))) t(string)
    CROSS APPLY (VALUES (CHARINDEX(':',t.string)))                   d1(d)
    CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0)))  d2(d)
    CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0)))  d3(d)
    CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0)))  d4(d)
    CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d4.d+1),0)))  d5(d)
    CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d5.d+1),0)))  d6(d)
    CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d6.d+1),0)))  d7(d)
    CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d7.d+1),0)))  d8(d)
    CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d8.d+1),0)))  d9(d)
    CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d9.d+1),0)))  d10(d);
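
    To make the position arithmetic concrete, here is a cut-down sketch of the same pattern with only four delimiter positions, run against a made-up input '[K:A:BB]' (the input and the pos1..pos4 output columns are illustrative only). After the two REPLACE calls the string is 'K:A:BB:', so the colons sit at positions 2, 4 and 7. Each colN is the text between two consecutive positions, and once CHARINDEX finds no further colon it returns 0, which NULLIF turns into NULL, so every remaining column comes back NULL.

    ```sql
    -- Illustrative input; after the REPLACEs it becomes 'K:A:BB:'
    DECLARE @s varchar(100) = '[K:A:BB]';

    SELECT
      pos1  = d1.d, pos2 = d2.d, pos3 = d3.d, pos4 = d4.d,  -- delimiter positions
      [key] = SUBSTRING(t.string,1,d1.d-1),                 -- text before the 1st ':'
      col1  = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),       -- between 1st and 2nd ':'
      col2  = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),       -- between 2nd and 3rd ':'
      col3  = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1)        -- NULL: there is no 4th ':'
    FROM (VALUES (REPLACE(REPLACE(@s,']',':'),'[',''))) t(string)
    CROSS APPLY (VALUES (CHARINDEX(':',t.string)))                  d1(d)
    CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0))) d2(d)
    CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0))) d3(d)
    CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0))) d4(d);
    -- Expected: pos1=2, pos2=4, pos3=7, pos4=NULL, key='K', col1='A', col2='BB', col3=NULL
    ```

    Each CROSS APPLY only needs the previous delimiter's position, which is why the pattern extends to more columns by adding one more APPLY and one more SUBSTRING.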
    

    To use this technique against a table whose strings are stored in rows, you could do this:

    DECLARE @table TABLE (someid int identity, somestring varchar(1000));
    INSERT @table(somestring) VALUES 
    ('[Key1:Value1:Value2:Value3:Value4:Value5]'),
    ('[Key2:Value1:Value2:Value3:Value4:Value5]'),
    ('[Key3:Value1:Value2:Value3:Value4:Value5]'),
    ('[Key4:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8]'),
    ('[Key5:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8:Value9:Value10]');
    
    SELECT * 
    FROM @table s
    CROSS APPLY
    (
      SELECT 
        [key]  = SUBSTRING(t.string,1,d1.d-1),
        dCount = LEN(t.string)-LEN(REPLACE(t.string,':','')),
        col1   = SUBSTRING(t.string,d1.d+1,d2.d-d1.d-1),
        col2   = SUBSTRING(t.string,d2.d+1,d3.d-d2.d-1),
        col3   = SUBSTRING(t.string,d3.d+1,d4.d-d3.d-1),
        col4   = SUBSTRING(t.string,d4.d+1,d5.d-d4.d-1),
        col5   = SUBSTRING(t.string,d5.d+1,d6.d-d5.d-1),
        col6   = SUBSTRING(t.string,d6.d+1,d7.d-d6.d-1),
        col7   = SUBSTRING(t.string,d7.d+1,d8.d-d7.d-1),
        col8   = SUBSTRING(t.string,d8.d+1,d9.d-d8.d-1),
        col9   = SUBSTRING(t.string,d9.d+1,d10.d-d9.d-1)
      FROM (VALUES (REPLACE(REPLACE(s.somestring,']',':'),'[',''))) t(string)
      CROSS APPLY (VALUES (CHARINDEX(':',t.string)))                   d1(d)
      CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d1.d+1),0)))  d2(d)
      CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d2.d+1),0)))  d3(d)
      CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d3.d+1),0)))  d4(d)
      CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d4.d+1),0)))  d5(d)
      CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d5.d+1),0)))  d6(d)
      CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d6.d+1),0)))  d7(d)
      CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d7.d+1),0)))  d8(d)
      CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d8.d+1),0)))  d9(d)
      CROSS APPLY (VALUES (NULLIF(CHARINDEX(':',t.string,d9.d+1),0)))  d10(d)
    ) split
    WHERE LEN(s.somestring)-LEN(REPLACE(s.somestring,':','')) < 10
    

    If you don't know the maximum number of possible items, you could take this logic and wrap it in some Dynamic SQL that creates the correct number of CROSS APPLYs. I don't have time to put together that logic, but to get the maximum number of possible delimiters you could do something like this:

    DECLARE @maxDelimiters tinyint = 
      (SELECT MAX(LEN(s.somestring)-LEN(REPLACE(s.somestring,':',''))) FROM @table s);
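
    A rough, standalone sketch of that Dynamic SQL route, under a few assumptions: the strings are copied into a temp table #strings, because a table variable such as @table is not visible inside sp_executesql; #strings, its sample rows, and the variables @i, @prev, @cur, @cols, @applies and @sql are hypothetical names chosen for illustration. The loop generates one CROSS APPLY per delimiter, plus one for the trailing ':' that the ']' is turned into, and one colN per value.

    ```sql
    -- Hypothetical temp table; a table variable would not be visible to sp_executesql
    IF OBJECT_ID('tempdb..#strings') IS NOT NULL DROP TABLE #strings;
    CREATE TABLE #strings (someid int identity, somestring varchar(1000));
    INSERT #strings(somestring) VALUES
    ('[Key1:Value1:Value2:Value3:Value4:Value5]'),
    ('[Key4:Value1:Value2:Value3:Value4:Value5:Value6:Value7:Value8]');

    -- Delimiters in the longest raw string = number of values after the key
    DECLARE @maxDelimiters int =
      (SELECT MAX(LEN(s.somestring)-LEN(REPLACE(s.somestring,':',''))) FROM #strings s);

    DECLARE @i int = 2, @prev nvarchar(10), @cur nvarchar(10),
            @cols nvarchar(max) = N'', @applies nvarchar(max) = N'', @sql nvarchar(max);

    -- After REPLACE there is one extra ':' at the end, so build d2 .. d(@maxDelimiters+1)
    WHILE @i <= @maxDelimiters + 1
    BEGIN
      SELECT @prev = CAST(@i - 1 AS nvarchar(10)), @cur = CAST(@i AS nvarchar(10));
      -- dN = position of the Nth ':' (NULL once there are no more)
      SET @applies += N'
    CROSS APPLY (VALUES (NULLIF(CHARINDEX('':'',t.string,d' + @prev + N'.d+1),0))) d' + @cur + N'(d)';
      -- col(N-1) = text between the (N-1)th and Nth ':'
      SET @cols += N',
      col' + @prev + N' = SUBSTRING(t.string,d' + @prev + N'.d+1,d' + @cur + N'.d-d' + @prev + N'.d-1)';
      SET @i += 1;
    END;

    SET @sql = N'
    SELECT s.someid,
      [key] = SUBSTRING(t.string,1,d1.d-1)' + @cols + N'
    FROM #strings s
    CROSS APPLY (VALUES (REPLACE(REPLACE(s.somestring,'']'','':''),''['',''''))) t(string)
    CROSS APPLY (VALUES (CHARINDEX('':'',t.string))) d1(d)' + @applies + N';';

    PRINT @sql;                 -- inspect the generated query
    EXEC sys.sp_executesql @sql;
    ```

    Because the column list is sized from @maxDelimiters, no row has to be filtered out the way the fixed-width query's WHERE clause does; the trade-off is that the shape of the result set now depends on the data.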
    

    Alternatively, if you wanted to use John's technique, you could also use Dynamic SQL to create his query with the exact number of "pos" values required.
