Most efficient way to split string into rows

被刻印的时光 ゝ submitted on 2019-12-09 13:37:22

Question


I am using the following function to split a string into rows. It is much faster than the function I was using previously, but I need to churn through this data more quickly (it's an ETL job):

ALTER FUNCTION [dbo].[ArrayToTable]
(
     @InputString VARCHAR(MAX) = ''
     , @Delimitter VARCHAR(1) = ',' 
) 
RETURNS @RESULT TABLE([Position] INT IDENTITY, [Value] VARCHAR(MAX)) 
AS 
BEGIN 
    DECLARE @XML XML 
    SELECT @XML = CONVERT(XML, SQL_TEXT) 
    FROM ( 
        SELECT '<root><item>' 
            + REPLACE(@InputString, @Delimitter, '</item><item>') 
            + '</item></root>' AS SQL_TEXT 
        ) dt 

    INSERT INTO @RESULT([Value]) 
    SELECT t.col.query('.').value('.', 'VARCHAR(1000)') AS [Value] 
    FROM @XML.nodes('root/item') t(col) 
    RETURN 
END 

Can anyone think of a more efficient way to turn a delimited string into rows? I am using CROSS APPLY in my query to join to these results.


Answer 1:


Here's the most performant function I have:

CREATE FUNCTION [Resource].[udf_SplitByXml]
      (@Data NVARCHAR(MAX), @Delimiter NVARCHAR(5))
RETURNS @Table TABLE 
    ( Data NVARCHAR(MAX)
    , SequentialOrder INT IDENTITY(1, 1))
AS
BEGIN

    -- Escape the XML special characters in the input, then turn each delimiter into an element boundary.
    DECLARE @TextXml XML;
    SELECT @TextXml = CAST('<d>' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@Data, '&', '&amp;'), '<', '&lt;'), '>', '&gt;'), '"', '&quot;'), '''', '&apos;'), @Delimiter, '</d><d>') + '</d>' AS XML);

    -- .value() already returns the decoded text, so the entities must not be un-escaped a second time
    -- (doing so would turn a literal '&amp;' in the input into '&').
    INSERT INTO @Table (Data)
    SELECT Data = RTRIM(LTRIM(T.Split.value('.', 'nvarchar(max)')))
    FROM @TextXml.nodes('/d') T(Split);

    RETURN
END

Here are sample calls you can use to test the results:

SELECT * FROM Resource.udf_SplitByXml('yes, no, maybe, so', ',');
SELECT * FROM Resource.udf_SplitByXml('who|what|where|when|why|how|Uh, I don''t know!', '|');
SELECT * FROM Resource.udf_SplitByXml('Government, Education, Non-profit|Energy & Power|Yes|No', '|');
SELECT * FROM Resource.udf_SplitByXml('Energy & Power|Some<Thing>Wicked''This"Way Comes', '|');
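
Since the question joins to the split results with CROSS APPLY, a call against a table might look roughly like the sketch below. The @Staging table variable and its Id and Tags columns are made up for illustration; only Resource.udf_SplitByXml and its Data and SequentialOrder columns come from the function above.

```sql
-- Hypothetical staging data; the table variable and column names are illustrative only.
DECLARE @Staging TABLE (Id INT, Tags NVARCHAR(MAX));

INSERT INTO @Staging (Id, Tags)
VALUES (1, N'red,green,blue'),
       (2, N'Energy & Power,Yes');

-- One output row per extracted value, keyed by the source row it came from.
SELECT s.Id, x.SequentialOrder, x.Data
FROM @Staging AS s
CROSS APPLY Resource.udf_SplitByXml(s.Tags, N',') AS x
ORDER BY s.Id, x.SequentialOrder;
```

Because this is a multi-statement table-valued function, it is evaluated once per source row, which is likely where most of the time goes in a large ETL run.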

Another option is to try the CLR solution based on Adam Machanic's code, which won the performance test in this blog.




Answer 2:


Here's another function I just wrote that, to my surprise, is marginally faster than the XML method above. Processing 1,000 rows of delimited strings shows no measurable difference between the two functions. Processing 50K rows, the XML method takes 129 seconds to generate 435,217 extracted rows, while my simple string manipulation method takes 122 seconds to generate the same 435,217 rows (about 5% faster).

So it is not substantially faster, although I guess the difference would add up if you're processing hundreds of thousands of rows. The main benefits of my function are that it is easy to read and understand, it does not depend on XML functionality that may change in future versions of SQL Server, and it should be easy to port to other languages. I was hoping to find something really fast, but I guess this is the best we can do for now.

  CREATE FUNCTION dbo.ufn_util_Split
     (
     @RawText varchar(max),
     @SplitCharacter varchar(2)
     )
  RETURNS @t_Results TABLE
     (
     RowIndex int IDENTITY(1,1),
     RowValue varchar(max)
     )

  AS

  BEGIN
     DECLARE @vc_RowValue varchar(max) = ''
     DECLARE @int_SplitPosition int = 0
     --LEN ignores trailing spaces, so measure with a sentinel appended; otherwise a space delimiter would have length 0 and the loop below would never advance
     DECLARE @int_LenSplitChar int = LEN(@SplitCharacter + 'x') - 1
     --the string must end in the split character in order for this to work; compare the full delimiter length so two-character delimiters are detected too
     DECLARE @vc_Remainder varchar(max) = CASE WHEN RIGHT(@RawText,@int_LenSplitChar) = @SplitCharacter THEN @RawText ELSE @RawText + @SplitCharacter END

     --determine the first segment to start with
     SELECT @int_SplitPosition = CHARINDEX(@SplitCharacter,@vc_Remainder)
     WHILE (@int_SplitPosition > 0)
     BEGIN
        SELECT @vc_RowValue = LEFT(@vc_Remainder,@int_SplitPosition-1)
        INSERT INTO @t_Results (RowValue) VALUES (@vc_RowValue)

        --now strip off the segment we just extracted and determine where the next segment ends
        SET @vc_Remainder = SUBSTRING(@vc_Remainder,@int_SplitPosition+@int_LenSplitChar,LEN(@vc_Remainder))
        SELECT @int_SplitPosition = CHARINDEX(@SplitCharacter,@vc_Remainder)
     END
     RETURN
  END
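
The answer does not say how the timings were taken, so the following is only a rough sketch of one way to compare the two functions. The table dbo.DelimitedStrings and its Payload column are hypothetical stand-ins for whatever holds the delimited strings; the two function names are the ones defined in this thread.

```sql
-- Hypothetical source table: dbo.DelimitedStrings (Id INT, Payload VARCHAR(MAX)). Substitute your own.
DECLARE @Start DATETIME2 = SYSDATETIME();
DECLARE @Rows INT;

-- Count the extracted rows so the function has to run for every source row.
SELECT @Rows = COUNT(*)
FROM dbo.DelimitedStrings AS d
CROSS APPLY dbo.ufn_util_Split(d.Payload, ',') AS s;

SELECT 'ufn_util_Split' AS Method,
       @Rows AS ExtractedRows,
       DATEDIFF(MILLISECOND, @Start, SYSDATETIME()) AS ElapsedMs;

-- Repeat with the XML-based function over the same data.
SET @Start = SYSDATETIME();

SELECT @Rows = COUNT(*)
FROM dbo.DelimitedStrings AS d
CROSS APPLY Resource.udf_SplitByXml(d.Payload, N',') AS s;

SELECT 'udf_SplitByXml' AS Method,
       @Rows AS ExtractedRows,
       DATEDIFF(MILLISECOND, @Start, SYSDATETIME()) AS ElapsedMs;
```

Running each measurement several times and comparing the extracted row counts helps rule out caching effects and confirms both functions split the data the same way.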



Answer 3:


Hi, try this:

create procedure sp_getAllItems 
@input varchar(100)
as
BEGIN

    create table #tmpFruits (name varchar(10))

    Declare @Qry Varchar(500)      
    Set @Qry = ''        
    Select @Qry = @Qry + ' Insert into #tmpFruits '    
    Select @Qry = @Qry +  Replace( 'Select ''' + 
    Replace(Replace(Replace(Replace(@input,CHAR(9),''),' 
    ',''),CHAR(10),''),CHAR(13),'') , ',',''' Union Select ''') + ''''           

    Exec (@Qry) 

    select * from Fruitstest where name in (select name from #tmpFruits)
    drop table #tmpFruits 

END

exec sp_getAllItems @input = 'cherry,banana'
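
One caveat with the procedure above: because the list is concatenated into dynamic SQL, a value containing an apostrophe would end the string literal early and break (or alter) the generated statement. A possible alternative, sketched here on the assumption that dbo.ufn_util_Split from the earlier answer has been created and that Fruitstest has a name column as in the procedure, is to filter against the split function directly:

```sql
-- Assumes dbo.ufn_util_Split exists and that Fruitstest(name) is the table used above.
DECLARE @input VARCHAR(100) = 'cherry,banana';

-- No dynamic SQL and no temp table: the split rows feed the IN list directly.
SELECT f.*
FROM Fruitstest AS f
WHERE f.name IN (SELECT LTRIM(RTRIM(s.RowValue))
                 FROM dbo.ufn_util_Split(@input, ',') AS s);
```

This also avoids the 500-character limit on the generated query text, at the cost of depending on a split function being installed.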


Source: https://stackoverflow.com/questions/23670772/most-efficient-way-to-split-string-into-rows
