问题
I am using the following function to split a string into rows. It is much faster than the previous function that I was using, however I need to somehow churn through this data quicker (its an ETL job):
ALTER FUNCTION [dbo].[ArrayToTable]
(
@InputString VARCHAR(MAX) = ''
, @Delimitter VARCHAR(1) = ','
)
RETURNS @RESULT TABLE([Position] INT IDENTITY, [Value] VARCHAR(MAX))
AS
BEGIN
DECLARE @XML XML
SELECT @XML = CONVERT(XML, SQL_TEXT)
FROM (
SELECT '<root><item>'
+ REPLACE(@InputString, @Delimitter, '</item><item>')
+ '</item></root>' AS SQL_TEXT
) dt
INSERT INTO @RESULT([Value])
SELECT t.col.query('.').value('.', 'VARCHAR(1000)') AS [Value]
FROM @XML.nodes('root/item') t(col)
RETURN
END
Can anyone think of a better/quicker way to turn a delimited string into rows? I am using a cross apply on my query to join to these results.
Can anyone think of a more efficient method to turn delimited strings into rows?
回答1:
Here's the most performant function I have:
CREATE FUNCTION [Resource].[udf_SplitByXml]
(@Data NVARCHAR(MAX), @Delimiter NVARCHAR(5))
RETURNS @Table TABLE
( Data NVARCHAR(MAX)
, SequentialOrder INT IDENTITY(1, 1))
AS
BEGIN
DECLARE @TextXml XML;
SELECT @TextXml = CAST('<d>' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@Data, '&', '&'), '<', '<'), '>', '>'), '"', '"'), '''', '''), @Delimiter, '</d><d>') + '</d>' AS XML);
INSERT INTO @Table (Data)
SELECT Data = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(RTRIM(LTRIM(T.split.value('.', 'nvarchar(max)'))), '&', '&'), '<', '<'), '>', '>'), '"', '"'), ''', '''')
FROM @TextXml.nodes('/d') T(Split)
RETURN
END
Here are sample calls you can use to test the results:
SELECT * FROM Resource.udf_SplitByXml('yes, no, maybe, so', ',');
SELECT * FROM Resource.udf_SplitByXml('who|what|where|when|why|how|Uh, I don''t know!', '|');
SELECT * FROM Resource.udf_SplitByXml('Government, Education, Non-profit|Energy & Power|Yes|No', '|');
SELECT * FROM Resource.udf_SplitByXml('Energy & Power|Some<Thing>Wicked''This"Way Comes', '|');
Another option is to try out the CLR solution based on Adam Machanic's code that was the winner of a performance test in this blog.
回答2:
Here's another function I just wrote that, to my surprise, is marginally faster than the XML method above, although just barely. Comparing both functions to process 1,000 rows of delimited strings yields no performance difference. Processing 50K rows of delimited strings, the XML method takes 129 seconds to generate 435,217 extracted rows, compared to 122 seconds to generate extracted 435,217 rows for my simple string manipulation method.
So it is not really substantially faster, although I guess it would make a difference if you're processing hundreds of thousands of rows. The main benefits of my function are that it is easy to read and understand, it is not dependent on XML functionality that may change in future versions of SQL Server, and it should be easily portable to any language. I was really hoping to find something really fast, but I guess this is the best we can do for now.
CREATE FUNCTION dbo.ufn_util_Split
(
@RawText varchar(max),
@SplitCharacter varchar(2)
)
RETURNS @t_Results TABLE
(
RowIndex int IDENTITY(1,1),
RowValue varchar(max)
)
AS
BEGIN
DECLARE @vc_RowValue varchar(max) = ''
DECLARE @vc_Remainder varchar(max) = CASE WHEN RIGHT(@RawText,1) = @SplitCharacter THEN @RawText ELSE @RawText + @SplitCharacter END --the string must end in the split character in order for this to work
DECLARE @int_SplitPosition int = 0
DECLARE @int_LenSplitChar int = 0
SELECT @int_LenSplitChar = LEN(@SplitCharacter)
--determine the first segment to start with
SELECT @int_SplitPosition = CHARINDEX(@SplitCharacter,@vc_Remainder)
WHILE (@int_SplitPosition > 0)
BEGIN
SELECT @vc_RowValue = LEFT(@vc_Remainder,@int_SplitPosition-1)
INSERT INTO @t_Results (RowValue) VALUES (@vc_RowValue)
--now strip off the segment we just extracted and determine where the next segment ends, and continue
SET @vc_Remainder = SUBSTRING(@vc_Remainder,@int_SplitPosition+@int_LenSplitChar,LEN(@vc_Remainder))
SELECT @int_SplitPosition = CHARINDEX(@SplitCharacter,@vc_Remainder)
CONTINUE
END
RETURN
END
回答3:
Hi, Try this -
create procedure sp_getAllItems
@input varchar(100)
as
BEGIN
create table #tmpFruits (name varchar(10))
Declare @Qry Varchar(500)
Set @Qry = ''
Select @Qry = @Qry + ' Insert into #tmpFruits '
Select @Qry = @Qry + Replace( 'Select ''' +
Replace(Replace(Replace(Replace(@input,CHAR(9),''),'
',''),CHAR(10),''),CHAR(13),'') , ',',''' Union Select ''') + ''''
Exec (@Qry)
select * from Fruitstest where name in (select name from #tmpFruits)
drop table #tmpFruits
END
exec sp_getAllItems @input = 'cherry,banana'
来源:https://stackoverflow.com/questions/23670772/most-efficient-way-to-split-string-into-rows