Question
I have a function that takes two delimited strings and returns the number of common elements. The main code of the function is (@commonCount is the expected return value):
SET @commonCount = (select count(*) from (
select token from dbo.splitString(@userKeywords, ';')
intersect
select token from dbo.splitString(@itemKeywords, ';')) as total)
where splitString is a table-valued function that uses a WHILE loop and CHARINDEX to split a string into delimited tokens, inserting each token into the table it returns.
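For reference, the query above computes a distinct set intersection. The Python sketch below shows the same semantics; the names split_string and common_count are hypothetical, and skipping empty tokens is an assumption about how dbo.splitString behaves.

```python
def split_string(s: str, sep: str = ";") -> list[str]:
    """Split s on sep, skipping empty tokens (an assumption, see lead-in).

    Same idea as a WHILE loop over CHARINDEX: find the next delimiter,
    keep the text before it, continue just after it.
    """
    tokens = []
    start = 0
    while True:
        pos = s.find(sep, start)  # like CHARINDEX(sep, s, start), but 0-based
        if pos == -1:
            if start < len(s):
                tokens.append(s[start:])  # last token has no trailing sep
            return tokens
        if pos > start:
            tokens.append(s[start:pos])
        start = pos + len(sep)


def common_count(user_keywords: str, item_keywords: str) -> int:
    # INTERSECT returns distinct rows, so a repeated token counts once;
    # a Python set intersection has the same semantics. Note that Python
    # compares case-sensitively, while SQL Server collations are often
    # case-insensitive.
    return len(set(split_string(user_keywords)) & set(split_string(item_keywords)))


print(common_count("sql;tsql;performance;sql", "performance;index;sql;"))  # 2
```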
The problem is that this processes only about 100 rows per second, and at the size of my dataset the job would take about 8-10 days to finish.
The two strings can each be up to 1500 characters long.
Is there any way I can make this fast enough to be usable?
Answer 1:
The performance problem is probably the combination of cursor-like, row-by-row processing (the WHILE loop) and the user-defined function, which is invoked for every row.
If one of these strings is constant (such as the item keywords), you can search for each of its keywords independently:
select *
from users u
where charindex(';'+<item1>+';', ';'+u.keywords+';') > 0
union all
select *
from users u
where charindex(';'+<item2>+';', ';'+u.keywords+';') > 0
union all
...
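The ';' wrapping can be checked outside SQL Server. The sketch below uses Python's built-in sqlite3 module, where SQLite's instr() stands in for CHARINDEX (with the arguments in the opposite order) and || stands in for +. The users rows are made up for illustration, and wrap_end is a hypothetical flag that shows what happens when the stored list is wrapped only at the front.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, keywords TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "sql;index;tuning"), (2, "mysql;tuning"), (3, "nosql")],
)


def users_with_keyword(keyword: str, wrap_end: bool = True) -> list[int]:
    # Wrapping the search term in ';' keeps 'sql' from matching inside
    # 'mysql' or 'nosql'. The stored list also needs a trailing ';', or its
    # last keyword (which has no delimiter after it) can never match.
    haystack = "';' || keywords || ';'" if wrap_end else "';' || keywords"
    sql = (
        f"SELECT userid FROM users "
        f"WHERE instr({haystack}, ';' || ? || ';') > 0 ORDER BY userid"
    )
    return [row[0] for row in conn.execute(sql, (keyword,))]


print(users_with_keyword("sql"))                     # [1]
print(users_with_keyword("tuning"))                  # [1, 2]
print(users_with_keyword("tuning", wrap_end=False))  # [] (last keyword missed)
```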
Alternatively, a set-based approach can work, but you have to normalize the data (a plug here for storing data in the right format to begin with). That is, you want a table that has:
userid
keyword
And another that has:
itemid
keyword
(This assumes there are different types of items; otherwise it is just a list of keywords.)
Then your query would look like:
select *
from userkeyword uk join
itemkeyword ik
on uk.keyword = ik.keyword
The SQL engine then does the work as a single set-based join instead of one function call per row.
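To get the number of common keywords per user/item pair, the join only needs a GROUP BY. Below is a minimal sketch run on SQLite through Python's sqlite3 module. The sample rows and index names are invented, and the SELECT itself uses only standard JOIN, GROUP BY and COUNT(DISTINCT), so it should carry over to SQL Server largely unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userkeyword (userid INTEGER, keyword TEXT);
    CREATE TABLE itemkeyword (itemid INTEGER, keyword TEXT);
    CREATE INDEX ix_userkeyword_keyword ON userkeyword (keyword);
    CREATE INDEX ix_itemkeyword_keyword ON itemkeyword (keyword);
""")
conn.executemany(
    "INSERT INTO userkeyword VALUES (?, ?)",
    [(1, "sql"), (1, "tuning"), (1, "index"), (2, "python")],
)
conn.executemany(
    "INSERT INTO itemkeyword VALUES (?, ?)",
    [(10, "sql"), (10, "index"), (20, "python"), (20, "sql")],
)

# One query for all pairs: join on keyword, then count matches per pair.
# COUNT(DISTINCT ...) keeps INTERSECT semantics if a keyword is stored twice
# for the same id. Pairs with no common keyword produce no row at all.
rows = conn.execute("""
    SELECT uk.userid, ik.itemid, COUNT(DISTINCT uk.keyword) AS common_count
    FROM userkeyword uk
    JOIN itemkeyword ik ON uk.keyword = ik.keyword
    GROUP BY uk.userid, ik.itemid
    ORDER BY uk.userid, ik.itemid
""").fetchall()
print(rows)  # [(1, 10, 2), (1, 20, 1), (2, 20, 1)]
```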
Now, how can you create such a list? If you have only a handful of keywords per user, you can do something like:
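with keyword1 as (
    -- first keyword: the text before the first ';'
    -- (assumes every keyword, including the last, is followed by ';')
    select u.userid, u.keywords,
           charindex(';', u.keywords) as pos1,
           left(u.keywords, charindex(';', u.keywords) - 1) as keyword1
    from users u
    where charindex(';', u.keywords) > 0
),
keyword2 as (
    -- second keyword: builds on keyword1, which carries pos1
    select k.userid, k.keywords,
           charindex(';', k.keywords, k.pos1 + 1) as pos2,
           substring(k.keywords, k.pos1 + 1,
                     charindex(';', k.keywords, k.pos1 + 1) - k.pos1 - 1) as keyword2
    from keyword1 k
    where charindex(';', k.keywords, k.pos1 + 1) > 0
),
...
select userid, keyword1
from keyword1
union all
select userid, keyword2
from keyword2
...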
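The keyword1, keyword2, ... chain needs one CTE per keyword position. A recursive CTE is a different technique from the one in this answer, but it handles any number of keywords per row. The sketch below runs it on SQLite through Python's sqlite3 with made-up rows. A SQL Server version would use CHARINDEX, SUBSTRING and + instead of instr, substr and ||, omit the RECURSIVE keyword, cast the columns so the anchor and recursive parts have matching types, and may need OPTION (MAXRECURSION 0), because recursion stops at 100 levels by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, keywords TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "sql;index;tuning"), (2, "python;"), (3, "")],
)

# Each recursive step peels off the text before the first ';' in "rest" and
# recurses on what follows. Appending ';' in the anchor row means the last
# keyword needs no special case, and "rest" always contains a ';'.
SPLIT_SQL = """
WITH RECURSIVE split(userid, keyword, rest) AS (
    SELECT userid, '', keywords || ';' FROM users
    UNION ALL
    SELECT userid,
           substr(rest, 1, instr(rest, ';') - 1),
           substr(rest, instr(rest, ';') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT userid, keyword FROM split
WHERE keyword <> ''  -- drop the anchor rows and empty tokens
ORDER BY userid, keyword
"""

rows = conn.execute(SPLIT_SQL).fetchall()
print(rows)  # [(1, 'index'), (1, 'sql'), (1, 'tuning'), (2, 'python')]
```

The resulting (userid, keyword) rows are what would go into the userkeyword table.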
To find the maximum number of ';' delimiters in any keyword list (and so how many keywordN CTEs the chain needs), you can use the following query:
select max(len(keywords) - len(replace(keywords, ';', '')))
from users
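The length-difference trick counts delimiters: removing every ';' shortens the string by one character per delimiter. Below is a quick check on SQLite through Python's sqlite3, with length() standing in for LEN() and invented sample rows. When the last keyword has no trailing ';', the number of keywords is one more than the delimiter count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, keywords TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "sql;index;tuning"), (2, "python;"), (3, "a;b;c;d")],
)

# Delimiters per row: 2, 1 and 3; "a;b;c;d" holds 4 keywords but 3 ';'.
(max_delims,) = conn.execute(
    "SELECT max(length(keywords) - length(replace(keywords, ';', ''))) FROM users"
).fetchone()
print(max_delims)  # 3
```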
Source: https://stackoverflow.com/questions/10572858/optimise-sql-function-to-get-common-elements