Optimise SQL function to get common elements

Submitted by 吃可爱长大的小学妹 on 2019-12-11 13:29:24

Question


I have a function that takes two delimited strings and returns the number of common elements.

The main code of the function is (@commonCount is the return value):

    SET @commonCount = (select count(*) from (
    select token from dbo.splitString(@userKeywords, ';')
    intersect
    select token from dbo.splitString(@itemKeywords, ';')) as total)

where splitString uses a WHILE loop and CHARINDEX to split a string into tokens on the delimiter and inserts each token into the table it returns.
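The question does not show dbo.splitString itself. A WHILE-loop splitter of the kind described might look like the sketch below. It is a hypothetical reconstruction, not the asker's code; the parameter names and the varchar(1500) sizes are assumptions. It is a multi-statement table-valued function that inserts one row per loop iteration, and that work is repeated for both strings on every row processed.

```sql
-- Hypothetical reconstruction of a WHILE-loop splitter (not the asker's actual dbo.splitString).
create function dbo.splitString (@list varchar(1500), @delim char(1))
returns @tokens table (token varchar(1500))
as
begin
    declare @pos int;
    set @pos = charindex(@delim, @list);

    -- Peel off one token per iteration until no delimiter remains.
    while @pos > 0
    begin
        insert into @tokens (token) values (left(@list, @pos - 1));
        set @list = substring(@list, @pos + 1, datalength(@list));
        set @pos = charindex(@delim, @list);
    end;

    -- Whatever follows the last delimiter is the final token.
    if len(@list) > 0
        insert into @tokens (token) values (@list);

    return;
end;
```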

The problem I am having is that this processes only about 100 rows per second; at the size of my dataset, it would take about 8 to 10 days to finish.

Each of the two strings may be up to 1500 characters long.

Is there any way I can make this fast enough to be usable?


Answer 1:


The performance problem is probably the combination of row-by-row processing (the WHILE loop behaves much like a cursor) and the user-defined function, which is called twice for every row.

If one of these strings is constant (such as the item keywords), you can search for each of its keywords independently:

    select *
    from users u
    where charindex(';' + <item1> + ';', ';' + u.keywords + ';') > 0
    union all
    select *
    from users u
    where charindex(';' + <item2> + ';', ';' + u.keywords + ';') > 0
    union all
    ...
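If the goal is the number of common keywords per user rather than the matching rows themselves, the same CHARINDEX test can drive a join against the constant keyword list followed by a GROUP BY. A minimal sketch, assuming a users table with userid and keywords columns, a hypothetical item keyword list of 'red', 'blue' and 'black' with no duplicates, and keywords that never contain ';' themselves:

```sql
-- Sample data (hypothetical) standing in for the real users table.
create table #users (userid int primary key, keywords varchar(1500));
insert into #users (userid, keywords) values
    (1, 'red;green;blue'),
    (2, 'yellow'),
    (3, 'blue');

-- Count how many of the constant item keywords appear in each user's list.
-- Wrapping both sides in ';' makes the match exact, so 'red' does not match 'reddish'.
-- The LEFT JOIN keeps users with no match; COUNT(k.keyword) ignores NULLs, giving 0.
select u.userid, count(k.keyword) as commonCount
from #users u
left join (values ('red'), ('blue'), ('black')) as k(keyword)
    on charindex(';' + k.keyword + ';', ';' + u.keywords + ';') > 0
group by u.userid;
```

With the sample data this returns a commonCount of 2, 0 and 1 for users 1, 2 and 3. The string is still scanned once per keyword, so this helps mainly by removing the function call and the loop.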

Alternatively, a set-based approach can work, but you have to normalize the data (a plug here for storing data in the right format to begin with). That is, you want one table with the columns:

userid
keyword

And another with:

itemid
keyword

(The itemid column is only needed if there are different items, each with its own keywords. Otherwise the second table is just a list of keywords.)

Then your query would look like:

    select *
    from userkeyword uk join
         itemkeyword ik
         on uk.keyword = ik.keyword

The SQL engine can then choose an efficient join strategy instead of splitting strings row by row.
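The join above returns one row per shared keyword; to get the count the question asks for, group by user and item. A minimal sketch, assuming hypothetical temp tables named after the layout above, with made-up sample rows. The primary keys are also an assumption: they index the keyword column for the join and stop a user or item from listing the same keyword twice.

```sql
-- Normalized keyword tables (hypothetical names and sizes).
create table #userkeyword (userid int not null, keyword varchar(100) not null,
                           primary key (keyword, userid));
create table #itemkeyword (itemid int not null, keyword varchar(100) not null,
                           primary key (keyword, itemid));

insert into #userkeyword (userid, keyword) values
    (1, 'red'), (1, 'green'), (1, 'blue'), (2, 'yellow');
insert into #itemkeyword (itemid, keyword) values
    (10, 'red'), (10, 'blue'), (20, 'green'), (20, 'yellow');

-- One row per (user, item) pair sharing at least one keyword,
-- with the number of shared keywords.
select uk.userid, ik.itemid, count(*) as commonCount
from #userkeyword uk
join #itemkeyword ik
    on uk.keyword = ik.keyword
group by uk.userid, ik.itemid;
```

With the sample rows, user 1 shares two keywords with item 10 and one with item 20, and user 2 shares one with item 20. Pairs with no shared keyword do not appear at all. Because keywords are unique per user and per item, the count matches the distinct semantics of INTERSECT in the original function.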

Now, how can you create such a table? If each user has only a handful of keywords, you can do something like:

    -- Assumes every keyword, including the last, is followed by ';'
    with keyword1 as (select u.userid, u.keywords,
                             charindex(';', u.keywords) as pos1,
                             left(u.keywords, charindex(';', u.keywords) - 1) as keyword1
                      from users u
                      where charindex(';', u.keywords) > 0
                     ),
         keyword2 as (select k.userid, k.keywords, k.pos1,
                             charindex(';', k.keywords, k.pos1 + 1) as pos2,
                             substring(k.keywords, k.pos1 + 1,
                                       charindex(';', k.keywords, k.pos1 + 1) - k.pos1 - 1) as keyword2
                      from keyword1 k
                      where charindex(';', k.keywords, k.pos1 + 1) > 0
                     ),
            ...
    select userid, keyword1
    from keyword1
    union all
    select userid, keyword2
    from keyword2
    ...
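The chain of CTEs above needs one CTE per keyword position. A recursive CTE, a different technique from that fixed chain, can split lists of any length in one query. A minimal sketch, assuming a users table with userid and keywords columns (sample data is hypothetical) and keywords that never contain ';':

```sql
-- Sample data (hypothetical); lists may or may not end with ';'.
create table #users (userid int primary key, keywords varchar(1500));
insert into #users (userid, keywords) values
    (1, 'red;green;blue'),
    (2, 'yellow;');

-- Each recursion step peels the first keyword off the remaining list.
-- Appending ';' guarantees every keyword, including the last, is terminated.
with split as (
    select u.userid,
           cast(left(u.keywords + ';', charindex(';', u.keywords + ';') - 1) as varchar(1500)) as keyword,
           cast(stuff(u.keywords + ';', 1, charindex(';', u.keywords + ';'), '') as varchar(1500)) as rest
    from #users u
    union all
    select s.userid,
           cast(left(s.rest, charindex(';', s.rest) - 1) as varchar(1500)),
           cast(stuff(s.rest, 1, charindex(';', s.rest), '') as varchar(1500))
    from split s
    where charindex(';', s.rest) > 0
)
select userid, keyword
into #userkeyword
from split
where keyword <> ''          -- drop empty tokens from ';;' or a trailing ';'
option (maxrecursion 0);     -- a 1500-character list can exceed the default limit of 100
```

Here #userkeyword ends up with red, green and blue for user 1 and yellow for user 2. On SQL Server 2016 and later, the built-in STRING_SPLIT function used with CROSS APPLY is a simpler way to build the same table.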

To find how many CTEs you need, count the delimiters in each keyword list and take the maximum. This equals the number of keywords when every keyword is terminated by ';'; otherwise add 1:

    select max(len(keywords) - len(replace(keywords, ';', '')))
    from users
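A worked example of the delimiter count on a single hypothetical value: 'red;green;blue' has 14 characters, and removing the two ';' leaves 12, so the difference is 2 delimiters and, since the list is not terminated by ';', 3 keywords.

```sql
-- Hypothetical sample value; the same arithmetic applies per row in the query above.
declare @keywords varchar(1500);
set @keywords = 'red;green;blue';

select len(@keywords) - len(replace(@keywords, ';', ''))     as delimiters,             -- 14 - 12 = 2
       len(@keywords) - len(replace(@keywords, ';', '')) + 1 as keywordsIfUnterminated; -- 3
```

LEN ignores trailing spaces, so the count can be off if removing the delimiters leaves trailing spaces; for varchar columns, DATALENGTH avoids this.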


Source: https://stackoverflow.com/questions/10572858/optimise-sql-function-to-get-common-elements
