Super Slow Query… What have I done wrong?

老子叫甜甜 提交于 2019-12-10 21:33:12

问题


You guys are amazing. I've posted here twice in the past couple of days - a new user - and I've been blown away by the help. So, I figured I'd take the slowest query I've got in my software and see if anyone can help me speed it up. I use this query as a view, so it's important that it be fast (and it isn't!).

First, I have a Contacts Table that store my company's customers. In the table is a JobTitle column which contains an ID which is defined in the Contacts_Def_JobFunctions table. There is also a table called contacts_link_job_functions which holds the contactID number and additional job functions the customer has - also defined in the Contacts_Def_JobFunctions table.

Secondly, the Contacts_Def_JobFunctions table records have a parent/child relationship with themselves. In this manner, we cluster similar job functions (for example: maid, laundry service, housekeeping, cleaning, etc. are all the same basic job - while the job title may vary). Job functions which we don't currently work with are maintained as children of ParentJobID 1841.

Third, the institutionswithzipcodesadditional simply provides geographical data to the final result.

Lastly, like all responsible companies, we maintain a remove list for any of our customers that wish to opt-out of our newsletter (after opting in).

I use the following query to build a table of those people who have opted-in to receive our newsletter and who have a job function or job title relevant to the services/products we offer.

Here's my UGLY query:

SELECT DISTINCT 
    dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID, 
    dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM         
    dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3 
INNER JOIN
    dbo.contacts 
INNER JOIN
    dbo.contacts_link_emails 
        ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID 
        ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle 
INNER JOIN
    dbo.institutionswithzipcodesadditional 
        ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID 
LEFT OUTER JOIN
    dbo.contacts_def_jobfunctions 
INNER JOIN
    dbo.contacts_link_jobfunctions 
        ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID 
        ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE     
        (dbo.contacts.JobTitle IN
        (SELECT     JobID
        FROM          dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
        WHERE      (ParentJobID <> '1841'))) 
    AND
        (dbo.contacts_link_emails.Email NOT IN
        (SELECT     EmailAddress
        FROM          dbo.newsletterremovelist)) 
OR
        (dbo.contacts_link_jobfunctions.JobID IN
        (SELECT     JobID
        FROM          dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
        WHERE      (ParentJobID <> '1841')))
    AND 
        (dbo.contacts_link_emails.Email NOT IN
        (SELECT     EmailAddress
        FROM          dbo.newsletterremovelist AS newsletterremovelist)) 

I'm hoping some of you superstars can help me tune this up.

Thanks so much,

Russell Schutte

UPDATE - UPDATE - UPDATE - UPDATE - UPDATE

After getting several feedback messages, most notably from Khanzor, I've worked hard on tuning this query and have come up with the following:

SELECT  DISTINCT
                  contacts_link_emails.Email, contacts.ContactID, contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID, 
                  institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM contacts 
INNER JOIN
    contacts_def_jobfunctions ON contacts.jobtitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
    contacts_link_jobfunctions ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
    contacts_link_emails ON contacts.ContactID = contacts_link_emails.ContactID 
INNER JOIN
    institutionswithzipcodesadditional ON contacts.InstitutionID =  institutionswithzipcodesadditional.InstitutionID
LEFT JOIN
    newsletterremovelist ON newsletterremovelist.emailaddress = contacts_link_emails.email
WHERE    
    newsletterremovelist.emailaddress IS NULL

This isn't quite perfect (I suspect I should have made something an outer join or a right join or something, and I'm not really sure). My result set is about 40% of the records my original query provided (which I'm no longer 100% positive was a perfect query).

To clean things up, I took out all the "dbo." prefixes that SQL Studio adds. Do they do anything?

What am I doing wrong now?

Thanks,

Russell Schutte

== == == == == == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == == == == ==

I've been working on this one query for several hours now. I've got it down to this:

SELECT DISTINCT 
                      contacts_link_emails.Email, contacts.contactID,  contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID, 
                      institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM         
    contacts INNER JOIN institutionswithzipcodesadditional
        ON contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
    INNER JOIN contacts_link_emails 
        ON contacts.ContactID = contacts_link_emails.ContactID
    LEFT OUTER JOIN contacts_def_jobfunctions 
        ON contacts.JobTitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
    LEFT OUTER JOIN contacts_link_jobfunctions
        ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841' 
    LEFT OUTER JOIN
        newsletterremovelist ON newsletterremovelist.EmailAddress = contacts_link_emails.Email
WHERE     (newsletterremovelist.EmailAddress IS NULL)

Disappointingly, I'm just not able to fill in the gaps in my knowledge. I'm new to joins, except when I have the visual tool build them for me, so I'm thinking I want everything from contacts, institutionswithzipcodesadditional, and contacts_link_emails, so I've INNER JOINed them (above).

I am stumped on the next bit. If I INNER JOIN them, then I get people who have the proper jobs (<> 1841) - but I'm thinking I LOSE out on people who don't have an entry for both JobTitle AND JobFunctions. In many cases, this isn't right. I could have a JobTitle "Custodian" which I'd want to keep on our newsletter list, but if he doesn't also have a JobFunction entry, I think he'll fall off the list if I use INNER JOIN.

BUT, if I do the query with LEFT OUTER JOINs, as above, I think I get lots of people with the wrong JobTitles, simply because anyone who is lacking EITHER a JobTitle OR a JobFunction would be ON my list - they could be a "High Level Executive" with no JobFunction, and they'd be on the list - which isn't right. We no longer have services appropriate to "High Level Executives".

Then I see how the LEFT OUTER JOIN works for the newsletterremovelist. It's pretty slick and I think I've done it right...

But I'm still stuck. Hopefully someone can see what I'm trying to do here and steer me in the right direction.

Thanks,

Russell Schutte

UPDATE AGAIN

Sadly, this thread seems to have died, without a perfect solution - but I'm getting close. Please see a new thread started which restarts the discussion: click here

(awarded a correct answer for the massive amount of work provided - even while a correct answer hasn't quite been reached).

Thanks!

Russell Schutte


回答1:


Move the queries in your WHERE out to actual joins. These are called correlated subqueries, and are the work of the Voldemort. If they are joins, they are only executed once, and will speed up your query.

For the NOT IN sections, use a left outer join, and check that the column you joined on is NULL.

Also, avoid using OR in WHERE queries where possible - remember that OR is not neccesarily a short circuit operation.

An example is as follows:

SELECT 
    *
FROM
    dbo.contacts AS c
INNER JOIN
    dbo.contacts_def_jobfunctions AS jf
    ON c.JobTitle = jf.JobId AND jf.ParentJobID <> '1841'
INNER JOIN
    dbo.contacts_link_emails AS e
    ON c.ContactID = e.ContactID AND jf.JobID = c.JobTitle 
LEFT JOIN
    dbo.newsletterremovelist AS rl
    ON e.Email = rl.EmailAddress
WHERE    
    rl.EmailAddress IS NULL

Please don't use this, as it's almost certainly incorrect (not to mention SELECT *), I've ignored the logic for contacts_ref_jobfunctions_3 to provide a simple example.

For a (really) nice explanation of joins, try this visual explanation of joins




回答2:


Create some views representing some common associations that you make so that your sub-query is simpler. Also views execute a bit quicker as they do not need to be interpreted each time they are run.




回答3:


It could be any number of things. My first question is are the columns you're joining on indexed?

Better yet, do a SHOWPLAN and paste it into your question.



来源:https://stackoverflow.com/questions/4466054/super-slow-query-what-have-i-done-wrong

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!