问题
You guys are amazing. I've posted here twice in the past couple of days - a new user - and I've been blown away by the help. So, I figured I'd take the slowest query I've got in my software and see if anyone can help me speed it up. I use this query as a view, so it's important that it be fast (and it isn't!).
First, I have a Contacts Table that store my company's customers. In the table is a JobTitle column which contains an ID which is defined in the Contacts_Def_JobFunctions table. There is also a table called contacts_link_job_functions which holds the contactID number and additional job functions the customer has - also defined in the Contacts_Def_JobFunctions table.
Secondly, the Contacts_Def_JobFunctions table records have a parent/child relationship with themselves. In this manner, we cluster similar job functions (for example: maid, laundry service, housekeeping, cleaning, etc. are all the same basic job - while the job title may vary). Job functions which we don't currently work with are maintained as children of ParentJobID 1841.
Third, the institutionswithzipcodesadditional simply provides geographical data to the final result.
Lastly, like all responsible companies, we maintain a remove list for any of our customers that wish to opt-out of our newsletter (after opting in).
I use the following query to build a table of those people who have opted-in to receive our newsletter and who have a job function or job title relevant to the services/products we offer.
Here's my UGLY query:
SELECT DISTINCT
dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID,
dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
INNER JOIN
dbo.institutionswithzipcodesadditional
ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID
LEFT OUTER JOIN
dbo.contacts_def_jobfunctions
INNER JOIN
dbo.contacts_link_jobfunctions
ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID
ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE
(dbo.contacts.JobTitle IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist))
OR
(dbo.contacts_link_jobfunctions.JobID IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist AS newsletterremovelist))
I'm hoping some of you superstars can help me tune this up.
Thanks so much,
Russell Schutte
UPDATE - UPDATE - UPDATE - UPDATE - UPDATE
After getting several feedback messages, most notably from Khanzor, I've worked hard on tuning this query and have come up with the following:
SELECT DISTINCT
contacts_link_emails.Email, contacts.ContactID, contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID,
institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM contacts
INNER JOIN
contacts_def_jobfunctions ON contacts.jobtitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
contacts_link_jobfunctions ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
contacts_link_emails ON contacts.ContactID = contacts_link_emails.ContactID
INNER JOIN
institutionswithzipcodesadditional ON contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
LEFT JOIN
newsletterremovelist ON newsletterremovelist.emailaddress = contacts_link_emails.email
WHERE
newsletterremovelist.emailaddress IS NULL
This isn't quite perfect (I suspect I should have made something an outer join or a right join or something, and I'm not really sure). My result set is about 40% of the records my original query provided (which I'm no longer 100% positive was a perfect query).
To clean things up, I took out all the "dbo." prefixes that SQL Studio adds. Do they do anything?
What am I doing wrong now?
Thanks,
Russell Schutte
== == == == == == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == == == == ==
I've been working on this one query for several hours now. I've got it down to this:
SELECT DISTINCT
contacts_link_emails.Email, contacts.contactID, contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID,
institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM
contacts INNER JOIN institutionswithzipcodesadditional
ON contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
INNER JOIN contacts_link_emails
ON contacts.ContactID = contacts_link_emails.ContactID
LEFT OUTER JOIN contacts_def_jobfunctions
ON contacts.JobTitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
LEFT OUTER JOIN contacts_link_jobfunctions
ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
LEFT OUTER JOIN
newsletterremovelist ON newsletterremovelist.EmailAddress = contacts_link_emails.Email
WHERE (newsletterremovelist.EmailAddress IS NULL)
Disappointingly, I'm just not able to fill in the gaps in my knowledge. I'm new to joins, except when I have the visual tool build them for me, so I'm thinking I want everything from contacts, institutionswithzipcodesadditional, and contacts_link_emails, so I've INNER JOINed them (above).
I am stumped on the next bit. If I INNER JOIN them, then I get people who have the proper jobs (<> 1841) - but I'm thinking I LOSE out on people who don't have an entry for both JobTitle AND JobFunctions. In many cases, this isn't right. I could have a JobTitle "Custodian" which I'd want to keep on our newsletter list, but if he doesn't also have a JobFunction entry, I think he'll fall off the list if I use INNER JOIN.
BUT, if I do the query with LEFT OUTER JOINs, as above, I think I get lots of people with the wrong JobTitles, simply because anyone who is lacking EITHER a JobTitle OR a JobFunction would be ON my list - they could be a "High Level Executive" with no JobFunction, and they'd be on the list - which isn't right. We no longer have services appropriate to "High Level Executives".
Then I see how the LEFT OUTER JOIN works for the newsletterremovelist. It's pretty slick and I think I've done it right...
But I'm still stuck. Hopefully someone can see what I'm trying to do here and steer me in the right direction.
Thanks,
Russell Schutte
UPDATE AGAIN
Sadly, this thread seems to have died, without a perfect solution - but I'm getting close. Please see a new thread started which restarts the discussion: click here
(awarded a correct answer for the massive amount of work provided - even while a correct answer hasn't quite been reached).
Thanks!
Russell Schutte
回答1:
Move the queries in your WHERE
out to actual joins. These are called correlated subqueries, and are the work of the Voldemort. If they are joins, they are only executed once, and will speed up your query.
For the NOT IN
sections, use a left outer join, and check that the column you joined on is NULL
.
Also, avoid using OR
in WHERE
queries where possible - remember that OR
is not neccesarily a short circuit operation.
An example is as follows:
SELECT
*
FROM
dbo.contacts AS c
INNER JOIN
dbo.contacts_def_jobfunctions AS jf
ON c.JobTitle = jf.JobId AND jf.ParentJobID <> '1841'
INNER JOIN
dbo.contacts_link_emails AS e
ON c.ContactID = e.ContactID AND jf.JobID = c.JobTitle
LEFT JOIN
dbo.newsletterremovelist AS rl
ON e.Email = rl.EmailAddress
WHERE
rl.EmailAddress IS NULL
Please don't use this, as it's almost certainly incorrect (not to mention SELECT *
), I've ignored the logic for contacts_ref_jobfunctions_3 to provide a simple example.
For a (really) nice explanation of joins, try this visual explanation of joins
回答2:
Create some views representing some common associations that you make so that your sub-query is simpler. Also views execute a bit quicker as they do not need to be interpreted each time they are run.
回答3:
It could be any number of things. My first question is are the columns you're joining on indexed?
Better yet, do a SHOWPLAN
and paste it into your question.
来源:https://stackoverflow.com/questions/4466054/super-slow-query-what-have-i-done-wrong