问题
My query is quite simple:
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a, COMPANIES b
where a.Postcode = b.Postcode
and a.Adres = b.Adres
and (
select COUNT(COMPANYID)
from USERS
where COMPANYID=a.ID
)>(
select COUNT(COMPANYID)
from USERS
where COMPANYID=b.ID
)
Database: sql server 2008 r2
What I'm trying to do: The table of COMPANIES contains double entries. I want to know the ones that are connected to the most amount of users. So I only have to change the foreign keys of those with the least. ( I already know the id's of the doubles)
Right now it's taking a lot of time to complete. I was wondering if if could be done faster
回答1:
Try this version. It should be only a little faster. The COUNT
is quite slow. I've added a.ID <> b.ID
to avoid few cases earlier.
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a INNER JOIN COMPANIES b
ON
a.ID <> b.ID
and a.Postcode = b.Postcode
and a.Adres = b.Adres
and (
select COUNT(COMPANYID)
from USERS
where COMPANYID=a.ID
)>(
select COUNT(COMPANYID)
from USERS
where COMPANYID=b.ID
)
The FROM ... INNER JOIN ... ON ...
is a preferred SQL construct to join tables. It may be faster too.
回答2:
One approach would be to pre-calculate the COMPANYID
count before doing the join since you'll be repeatedly calculating it in the main query. i.e. something like:
insert into @CompanyCount (ID, IDCount)
select COMPANYID, COUNT(COMPANYID)
from USERS
group by COMPANYID
Then your main query:
select a.ID, a.adres, a.place, a.postalcode
from COMPANIES a
inner join @CompanyCount aCount on aCount.ID = a.ID
inner join COMPANIES b on b.Postcode = a.Postcode and b.Adres = a.Adres
inner join @CompanyCount bCount on bCount.ID = b.ID and aCount.IDCount > bCount.IDCount
If you want all instances of a
even though there is no corresponding b
then you'd need to have left outer join
s to b
and bCount
.
However you need to look at the query plan - which indexes are you using - you probably want to have them on the ID
s and the Postcode
and Adres
fields as a minimum since you're joining on them.
回答3:
Build an index on postcode and adres
The database probably executes the subselects for every row. (Just guessing here, veryfy it in the explain plan. If this is the case you can rewrite the query to join with the inline views (note this is how it would look in oracle hop it works in sql server as well):
select distinct a.ID, a.adres, a.place, a.postalcode from COMPANIES a, COMPANIES b, ( select COUNT(COMPANYID) cnt, companyid from USERS group by companyid) cntA, ( select COUNT(COMPANYID) cnt, companyid from USERS group by companyid) cntb where a.Postcode = b.Postcode and a.Adres = b.Adres and a.ID<>b.ID and cnta.cnt>cntb.cnt
来源:https://stackoverflow.com/questions/10040908/comparison-query-taking-ages