Question
I have tables with the following schema:
Device
- DeviceId
- Name
Service
- ServiceId
- Name
Software
- SoftwareId
- Name
Device_Software
- DeviceId
- SoftwareId
- DiscoveryDate
Device_Service
- DeviceId
- ServiceId
- DiscoveryDate
Now I'm trying to write a query that returns each device along with the number of distinct software items and services that device has.
If I run the following query, I get a result back within 5 seconds. (For testing purposes, Device has 50,000 rows, Software and Service have 200 rows each, and the link tables link every device to every software item and every service, so each link table holds 10,000,000 rows.)
SELECT
device.name
,COUNT(DISTINCT(device_software.softwareId))
FROM
device
LEFT OUTER JOIN
device_software ON device.deviceId = device_software.deviceId
GROUP BY device.name
But if I expand the query to include both counts, it takes far longer (it has been running for about 30 minutes and is still going):
SELECT
device.name
,COUNT(DISTINCT(device_software.softwareId))
,COUNT(DISTINCT(device_service.serviceId))
FROM
device
LEFT OUTER JOIN
device_service ON device.deviceId = device_service.deviceId
LEFT OUTER JOIN
device_software ON device.deviceId = device_software.deviceId
GROUP BY device.name
Since this is in a stored procedure, I could get the two counts separately and combine them, but that seems like a hack. Is there a better way to do this in a single query without such a massive performance hit?
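The slowdown is consistent with row multiplication: joining both link tables to device in one query pairs every software row of a device with every service row of the same device before GROUP BY runs. With the test data that is 200 × 200 = 40,000 joined rows per device, or 50,000 × 40,000 = 2,000,000,000 rows in total, versus 10,000,000 for the single-join query. The sketch below shows the effect at a tiny scale. It is an illustration only, assuming Python's built-in sqlite3 module instead of SQL Server, a simplified schema (no DiscoveryDate, no Software/Service tables), and hypothetical helper names build_test_db and joined_row_count.

```python
import sqlite3


def build_test_db(n_devices: int, n_software: int, n_services: int) -> sqlite3.Connection:
    """Hypothetical helper: an in-memory, simplified copy of the schema where
    every device is linked to every software item and every service."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE device (deviceId INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE device_software (deviceId INTEGER, softwareId INTEGER);
        CREATE TABLE device_service (deviceId INTEGER, serviceId INTEGER);
    """)
    conn.executemany("INSERT INTO device VALUES (?, ?)",
                     [(d, f"device{d}") for d in range(n_devices)])
    conn.executemany("INSERT INTO device_software VALUES (?, ?)",
                     [(d, s) for d in range(n_devices) for s in range(n_software)])
    conn.executemany("INSERT INTO device_service VALUES (?, ?)",
                     [(d, s) for d in range(n_devices) for s in range(n_services)])
    return conn


def joined_row_count(conn: sqlite3.Connection) -> int:
    """Hypothetical helper: rows produced by the two LEFT JOINs before
    GROUP BY collapses them."""
    return conn.execute("""
        SELECT COUNT(*)
        FROM device
        LEFT JOIN device_service ON device.deviceId = device_service.deviceId
        LEFT JOIN device_software ON device.deviceId = device_software.deviceId
    """).fetchone()[0]


conn = build_test_db(n_devices=10, n_software=20, n_services=20)
print(joined_row_count(conn))  # 4000, i.e. 10 devices * 20 software * 20 services
```

The joined row count grows with the product of the two link counts per device, which is why COUNT(DISTINCT ...) has to wade through so many more rows in the two-join version.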
Answer 1:
I'd try the following and see if it makes a difference:
SELECT
device.name
-- COALESCE turns NULL (device with no links) into 0, as in the original query
,COALESCE(a.cntSft, 0) AS cntSft
,COALESCE(b.cntSrv, 0) AS cntSrv
FROM device
LEFT JOIN
( SELECT deviceId, COUNT(DISTINCT softwareId) AS cntSft FROM device_software
GROUP BY deviceId) a ON a.deviceId = device.deviceId
LEFT JOIN
( SELECT deviceId, COUNT(DISTINCT serviceId) AS cntSrv FROM device_service
GROUP BY deviceId) b ON b.deviceId = device.deviceId;
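With this version of the query you may not even need COUNT(DISTINCT ...): a plain COUNT is enough as long as each (DeviceId, SoftwareId) and (DeviceId, ServiceId) pair appears only once in its link table.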
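The pre-aggregation idea can be checked on a tiny dataset. This is a minimal sketch, assuming Python's built-in sqlite3 module rather than SQL Server, made-up sample rows, and a hypothetical PRE_AGGREGATED constant. The composite primary keys enforce the "each pair appears once" assumption under which plain COUNT(*) is safe, and COALESCE maps devices with no links to 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device (deviceId INTEGER PRIMARY KEY, name TEXT);
    -- Composite primary keys make every link pair unique, so COUNT(*) = COUNT(DISTINCT ...)
    CREATE TABLE device_software (deviceId INTEGER, softwareId INTEGER,
                                  PRIMARY KEY (deviceId, softwareId));
    CREATE TABLE device_service (deviceId INTEGER, serviceId INTEGER,
                                 PRIMARY KEY (deviceId, serviceId));
    INSERT INTO device VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    INSERT INTO device_software VALUES (1, 10), (1, 11), (1, 12), (2, 10);
    INSERT INTO device_service VALUES (1, 20), (1, 21), (3, 20);
""")

# Each derived table aggregates one link table on its own, so the outer query
# joins at most one row per device from each side: no row multiplication.
PRE_AGGREGATED = """
    SELECT device.name,
           COALESCE(a.cntSft, 0) AS cntSft,
           COALESCE(b.cntSrv, 0) AS cntSrv
    FROM device
    LEFT JOIN (SELECT deviceId, COUNT(*) AS cntSft
               FROM device_software GROUP BY deviceId) AS a
           ON a.deviceId = device.deviceId
    LEFT JOIN (SELECT deviceId, COUNT(*) AS cntSrv
               FROM device_service GROUP BY deviceId) AS b
           ON b.deviceId = device.deviceId
    ORDER BY device.deviceId
"""

for row in conn.execute(PRE_AGGREGATED):
    print(row)
# ('alpha', 3, 2)
# ('beta', 1, 0)
# ('gamma', 0, 1)
```

If the link tables can hold the same pair more than once (for example, one row per DiscoveryDate), keep COUNT(DISTINCT ...) inside the derived tables instead.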
Answer 2:
You could consider indexed views on Device_Software and Device_Service:
CREATE VIEW dbo.v_Device_Software
WITH SCHEMABINDING
AS
SELECT DeviceId, SoftwareId, DeviceCount = COUNT_BIG(*)
FROM dbo.Device_Software
GROUP BY DeviceId, SoftwareId;
GO
CREATE UNIQUE CLUSTERED INDEX x ON dbo.v_Device_Software(DeviceId, SoftwareId);
GO
CREATE VIEW dbo.v_Device_Service
WITH SCHEMABINDING
AS
SELECT DeviceId, ServiceId, DeviceCount = COUNT_BIG(*)
FROM dbo.Device_Service
GROUP BY DeviceId, ServiceId;
GO
CREATE UNIQUE CLUSTERED INDEX x ON dbo.v_Device_Service(DeviceId, ServiceId);
GO
Now your query becomes:
SELECT
device.name
-- DISTINCT is still required: the two joins multiply rows per device
,COUNT(DISTINCT vsoft.SoftwareId)
,COUNT(DISTINCT vserv.ServiceId)
FROM
dbo.device
LEFT OUTER JOIN dbo.v_Device_Service AS vserv
ON device.deviceId = vserv.DeviceId
LEFT OUTER JOIN dbo.v_Device_Software AS vsoft
ON device.deviceId = vsoft.DeviceId
GROUP BY device.name;
Indexed views come with many restrictions, though, and they are maintained on every write to the underlying tables, so be sure to test the impact on your entire workload, not just this one query.
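Because both views are joined to device in the same query, the join still yields (software count × service count) rows per device, so a plain COUNT over a view column returns that product rather than the per-device count, while COUNT(DISTINCT ...) on SoftwareId/ServiceId returns the real counts. The sketch below demonstrates this under stated assumptions: Python's built-in sqlite3 module, which has no indexed views, so two plain tables aliased vserv and vsoft stand in for the deduplicated views, with made-up sample rows and a hypothetical counts helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device (deviceId INTEGER PRIMARY KEY, name TEXT);
    -- Plain tables standing in for the indexed views (one row per pair)
    CREATE TABLE device_software (deviceId INTEGER, softwareId INTEGER);
    CREATE TABLE device_service (deviceId INTEGER, serviceId INTEGER);
    INSERT INTO device VALUES (1, 'alpha');
    INSERT INTO device_software VALUES (1, 10), (1, 11), (1, 12);
    INSERT INTO device_service VALUES (1, 20), (1, 21);
""")


def counts(sql: str) -> tuple:
    """Hypothetical helper: first result row of a query."""
    return conn.execute(sql).fetchone()


JOINS = """
    FROM device
    LEFT JOIN device_service AS vserv ON device.deviceId = vserv.deviceId
    LEFT JOIN device_software AS vsoft ON device.deviceId = vsoft.deviceId
    GROUP BY device.name
"""

plain = counts("SELECT COUNT(vsoft.deviceId), COUNT(vserv.deviceId) " + JOINS)
distinct = counts("SELECT COUNT(DISTINCT vsoft.softwareId), "
                  "COUNT(DISTINCT vserv.serviceId) " + JOINS)
print(plain)     # (6, 6): 3 software x 2 services = 6 joined rows, counted for both
print(distinct)  # (3, 2): the actual per-device counts
```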
Source: https://stackoverflow.com/questions/12182704/mssql-making-multiple-count-distinct-calls-in-a-query-runs-slowly