MSSQL - Making multiple count distinct calls in a query runs slowly

Question

I have tables with the following schema:

Device

DeviceId
Name

Service

ServiceId
Name

Software

SoftwareId
Name

Device_Software

DeviceId
SoftwareId
DiscoveryDate

Device_Service

DeviceId
ServiceId
DiscoveryDate

Now I'm trying to write a query that returns each device along with the number of distinct software and services that device has.

If I run the following query, I get a result back within 5 seconds. For testing purposes, Device has 50,000 rows, Software and Service each have 200 rows, and the link tables contain a link from every device to every software and every service.

SELECT
  device.name
  ,COUNT(DISTINCT(device_software.softwareId))
FROM
  device
LEFT OUTER JOIN
  device_software ON device.deviceId = device_software.deviceId
GROUP BY device.name

But if I expand the query to include both counts, it takes much longer (about 30 minutes and still going):

SELECT
  device.name
  ,COUNT(DISTINCT(device_software.softwareId))
  ,COUNT(DISTINCT(device_service.serviceId))
FROM
  device
LEFT OUTER JOIN
  device_service ON device.deviceId = device_service.deviceId
LEFT OUTER JOIN
  device_software ON device.deviceId = device_software.deviceId
GROUP BY device.name

Since this is in a stored procedure, I could just get the two counts individually and combine them, but that seems like a hack. Does anyone know of a better way to do this in a single query without a massive performance hit?

Answer 1:

I'd try the following and see if it makes a difference:

SELECT
  device.name
  ,a.cntSft
  ,b.cntSrv
FROM device
LEFT JOIN
  ( SELECT deviceId, COUNT(DISTINCT softwareId) AS cntSft
    FROM device_software
    GROUP BY deviceId ) a ON a.deviceId = device.deviceId
LEFT JOIN
  ( SELECT deviceId, COUNT(DISTINCT serviceId) AS cntSrv
    FROM device_service
    GROUP BY deviceId ) b ON b.deviceId = device.deviceId;

If each (deviceId, softwareId) and (deviceId, serviceId) pair appears only once in its link table, you may not need COUNT(DISTINCT ...) at all; a plain COUNT gives the same result with this version of the query.
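
To illustrate why aggregating each link table on its own should help, here is a minimal sketch on made-up miniature data. The table variables and sample values are assumptions for illustration only, not your real tables, and the multi-row VALUES syntax assumes SQL Server 2008 or later. Joining both link tables directly pairs every software row of a device with every service row of that device, so with your test data the join would produce 50,000 × 200 × 200 = 2,000,000,000 rows before GROUP BY can collapse them. Aggregating first only has to read the 10,000,000 rows in each link table.

```sql
-- Hypothetical miniature data: one device, 3 software links, 2 service links.
DECLARE @device TABLE (deviceId INT PRIMARY KEY, name VARCHAR(50));
DECLARE @device_software TABLE (deviceId INT, softwareId INT);
DECLARE @device_service TABLE (deviceId INT, serviceId INT);

INSERT INTO @device VALUES (1, 'dev1');
INSERT INTO @device_software VALUES (1, 10), (1, 11), (1, 12);
INSERT INTO @device_service VALUES (1, 20), (1, 21);

-- Joining both link tables directly: each software row is paired
-- with each service row of the same device, so 3 * 2 = 6 rows.
SELECT COUNT(*) AS joinedRows
FROM @device d
LEFT JOIN @device_software sw ON sw.deviceId = d.deviceId
LEFT JOIN @device_service sv ON sv.deviceId = d.deviceId;

-- Aggregating each link table first: one row per device on each side,
-- so the outer join never multiplies rows. Returns dev1, 3, 2.
SELECT
  d.name
  ,ISNULL(a.cntSft, 0) AS cntSft   -- 0 for devices with no software links
  ,ISNULL(b.cntSrv, 0) AS cntSrv   -- 0 for devices with no service links
FROM @device d
LEFT JOIN
  ( SELECT deviceId, COUNT(DISTINCT softwareId) AS cntSft
    FROM @device_software
    GROUP BY deviceId ) a ON a.deviceId = d.deviceId
LEFT JOIN
  ( SELECT deviceId, COUNT(DISTINCT serviceId) AS cntSrv
    FROM @device_service
    GROUP BY deviceId ) b ON b.deviceId = d.deviceId;
```

Note that this version returns one row per device rather than one row per distinct name, which only matters if two devices can share a name.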

Answer 2:

You could consider indexed views on Device_Software and Device_Service:

CREATE VIEW dbo.v_Device_Software
WITH SCHEMABINDING
AS
  SELECT DeviceId, SoftwareId, DeviceCount = COUNT_BIG(*)
    FROM dbo.Device_Software
    GROUP BY DeviceId, SoftwareId;
GO
CREATE UNIQUE CLUSTERED INDEX x ON dbo.v_Device_Software(DeviceId, SoftwareId);
GO

CREATE VIEW dbo.v_Device_Service
WITH SCHEMABINDING
AS
  SELECT DeviceId, ServiceId, DeviceCount = COUNT_BIG(*)
    FROM dbo.Device_Service
    GROUP BY DeviceId, ServiceId;
GO
CREATE UNIQUE CLUSTERED INDEX x ON dbo.v_Device_Service(DeviceId, ServiceId);
GO

Now your query becomes:

SELECT
  device.name
  ,COUNT(vsoft.DeviceId)
  ,COUNT(vserv.DeviceId)
FROM
  dbo.device
LEFT OUTER JOIN dbo.v_Device_Service AS vserv
  ON device.deviceId = vserv.DeviceId
LEFT OUTER JOIN dbo.v_Device_Software AS vsoft
  ON device.deviceId = vsoft.DeviceId
GROUP BY device.name;

There are many restrictions, though, and you should be sure to test the impact this has on your entire workload, not just this one query.
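
One thing worth checking when you test: because both views are joined to device in the same query, each software row for a device is still paired with each service row, so the plain COUNTs can be inflated by the other view's row count. A possible variant, offered as a sketch rather than a tested solution, aggregates each view separately and then joins the per-device totals. The aliases d, sw and sv and the column names cntSft and cntSrv are my own choices. The WITH (NOEXPAND) hint asks SQL Server to read the view's index directly, which is generally needed on editions where the optimizer does not match indexed views automatically.

```sql
-- Sketch only: each view holds one row per (DeviceId, SoftwareId) or
-- (DeviceId, ServiceId), so COUNT(*) per DeviceId is the distinct count.
SELECT
  d.name
  ,ISNULL(sw.cntSft, 0) AS cntSft
  ,ISNULL(sv.cntSrv, 0) AS cntSrv
FROM dbo.device AS d
LEFT OUTER JOIN
  ( SELECT DeviceId, COUNT(*) AS cntSft
    FROM dbo.v_Device_Software WITH (NOEXPAND)
    GROUP BY DeviceId ) AS sw ON sw.DeviceId = d.deviceId
LEFT OUTER JOIN
  ( SELECT DeviceId, COUNT(*) AS cntSrv
    FROM dbo.v_Device_Service WITH (NOEXPAND)
    GROUP BY DeviceId ) AS sv ON sv.DeviceId = d.deviceId;
```

Like any query against these views, this one returns one row per device and depends on the unique clustered indexes created above.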

Source: https://stackoverflow.com/questions/12182704/mssql-making-multiple-count-distinct-calls-in-a-query-runs-slowly

Tags

sql-server

performance

distinct