SQL Server: how to populate sparse data with the rest of zero values?

Question

I have sales data reported per month and per customer. When I count the values, the zero values are not reported because the data is stored in a sparse format.

Suppose there are customers 1-4, and only customers 1-2 have records. The straight table has one row per customer and month, with the value in its own column:

|CustomerID|MonthID|Value|
-------------------------|
|     1    |201101 |  10 |
|     2    |201101 | 100 |

and the data is then reported in crosstab format, with customers on the rows and months on the columns:

|CustomerID|201101|201102|201103 |...|201501|
---------------------------------------------
|    1     |  10  |   0  |   0   |...|  0   |  
|    2     |  100 |   0  |   0   |...|  0   |
|    3     |  0   |   0  |   0   |...|  0   |
|    4     |  0   |   0  |   0   |...|  0   |

When I count this, I get nothing for customers 3-4 because they have no records. I want to get the missing zero rows. How can I select the original data and fill in the non-existing zero values? Or, more briefly:

What is the most elegant way to deal with the sparse data format and still have the zero customers on the final report?

Answer 1:

Prior to pivoting to your crosstab format, you would cross join tables Customers and Months, and then left join table Sales to that.

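select 
    c.CustomerId
  , m.MonthId
  , Value = isnull(s.Value,0)  -- a missing sales row becomes 0
from customers c
  cross join months m          -- every customer/month combination
  left join sales s
    on s.CustomerId = c.CustomerId
   and s.MonthId = m.MonthId
order by m.MonthId, c.CustomerId  -- row order is not guaranteed without this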

rextester demo: http://rextester.com/XKU62242

returns:

+------------+---------+-------+
| CustomerId | MonthId | Value |
+------------+---------+-------+
|          1 |  201101 |    10 |
|          2 |  201101 |   100 |
|          3 |  201101 |     0 |
|          4 |  201101 |     0 |
|          1 |  201102 |     0 |
|          2 |  201102 |     0 |
|          3 |  201102 |     0 |
|          4 |  201102 |     0 |
|          1 |  201103 |     0 |
|          2 |  201103 |     0 |
|          3 |  201103 |     0 |
|          4 |  201103 |     0 |
+------------+---------+-------+
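
To reproduce the result above locally, the three tables need to exist. The following is only a minimal sketch of a setup: the column types and keys are assumptions (the question does not show the schema), and the three months are simply the ones that appear in the output above.

```sql
-- Hypothetical schema: types and keys are assumptions, not from the question.
create table customers (CustomerId int not null primary key);
create table months    (MonthId    int not null primary key);
create table sales (
    CustomerId int not null
  , MonthId    int not null
  , Value      int not null
);

-- Customers 1-4, of which only 1 and 2 have any sales rows.
insert into customers (CustomerId) values (1),(2),(3),(4);
insert into months    (MonthId)    values (201101),(201102),(201103);
insert into sales (CustomerId, MonthId, Value)
values (1, 201101, 10)
     , (2, 201101, 100);
```

With these rows, the cross join produces 4 x 3 = 12 customer/month combinations, and only the two with a matching sales row carry a non-zero Value.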

Adding a dynamic pivot() to the above could be done like so:

declare @cols nvarchar(max);
declare @sql  nvarchar(max);

select @cols = stuff((
    select ',' + quotename(MonthId)
    from months 
    order by MonthId
    for xml path (''), type).value('.','nvarchar(max)')
  ,1,1,'');

select @sql = '
select CustomerId, ' + @cols + '
from (
    select 
        c.CustomerId
      , m.MonthId
      , Value = isnull(s.Value,0)
    from customers c
      cross join months m
      left join sales s
        on s.CustomerId = c.CustomerId
      and s.MonthId = m.MonthId
    ) as t
pivot (sum([Value]) for [MonthId] in (' + @cols + ') ) p';

select @sql as CodeGenerated;
exec sp_executesql @sql;

returns:

+-----------------------------------------------------------------------+
|                             CodeGenerated                             |
+-----------------------------------------------------------------------+
| select CustomerId, [201101],[201102],[201103]                         |
| from (                                                                |
|     select                                                            |
|         c.CustomerId                                                  |
|       , m.MonthId                                                     |
|       , Value = isnull(s.Value,0)                                     |
|     from customers c                                                  |
|       cross join months m                                             |
|       left join sales s                                               |
|         on s.CustomerId = c.CustomerId                                |
|       and s.MonthId = m.MonthId                                       |
|     ) as t                                                            |
| pivot (sum([Value]) for [MonthId] in ([201101],[201102],[201103]) ) p |
+-----------------------------------------------------------------------+

and the exec returns:

+------------+--------+--------+--------+
| CustomerId | 201101 | 201102 | 201103 |
+------------+--------+--------+--------+
|          1 |     10 |      0 |      0 |
|          2 |    100 |      0 |      0 |
|          3 |      0 |      0 |      0 |
|          4 |      0 |      0 |      0 |
+------------+--------+--------+--------+
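
The approach above relies on a months table. If no such table exists, one way to get the list of months might be a recursive CTE. This is a sketch under assumptions: the range 201101 through 201501 is taken from the crosstab header in the question, and the names m and MonthStart are made up for illustration.

```sql
-- Assumed range: January 2011 through January 2015 (49 months).
with m as (
    -- anchor: first day of the first month
    select cast('20110101' as date) as MonthStart
    union all
    -- step forward one month until the last month is reached
    select dateadd(month, 1, MonthStart)
    from m
    where MonthStart < '20150101'
)
select MonthId = year(MonthStart) * 100 + month(MonthStart)
from m
order by MonthId;
```

This needs 48 recursion steps, which stays under SQL Server's default recursion limit of 100. A longer range would need an option (maxrecursion ...) hint. The resulting rows could be inserted into a months table or used directly in place of it in the cross join.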

Source: https://stackoverflow.com/questions/43694372/sql-server-how-to-populate-sparse-data-with-the-rest-of-zero-values

Tags

sql-server

sql-server-2014

sparse-matrix