SQL: count number of distinct values in every column

问题

I need a query that will return a table where each column is the count of distinct values in the columns of another table.

I know how to count the distinct values in one column:

select count(distinct columnA) from table1;

I suppose that I could just make this a really long select clause:

select count(distinct columnA), count(distinct columnB), ... from table1;

but that isn't very elegant and it's hardcoded. I'd prefer something more flexible.

回答1:

try this (sql server 2005 syntax):

DECLARE @YourTable table (col1  varchar(5)
                         ,col2  int
                         ,col3  datetime
                         ,col4  char(3)
                         )

insert into @YourTable values ('abcdf',123,'1/1/2009','aaa')
insert into @YourTable values ('aaaaa',456,'1/2/2009','bbb')
insert into @YourTable values ('bbbbb',789,'1/3/2009','aaa')
insert into @YourTable values ('ccccc',789,'1/4/2009','bbb')
insert into @YourTable values ('aaaaa',789,'1/5/2009','aaa')
insert into @YourTable values ('abcdf',789,'1/6/2009','aaa')


;with RankedYourTable AS
(
SELECT
    ROW_NUMBER() OVER(PARTITION by col1 order by col1) AS col1Rank
        ,ROW_NUMBER() OVER(PARTITION by col2 order by col2) AS col2Rank
        ,ROW_NUMBER() OVER(PARTITION by col3 order by col3) AS col3Rank
        ,ROW_NUMBER() OVER(PARTITION by col4 order by col4) AS col4Rank
    FROM @YourTable
)
SELECT
    SUM(CASE WHEN      col1Rank=1 THEN 1 ELSE 0 END) AS col1DistinctCount
        ,SUM(CASE WHEN col2Rank=1 THEN 1 ELSE 0 END) AS col2DistinctCount
        ,SUM(CASE WHEN col3Rank=1 THEN 1 ELSE 0 END) AS col3DistinctCount
        ,SUM(CASE WHEN col4Rank=1 THEN 1 ELSE 0 END) AS col4DistinctCount
    FROM RankedYourTable

OUTPUT:

col1DistinctCount col2DistinctCount col3DistinctCount col4DistinctCount
----------------- ----------------- ----------------- -----------------
4                 3                 6                 2

(1 row(s) affected)

回答2:

This code should give you all the columns in 'table1' with the respective distinct value count for each one as data.

DECLARE @TableName VarChar (Max) = 'table1'
DECLARE @SqlString VarChar (Max)

set @SqlString = (
  SELECT DISTINCT
    'SELECT ' + 
        RIGHT (ColumnList, LEN (ColumnList)-1) + 
      ' FROM ' + Table_Name
    FROM INFORMATION_SCHEMA.COLUMNS COL1
      CROSS AppLy (
        SELECT ', COUNT (DISTINCT [' + COLUMN_NAME + ']) AS ' + '''' + COLUMN_NAME + ''''
          FROM INFORMATION_SCHEMA.COLUMNS COL2
          WHERE COL1.TABLE_NAME = COL2.TABLE_NAME
          FOR XML PATH ('')
      ) TableColumns (ColumnList)
    WHERE
      1=1 AND 
      COL1.TABLE_NAME = @TableName
)

EXECUTE (@SqlString)

回答3:

and it's hardcoded.

It is not hardcoding to provide a field list for a sql statement. It's common and acceptable practice.

回答4:

This won't necessarily be possible for every field in a table. For example, you can't do a DISTINCT against a SQL Server ntext or image field unless you cast them to other data types and lose some precision.

回答5:

I appreciate all of the responses. I think the solution that will work best for me in this situation (counting the number of distinct values in each column of a table from an external program that has no knowledge of the table except its name) is as follows:

Run "describe table1" and pull out the column names from the result.

Loop through the column names and create the query to count the distinct values in each column. The query will look something like "select count(distinct columnA), count(distinct columnB), ... from table1".

回答6:

DISTINCT is evil. Do COUNT/GROUP BY

来源：https://stackoverflow.com/questions/1363576/sql-count-number-of-distinct-values-in-every-column

标签

sql

mysql

count

distinct