How to perform the same aggregation on every column, without listing the columns?

Submitted by 天大地大妈咪最大 on 2019-12-29 00:25:07

Question


I have a table with N columns. Let's call them c1, c2, c3, c4, ... cN. Across multiple rows, I want to get a single row with COUNT(DISTINCT cX) for each X in [1, N].

c1 | c2 | ... | cn
0  | 4  | ... | 1

Is there a way I can do this (in a stored procedure) without writing every column name into the query manually?

Why?

We've had a problem where bugs in application servers cause good column values to be overwritten with garbage inserted later. To solve this, I'm storing the information in a log-structured form, where each row represents a logical UPDATE query. Then, when given a signal that the record is complete, I can determine if any values were (erroneously) overwritten.

An example of a single correct record in multiple rows: there is at most one value for each column.

| id | initialize_time | start_time | end_time |
| 1  | 12:00am         | NULL       | NULL     |
| 1  | 12:00am         | 1:00pm     | NULL     |
| 1  | 12:00am         | NULL       | 2:00pm   |

Reconciled row:
| 1  | 12:00am         | 1:00pm     | 2:00pm   |

An example of an irreconcilable record that I want to detect:

| id | initialize_time | start_time | end_time |
| 1  | 12:00am         | NULL       | NULL     |
| 1  | 12:00am         | 1:00pm     | NULL     |
| 1  | 9:00am          | 1:00pm     | 2:00pm   |   -- New initialize time => irreconcilable!
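
For a table whose columns are known up front, the check described above can be written by hand. The following is only a sketch: the table name record_log, the time column type, and the 24-hour literals are assumptions made for illustration, with id 1 holding the reconcilable record and id 2 the irreconcilable one.

```sql
-- Hypothetical table with one row per logical UPDATE (all names are assumptions).
CREATE TEMP TABLE record_log (
    id              int,
    initialize_time time,
    start_time      time,
    end_time        time
);

INSERT INTO record_log VALUES
    (1, '00:00', NULL,    NULL),      -- id 1: at most one value per column
    (1, '00:00', '13:00', NULL),
    (1, '00:00', NULL,    '14:00'),
    (2, '00:00', NULL,    NULL),      -- id 2: two different initialize times
    (2, '00:00', '13:00', NULL),
    (2, '09:00', '13:00', '14:00');

-- count(DISTINCT col) ignores NULLs, so any count above 1 means
-- a column received conflicting values.
SELECT id,
       count(DISTINCT initialize_time) AS initialize_time_values,
       count(DISTINCT start_time)      AS start_time_values,
       count(DISTINCT end_time)        AS end_time_values
FROM   record_log
GROUP  BY id
HAVING count(DISTINCT initialize_time) > 1
    OR count(DISTINCT start_time)      > 1
    OR count(DISTINCT end_time)        > 1;
```

With this sample data only id 2 comes back. Every column has to be spelled out three times, which is exactly the repetition the question wants to avoid.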

Answer 1:


You need dynamic SQL for that, which means you have to create a function or run a DO command. Since you cannot return values directly from the latter, a plpgsql function it is:

CREATE OR REPLACE FUNCTION f_count_all(_tbl text
                           , OUT columns text[], OUT counts bigint[])
  RETURNS record LANGUAGE plpgsql AS
$func$
BEGIN

-- Build and run one SELECT that returns two arrays: column names and counts.
-- count(col) counts non-null values; use count(DISTINCT col) for distinct values.
EXECUTE (
    SELECT 'SELECT
     ARRAY[' || string_agg(quote_literal(quote_ident(attname)), ', ' ORDER BY attnum) || '],
     ARRAY[' || string_agg('count(' || quote_ident(attname) || ')', ', ' ORDER BY attnum) || ']
    FROM ' || _tbl::regclass      -- regclass output is safely quoted
    FROM   pg_attribute
    WHERE  attrelid = _tbl::regclass
    AND    attnum  >= 1           -- exclude tableoid & friends (neg. attnum)
    AND    NOT attisdropped       -- exclude dropped columns
    GROUP  BY attrelid
    )
INTO columns, counts;

END
$func$;

Call:

SELECT * FROM f_count_all('myschema.mytable');

Returns:

columns    | counts
-----------+----------
{c1,c2,c3} | {17,1,0}
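
Note that f_count_all counts non-null values per column, while the question asks for distinct values. A possible variant is sketched below. The function name f_count_distinct_all is made up for this example, and taking the table as regclass and building the statement with format() are choices of this sketch rather than part of the function above.

```sql
-- Hypothetical variant: distinct (non-null) values per column.
CREATE OR REPLACE FUNCTION f_count_distinct_all(_tbl regclass
                           , OUT columns text[], OUT counts bigint[])
  RETURNS record LANGUAGE plpgsql AS
$func$
BEGIN
   EXECUTE (
      -- %I quotes each column name as an identifier; _tbl is a regclass,
      -- so its text form is already schema-qualified and quoted as needed.
      SELECT format('SELECT ARRAY[%s]::text[], ARRAY[%s]::bigint[] FROM %s'
                  , string_agg(quote_literal(attname::text), ', ' ORDER BY attnum)
                  , string_agg(format('count(DISTINCT %I)', attname), ', ' ORDER BY attnum)
                  , _tbl)
      FROM   pg_attribute
      WHERE  attrelid = _tbl
      AND    attnum >= 1          -- skip system columns
      AND    NOT attisdropped     -- skip dropped columns
      )
   INTO columns, counts;
END
$func$;

-- Usage, with the same table name as in the call above:
-- SELECT * FROM f_count_distinct_all('myschema.mytable');
```

For the use case in the question, any entry in counts greater than 1 marks a column with conflicting values. To check one record at a time, the generated statement would also need a filter or GROUP BY on id, which this sketch leaves out. count(DISTINCT ...) also fails for column types without an equality operator, such as json.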

More explanation and links about dynamic SQL and EXECUTE are in this related question, or in a couple more here on SO; try this search.

Very similar to this question:
postgresql - count (no null values) of each column in a table

You could even try and return a polymorphic record type to get single columns dynamically, but that's rather complex and advanced. Probably too much effort for your case. More in this related answer.



Source: https://stackoverflow.com/questions/13760230/how-to-perform-the-same-aggregation-on-every-column-without-listing-the-columns
