Postgres COUNT number of column values with INNER JOIN

问题

I am creating a report in Postgres 9.3. This is my SQL Fiddle.
Basically I have two tables, responses and questions, the structure is:

responses
->id
->question_id
->response

questions
->id
->question
->costperlead

for the column response there can only be 3 values, Yes/No/Possbily, and my report should have the columns:

  question_id
, # of Yes Responses
, # of No Responses
, # of Possbily Responses
, Revenue

Then:

# of Yes Responses - count of all Yes values in the response column
# of No Responses - count of all No values in the response column
# of Possbily Responses - count of all 'Possbily' values in the response column

Revenue is the costperlead * (Number of Yes Responses + Number of Possibly Responses).

I don't know how to construct the query, I'm new plus I came from MySQL so some things are different for postgres. In my SQL Fiddle sample most responses are Yes and Null, it's ok eventually, there will be Possibly and No.

So far I have only:

SELECT a.question_id
FROM responses a
INNER JOIN questions b ON a.question_id = b.id
WHERE a.created_at = '2015-07-17'
GROUP BY a.question_id;

回答1:

Since the only predicate filters responses, it would be most efficient to aggregate responses first, then join to questions:

SELECT *, q.costperlead * (r.ct_yes + r.ct_maybe) AS revenue
FROM  (
   SELECT question_id
        , count(*) FILTER (WHERE response = 'Yes')      AS ct_yes
        , count(*) FILTER (WHERE response = 'No')       AS ct_no
        , count(*) FILTER (WHERE response = 'Possibly') AS ct_maybe
   FROM   responses
   WHERE  created_at = '2015-07-17'
   GROUP  BY 1
   ) r
JOIN   questions q ON q.id = r.question_id;

This uses the new aggregate Filter clause of Postgres 9.4:

How can I simplify this game statistics query?

BTW, I would consider implementing response as boolean type with true/false/null.

For Postgres 9.3:

SELECT *, q.costperlead * (r.ct_yes + r.ct_maybe) AS revenue
FROM  (
   SELECT question_id
        , count(response = 'Yes' OR NULL)      AS ct_yes
        , count(response = 'No' OR NULL)       AS ct_no
        , count(response = 'Possibly' OR NULL) AS ct_maybe
   FROM   responses
   WHERE  created_at = '2015-07-17'
   GROUP  BY 1
   ) r
JOIN   questions q ON q.id = r.question_id;

SQL Fiddle (building on yours).

Here is a comparison of techniques, before the aggregate FILTER clause existed:

For absolute performance, is SUM faster or COUNT?

回答2:

You should try:

SELECT a.question_id, 
       SUM(CASE WHEN a.response = 'Yes' THEN 1 ELSE 0 END) AS NumsOfYes, 
       SUM(CASE WHEN a.response = 'No' THEN 1 ELSE 0 END) AS NumsOfNo,
       SUM(CASE WHEN a.response = 'Possibly' THEN 1 ELSE 0 END) AS NumOfPossibly,
       costperlead  * SUM(CASE WHEN a.response = 'Yes' THEN 1 ELSE 0 END) + SUM(CASE WHEN a.response = 'Possibly' THEN 1 ELSE 0 END) AS revenue
FROM responses a 
     INNER JOIN questions b ON a.question_id = b.id 
GROUP BY a.question_id, b.costperlead

来源：https://stackoverflow.com/questions/31514856/postgres-count-number-of-column-values-with-inner-join

标签

database

postgresql

count

inner-join

aggregate-filter