PostgreSQL - GROUP BY clause

Question

I want to search by tags, and then list all articles with that tag, and also how many of given tags they match. So for example I might have:

 Page1 - 2 (has css and php tag)
 Page2 - 1 (has only css tag)

Query:

SELECT COUNT(t.tag)
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id 
JOIN w_article a ON a.id = a2t.article 
WHERE t.tag = 'css' OR t.tag = 'php'
GROUP BY t.tag
LIMIT 9

When I select only COUNT(t.tag), the query works and I get reasonable results. But if I add another column of my article (e.g. its ID or title), I get the following error:

ERROR: column "a.title" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT COUNT(t.tag), a.title FROM a_tags t

How to add said columns to this query?

Answer 1:

First, to clarify: Postgres 9.1 or later supports the following (quoting the release notes of 9.1):

Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause (Peter Eisentraut)

More in this related answer:
Return a grouped list with occurrences using Rails and PostgreSQL

Next, the queries in the question and in @Michael's answer have got the logic backwards. We want to count how many tags match per article, not how many articles have a certain tag. So we need to GROUP BY w_article.id, not by a_tags.id.

The question asks to "list all articles with that tag, and also how many of given tags they match".

To fix this:

SELECT COUNT(t.tag) AS ct, a.* -- any column from a allowed ...
FROM   a_tags         t
JOIN   w_articles2tag a2t ON a2t.tag = t.id 
JOIN   w_article      a   ON a.id = a2t.article 
WHERE  t.tag IN ('css', 'php')
GROUP  BY a.id           -- ... since grouped by pk column of a
LIMIT  9

Assuming id is the primary key of w_article.
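To try the per-article grouping without a Postgres server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The table and column names follow the question; the sample rows, the third 'sql' tag, and the ORDER BY (added only to make the output deterministic) are assumptions for illustration. Note that SQLite accepts non-grouped columns for its own reasons, so this only demonstrates the counting logic, not the Postgres 9.1 primary-key rule.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a_tags (id INTEGER PRIMARY KEY, tag TEXT NOT NULL);
    CREATE TABLE w_article (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE w_articles2tag (
        article INTEGER NOT NULL REFERENCES w_article(id),
        tag     INTEGER NOT NULL REFERENCES a_tags(id),
        PRIMARY KEY (article, tag)
    );
    INSERT INTO a_tags VALUES (1, 'css'), (2, 'php'), (3, 'sql');
    INSERT INTO w_article VALUES (1, 'Page1'), (2, 'Page2'), (3, 'Page3');
    -- Page1: css + php, Page2: css + sql, Page3: sql only
    INSERT INTO w_articles2tag VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 3);
""")

# Group by the article's primary key so each row is one article
# together with the number of requested tags it carries.
rows = conn.execute("""
    SELECT a.title, COUNT(t.tag) AS ct
    FROM   a_tags         t
    JOIN   w_articles2tag a2t ON a2t.tag = t.id
    JOIN   w_article      a   ON a.id = a2t.article
    WHERE  t.tag IN ('css', 'php')
    GROUP  BY a.id
    ORDER  BY ct DESC, a.id   -- assumption: added for stable output
    LIMIT  9
""").fetchall()

for title, ct in rows:
    print(f"{title} - {ct}")
```

With this sample data the loop prints "Page1 - 2" and "Page2 - 1", matching the shape of result described in the question; Page3 is absent because it carries none of the requested tags.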
The following form should be faster while returning the same result:

SELECT a.*, ct
FROM  (
   SELECT a2t.article AS id, COUNT(*) AS ct
   FROM   a_tags         t
   JOIN   w_articles2tag a2t ON a2t.tag = t.id 
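   WHERE  t.tag IN ('css', 'php')  -- same filter as the query above
   GROUP  BY a2t.article           -- w_article (alias a) is not in scope inside the subquery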
   LIMIT  9      -- LIMIT early - cheaper
   ) sub
JOIN   w_article a USING (id);  -- attached alias to article in the sub
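As a rough check that aggregating in a derived table and joining to w_article afterwards yields the same rows as the single-level GROUP BY, here is a small sketch, again using Python's sqlite3 module as a stand-in for Postgres. The sample data is hypothetical, the tag filter is applied inside the subquery, and the subquery groups by a2t.article. It says nothing about speed, which would have to be measured on the real tables with EXPLAIN ANALYZE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a_tags (id INTEGER PRIMARY KEY, tag TEXT NOT NULL);
    CREATE TABLE w_article (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE w_articles2tag (article INTEGER NOT NULL, tag INTEGER NOT NULL);
    INSERT INTO a_tags VALUES (1, 'css'), (2, 'php'), (3, 'sql');
    INSERT INTO w_article VALUES (1, 'Page1'), (2, 'Page2'), (3, 'Page3');
    INSERT INTO w_articles2tag VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 3);
""")

TAGS = ("css", "php")  # hypothetical search terms

# Single-level form: join everything, then group by the article key.
flat = conn.execute("""
    SELECT a.title, COUNT(*) AS ct
    FROM   a_tags         t
    JOIN   w_articles2tag a2t ON a2t.tag = t.id
    JOIN   w_article      a   ON a.id = a2t.article
    WHERE  t.tag IN (?, ?)
    GROUP  BY a.id
    ORDER  BY a.title
""", TAGS).fetchall()

# Derived-table form: count and LIMIT first, join article rows afterwards.
nested = conn.execute("""
    SELECT a.title, sub.ct
    FROM  (
       SELECT a2t.article AS id, COUNT(*) AS ct
       FROM   a_tags         t
       JOIN   w_articles2tag a2t ON a2t.tag = t.id
       WHERE  t.tag IN (?, ?)
       GROUP  BY a2t.article
       LIMIT  9
       ) sub
    JOIN   w_article a USING (id)
    ORDER  BY a.title
""", TAGS).fetchall()

print(flat == nested, nested)
```

On this data both queries return the same two rows. With more than nine matching articles and no ORDER BY before the LIMIT, which nine rows survive is arbitrary in either form.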

More in this closely related answer from just yesterday:
Why does the following join increase the query time significantly?

As an aside: It is an anti-pattern to use the generic, non-descriptive id as column name. Call it article_id etc. in both tables. Easier to join and you don't have to use aliases in queries all the time.

Answer 2:

When you use a "GROUP BY" clause, you need to enclose all columns that are not grouped in an aggregate function. Try adding title to the GROUP BY list, or selecting "min(a.title)" instead.

SELECT COUNT(t.tag), a.title
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
JOIN w_article a ON a.id = a2t.article
WHERE t.tag = 'css' OR t.tag = 'php'
GROUP BY t.tag, a.title
LIMIT 9

Source: https://stackoverflow.com/questions/18991625/postgresql-group-by-clause

Tags

sql

postgresql

group-by

aggregate-functions