Question
This is a follow-up to this question. I seem to have come across an edge case, and I don't understand why I'm getting the wrong results. Using the data from the linked question, I can group the reports into combinations that use the same album, src, and background.
For instance, using this data:
CREATE TABLE reports (rep_id int primary key, data json);
INSERT INTO reports (rep_id, data)
VALUES
(1, '{"objects":[{"album": 1, "src":"fooA.png", "pos": "top"}, {"album": 2, "src":"barB.png", "pos": "top"}], "background":"background.png"}'),
(2, '{"objects":[{"album": 1, "src":"fooA.png", "pos": "top"}, {"album": 2, "src":"barC.png", "pos": "top"}], "background":"background.png"}'),
(3, '{"objects":[{"album": 1, "src":"fooA.png", "pos": "middle"},{"album": 2, "src":"barB.png", "pos": "middle"}],"background":"background.png"}'),
(4, '{"objects":[{"album": 1, "src":"fooA.png", "pos": "top"}, {"album": 3, "src":"barB.png", "pos": "top"}], "background":"backgroundA.png"}')
;
and this is the query:
SELECT distinct array_agg(distinct r.rep_id) AS ids, count(*) AS ct
FROM reports r
, json_array_elements(r.data->'objects') o
GROUP BY r.data->>'background'
, o->>'album'
, o->>'src'
ORDER BY count(*) DESC
LIMIT 5;
I get these results, which are incorrect:
ids | ct
---------+----
{1,2,3} | 3
{1,3} | 2
{2} | 1
{4} | 1
What I want is this
ids | ct
---------+----
{1,3} | 2
{2} | 1
{4} | 1
If I change the background values so that they vary, it does work as expected, but the counts are still off. So what I gather is that grouping by background may be causing the issue, but I don't know why. I can do without the counts; I mainly need the ids grouped for matching combinations that use the same file, album, and background.
Edit: I had to edit my question. It turns out my sample data had an error and I was never getting the correct results, so I am looking for a query that works, if possible.
Answer 1:
A kind person from PostgreSQL's IRC channel helped find the answer and craft the correct query. The credit is actually his, not mine.
He helped me realize that the albums and srcs of each report should be aggregated into arrays, so that whole reports can be compared. For instance:
SELECT array_agg(rep_id) AS ids, count(*) AS ct
FROM  (
   -- one row per report: its background plus sorted arrays of albums and srcs
   SELECT rep_id,
          data->>'background' AS background,
          array_agg(o->>'album' ORDER BY o->>'album') AS albums,
          array_agg(o->>'src'   ORDER BY o->>'album') AS srcs
   FROM   reports r,
          json_array_elements(r.data->'objects') o
   GROUP  BY rep_id
   ) s
-- reports match only when the background and both arrays are identical
GROUP  BY background, albums, srcs
ORDER  BY count(*) DESC
LIMIT  5;
I don't know if this is the best way of doing it but it works. Suggestions are welcome.
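To see what the outer query is actually grouping on, it can help to run the inner step on its own. The following is only a sketch against the sample `reports` table from the question. The second sort key (`o->>'src'`) is my own addition, not part of the query above: it is an assumption that a stable tie-break is wanted when one report contains the same album more than once.

```sql
-- Sketch: one "signature" row per report (background + sorted arrays).
-- The extra sort key o->>'src' is an assumed tie-breaker, not in the
-- original answer; it keeps both arrays in the same, repeatable order.
SELECT r.rep_id,
       r.data->>'background' AS background,
       array_agg(o->>'album' ORDER BY o->>'album', o->>'src') AS albums,
       array_agg(o->>'src'   ORDER BY o->>'album', o->>'src') AS srcs
FROM   reports r,
       json_array_elements(r.data->'objects') o
GROUP  BY r.rep_id
ORDER  BY r.rep_id;
```

With the four sample rows, reports 1 and 3 should come out with identical background, albums, and srcs values, while report 2 differs in srcs and report 4 differs in background and albums. That is why the outer GROUP BY yields {1,3}, {2}, and {4}. Note that album values are compared as text here, so '10' sorts before '2'; that does not matter for matching as long as every report is sorted the same way.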
Answer 2:
First of all, you have a typo: change 'scr' to 'src'. But your query is correct. Just take a look at the result of your query without grouping:
select
r.rep_id, r.data->>'background' as background, o->>'album' as album, o->>'src' as src
from reports r, json_array_elements(r.data->'objects') o;
rep_id | background     | album | src
-------+----------------+-------+----------
1      | background.png | 1     | fooA.png
1      | background.png | 2     | barB.png
2      | background.png | 2     | barB.png
2      | background.png | 2     | barB.png
Answer 3:
If you count distinct on rep_id, you will get the number of reports in which each unique combination occurred.
SELECT distinct array_agg(distinct r.rep_id) AS ids, count(distinct r.rep_id) AS ct, array[r.data->>'background', o->>'album', o->>'src'] as combination
FROM reports r
, json_array_elements(r.data->'objects') o
GROUP BY r.data->>'background'
, o->>'album'
, o->>'src'
ORDER BY 2 DESC
Result on first dataset:
ids     | ct | combination
--------+----+------------------------------
{1,2,3} |  3 | {background.png,1,fooA.png}
{1,3}   |  2 | {background.png,2,barB.png}
{2}     |  1 | {background.png,2,barC.png}
{4}     |  1 | {backgroundA.png,1,fooA.png}
{4}     |  1 | {backgroundA.png,3,barB.png}
Result on second dataset:
ids   | ct | combination
------+----+-----------------------------
{1,2} |  2 | {background.png,2,barB.png}
{1}   |  1 | {background.png,1,fooA.png}
Source: https://stackoverflow.com/questions/27848171/querying-combinations-of-json-returns-odd-results