Query and order by number of matches in JSON array

♀尐吖头ヾ 提交于 2019-12-05 00:34:39

Your arrays contain only primitive values, nested documents would be more complicated.

Query

Unnest the JSON arrays of found rows with jsonb_array_elements_text() in a LATERAL join and count matches:

SELECT *
FROM  (
   SELECT *
   FROM   tbl
   WHERE  data->'tags' ?| ARRAY['foo', 'bar']
   ) t
, LATERAL (
   SELECT count(*) AS ct
   FROM   jsonb_array_elements_text(t.data->'tags') a(elem)
   WHERE  elem = ANY (ARRAY['foo', 'bar'])  -- same array parameter
   ) ct
ORDER  BY ct.ct DESC;  -- more expressions to break ties?

Alternative with INSTERSECT. It's one of the rare occasions that we can make use of this basic SQL feature:

SELECT *
FROM  (
   SELECT *
   FROM   tbl
   WHERE  data->'tags' ?| '{foo, bar}'::text[]  -- alt. syntax w. array
   ) t
, LATERAL (
   SELECT count(*) AS ct
   FROM  (
      SELECT * FROM jsonb_array_elements_text(t.data->'tags')
      INTERSECT ALL
      SELECT * FROM unnest('{foo, bar}'::text[])  -- same array literal
      ) i
   ) ct
ORDER  BY ct.ct DESC;

Note a subtle difference: This consumes each element when matched, so it does not count unmatched duplicates in data->'tags' like the first variant does. For details see the demo below.

Also demonstrating an alternative way to pass the array parameter: as array literal: '{foo, bar}'. This may be simpler to handle for some clients:

Or you could create a server side search function taking a VARIADIC parameter and pass a variable number of plain text values:

Related:

Index

Be sure to have a functional GIN index to support the jsonb existence operator ?|:

CREATE INDEX tbl_dat_gin ON tbl USING gin (data->'tags');

Nuances with duplicates

Clarification as per request in the comment. Say, we have a JSON array with two duplicated tags (4 total):

jsonb '{"tags": ["foo", "bar", "foo", "bar"]}'

And search with an SQL array parameter including both tags, one of them duplicated (3 total):

'{foo, bar, foo}'::text[]

Consider the results of this demo:

SELECT *
FROM  (SELECT jsonb '{"tags":["foo", "bar", "foo", "bar"]}') t(data)

, LATERAL (
   SELECT count(*) AS ct
   FROM   jsonb_array_elements_text(t.data->'tags') e
   WHERE  e = ANY ('{foo, bar, foo}'::text[])
   ) ct

, LATERAL (
   SELECT count(*) AS ct_intsct_all
   FROM  (
      SELECT * FROM jsonb_array_elements_text(t.data->'tags')
      INTERSECT ALL
      SELECT * FROM unnest('{foo, bar, foo}'::text[])
      ) i
   ) ct_intsct_all

, LATERAL (
   SELECT count(DISTINCT e) AS ct_dist
   FROM   jsonb_array_elements_text(t.data->'tags') e
   WHERE  e = ANY ('{foo, bar, foo}'::text[])
   ) ct_dist

, LATERAL (
   SELECT count(*) AS ct_intsct
   FROM  (
      SELECT * FROM jsonb_array_elements_text(t.data->'tags')
      INTERSECT
      SELECT * FROM unnest('{foo, bar, foo}'::text[])
      ) i
   ) ct_intsct;

Result:

data                                     | ct | ct_intsct_all | ct_dist | ct_intsct
-----------------------------------------+----+---------------+---------+----------
'{"tags": ["foo", "bar", "foo", "bar"]}' | 4  | 3             | 2       | 2

Comparing elements in the JSON array to elements in the array parameter:

  • 4 tags match any of the search elements: ct.
  • 3 tags in the set intersect (can be matched element-to-element): ct_intsct_all.
  • 2 distinct matching tags can be identified: ct_dist or ct_intsct.

If you don't have dupes or if you don't care to exclude them, use one of the first two techniques. The other two are a bit slower (besides the different result), because they have to check for dupes.

I'm posting details of my solution in Ruby, in case it's useful to anyone tackling the same issue.

In the end I decided a scope isn't appropriate as the method will return the an array of objects (not a chainable ActiveRecord::Relation), so I've written a class method and have provided a way to pass a chained scope to it through a block:

def self.with_any_tags(tags, &block)
  composed_scope = (
    block_given? ? yield : all
  ).where(["data->'tags' ?| ARRAY[:tags]", { tags: tags }])

  t   = Arel::Table.new('t',  ActiveRecord::Base)
  ct  = Arel::Table.new('ct', ActiveRecord::Base)

  arr_sql = Arel.sql "ARRAY[#{ tags.map { |t| Arel::Nodes::Quoted.new(t).to_sql }.join(', ') }]"
  any_tags_func = Arel::Nodes::NamedFunction.new('ANY', [arr_sql])

  lateral = ct
    .project(Arel.sql('e').count(true).as('ct'))
    .from(Arel.sql "jsonb_array_elements_text(t.data->'tags') e")
    .where(Arel::Nodes::Equality.new Arel.sql('e'), any_tags_func)

  query = t
    .project(t[Arel.star])
    .from(composed_scope.as('t'))
    .join(Arel.sql ", LATERAL (#{ lateral.to_sql }) ct")
    .order(ct[:ct].desc)

  find_by_sql query.to_sql
end

This can be used like so:

Page.with_any_tags(['foo', 'bar'])

# SELECT "t".*
# FROM (
#   SELECT "pages".* FROM "pages"
#   WHERE data->'tags' ?| ARRAY['foo','bar']
#   ) t,
# LATERAL (
#   SELECT COUNT(DISTINCT e) AS ct
#   FROM jsonb_array_elements_text(t.data->'tags') e
#   WHERE e = ANY(ARRAY['foo', 'bar'])
#   ) ct
# ORDER BY "ct"."ct" DESC

Page.with_any_tags(['foo', 'bar']) do
  Page.published
end

# SELECT "t".*
# FROM (
#   SELECT "pages".* FROM "pages"
#   WHERE pages.published_at <= '2015-07-19 15:11:59.997134'
#   AND pages.deleted_at IS NULL
#   AND data->'tags' ?| ARRAY['foo','bar']
#   ) t,
# LATERAL (
#   SELECT COUNT(DISTINCT e) AS ct
#   FROM jsonb_array_elements_text(t.data->'tags') e
#   WHERE e = ANY(ARRAY['foo', 'bar'])
#   ) ct
# ORDER BY "ct"."ct" DESC
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!