Count matches between multiple columns and words in a nested array

Posted by 假如想象 on 2019-12-20 06:37:03

Question


My earlier question was resolved. Now I need to develop a related, but more complex query.

I have a table like this:

id     description          additional_info
-------------------------------------------
123    games                XYD
124    Festivals sport      swim

And I need to count matches to arrays like this:

array_content varchar[] := '{"Festivals,games","sport,swim"}';

Each array element is a comma-separated list of tags. If either column, description or additional_info, contains any of the tags in an element, that element counts as 1. So each array element (consisting of multiple words) can contribute at most 1 to the total count.

The result for the above example should be:

id    RID    Matches
1     123    1
2     124    2

Answer 1:


The answer isn't simple, but figuring out what you are asking was harder:

SELECT row_number() OVER (ORDER BY t.id) AS id
     , t.id AS "RID"
     , count(DISTINCT a.ord) AS "Matches"
FROM   tbl t
LEFT   JOIN (
   unnest(array_content) WITH ORDINALITY x(elem, ord)
   CROSS JOIN LATERAL
   unnest(string_to_array(elem, ',')) txt
   ) a ON t.description ~ a.txt
       OR t.additional_info ~ a.txt
GROUP  BY t.id;

Produces your desired result exactly.
array_content is your array of search terms.
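
To try the query outside of a PL/pgSQL function, here is a minimal sketch that inlines the search array as a literal. The table definition is an assumption: the question does not show column types, so int and text are guesses, and the table name tbl is simply taken from the query above.

```sql
-- Sample data from the question; column types are assumed, not given
CREATE TEMP TABLE tbl (
   id              int PRIMARY KEY
 , description     text
 , additional_info text
);

INSERT INTO tbl VALUES
   (123, 'games'          , 'XYD')
 , (124, 'Festivals sport', 'swim');

-- Same query as above, with the variable array_content replaced by a literal
SELECT row_number() OVER (ORDER BY t.id) AS id
     , t.id AS "RID"
     , count(DISTINCT a.ord) AS "Matches"
FROM   tbl t
LEFT   JOIN (
   unnest('{"Festivals,games","sport,swim"}'::varchar[]) WITH ORDINALITY x(elem, ord)
   CROSS JOIN LATERAL
   unnest(string_to_array(elem, ',')) txt
   ) a ON t.description ~ a.txt
       OR t.additional_info ~ a.txt
GROUP  BY t.id;
```

With this sample data, row 123 should get 1 match and row 124 should get 2, as in the desired result.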

How does this work?

Each array element of the outer array in your search term is a comma-separated list. Decompose the odd construct by unnesting twice (after transforming each element of the outer array into another array). Example:

SELECT *
FROM   unnest('{"Festivals,games","sport,swim"}'::varchar[]) WITH ORDINALITY x(elem, ord)
CROSS  JOIN LATERAL
       unnest(string_to_array(elem, ',')) txt;

Result:

 elem            | ord |  txt
-----------------+-----+------------
 Festivals,games | 1   | Festivals
 Festivals,games | 1   | games
 sport,swim      | 2   | sport
 sport,swim      | 2   | swim

Since you want to count matches for each outer array element once, we generate a unique number on the fly with WITH ORDINALITY. Details:

  • PostgreSQL unnest() with element number

Now we can LEFT JOIN to this derived table on the condition of a desired match:

   ... ON t.description ~ a.txt
       OR t.additional_info ~ a.txt

... and get the count with count(DISTINCT a.ord), counting each outer array element only once even if several of its search terms match.
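
To see why DISTINCT is needed, it can help to look at the joined rows before aggregation. The following is a sketch only: it uses a VALUES list as a stand-in for the table, with the sample rows from the question.

```sql
-- Un-aggregated join result: one row per (table row, matching search term)
SELECT t.id, a.ord, a.txt
FROM  (
   VALUES (123, 'games'          , 'XYD')
        , (124, 'Festivals sport', 'swim')
   ) t(id, description, additional_info)   -- stand-in for the real table
LEFT   JOIN (
   unnest('{"Festivals,games","sport,swim"}'::varchar[]) WITH ORDINALITY x(elem, ord)
   CROSS JOIN LATERAL
   unnest(string_to_array(elem, ',')) txt
   ) a ON t.description ~ a.txt
       OR t.additional_info ~ a.txt
ORDER  BY t.id, a.ord, a.txt;
```

Row 124 appears three times here (Festivals with ord 1, sport and swim both with ord 2), but it has only two distinct values of ord, hence 2 matches. A table row without any match is kept by the LEFT JOIN with ord NULL, and count() ignores NULL values, so such a row gets a count of 0.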

Finally, I added the mysterious id in your result with row_number() OVER (ORDER BY t.id) AS id - assuming it's supposed to be a serial number. Voilà.

The same considerations for regular expression matches (~) as in your previous question apply:

  • Postgres query to calculate matching strings
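
The linked answer is not reproduced here, so the following is only a sketch of the general behavior of the operator: ~ treats the right operand as a regular expression, is case-sensitive, and matches anywhere in the string, including inside a longer word. The sample strings are made up for illustration.

```sql
SELECT 'Festivals sport' ~  'sport'     AS substring_match    -- true
     , 'Festivals sport' ~  'Sport'     AS case_sensitive     -- false
     , 'Festivals sport' ~* 'Sport'     AS case_insensitive   -- true
     , 'transport'       ~  'sport'     AS inside_word        -- true
     , 'transport'       ~  '\ysport\y' AS whole_word_only;   -- false
```

Here ~* is the case-insensitive variant and \y is the word-boundary constraint in Postgres regular expressions. Search terms containing characters that are special in regular expressions (such as . or +) would need escaping to be matched literally.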


Source: https://stackoverflow.com/questions/40383958/count-matches-between-multiple-columns-and-words-in-a-nested-array
