Locate popular strings with PostgreSQL

Submitted by 无人久伴 on 2020-05-11 02:57:19

Question


I have a bunch of text rows in a PostgreSQL table and I am trying to find common strings.

For example, let's say I have a basic table like:

CREATE TABLE a (id serial, value text);
INSERT INTO a (value) VALUES
    ('I go to the movie theater'), 
    ('New movie theater releases'), 
    ('Coming out this week at your local movie theater'),
    ('New exposition about learning disabilities at the children museum'),
    ('The genius found in learning disabilities')
;

I am trying to locate popular strings like movie theater and learning disabilities across all the rows (the goal is to show a list of "trending" strings, kind of like Twitter "Trends").

I use full text search and I have tried to use ts_stat combined with ts_headline but the results are quite disappointing.

Any thoughts? Thanks!


Answer 1:


There is no ready-to-use Postgres text search feature for finding the most popular phrases. For two-word phrases you can use ts_stat() to find the most popular words, eliminate particles, prepositions, etc., and cross join these words to find the most popular pairs.

For real data you will want to adjust the values marked with --> parameter. The query may be quite expensive on a larger dataset, because every pair of popular words is tested against every row.

with popular_words as (
    select word
    from ts_stat('select value::tsvector from a')
    where nentry > 1                                --> parameter
    and not word in ('to', 'the', 'at', 'in', 'a')  --> parameter
)
select concat_ws(' ', a1.word, a2.word) phrase, count(*) 
from popular_words as a1
cross join popular_words as a2
cross join a
where value ilike format('%%%s %s%%', a1.word, a2.word)
group by 1
having count(*) > 1                                 --> parameter
order by 2 desc;


        phrase         | count 
-----------------------+-------
 movie theater         |     3
 learning disabilities |     2
(2 rows)
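
The cross join above tests every pair of popular words against every row. A possible alternative, shown here only as a sketch, is to split each row into words and pair each word with its successor using a window function, so that only pairs that actually occur next to each other are counted. The CTE names (words, pairs), the column aliases, the lower() call, and the count(distinct id) choice are assumptions made for this illustration and are not part of the original answer; the stop-word list and the threshold are the same --> parameter values as above.

```sql
-- Sketch: count adjacent word pairs (bigrams) per row, then rank them.
with words as (
    -- one output row per word, with its position (ord) inside the source row
    select a.id, t.ord, lower(t.word) as word
    from a
    cross join lateral
        regexp_split_to_table(a.value, '\s+') with ordinality as t(word, ord)
),
pairs as (
    -- pair each word with the word that follows it in the same row
    select id,
           word as w1,
           lead(word) over (partition by id order by ord) as w2
    from words
)
select w1 || ' ' || w2 as phrase,
       count(distinct id) as row_count               -- rows containing the pair
from pairs
where w2 is not null                                 -- last word has no successor
  and w1 not in ('to', 'the', 'at', 'in', 'a')       --> parameter
  and w2 not in ('to', 'the', 'at', 'in', 'a')       --> parameter
group by 1
having count(distinct id) > 1                        --> parameter
order by 2 desc, 1;
```

Unlike the ilike match, this version compares lower-cased whole words, so it will not count a pair that only appears inside longer words. WITH ORDINALITY on a set-returning function requires PostgreSQL 9.4 or later.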



Answer 2:


How about something like:

SELECT * FROM a WHERE value LIKE '%movie theater%';

This finds rows whose value column contains 'movie theater' anywhere; the % wildcards match any number of characters before or after it. Note that LIKE is case-sensitive; use ILIKE for a case-insensitive match.



Source: https://stackoverflow.com/questions/42702888/locate-popular-strings-with-postgresql
