Find sentences with two words adjacent to each other in Pg

Question

I need help crafting an advanced Postgres query. I am trying to find sentences with two words adjacent to each other, using Postgres directly, not some command language extension. My tables are:

CREATE TABLE word (spelling text, wordid serial);
CREATE TABLE sentence (sentenceid serial);
CREATE TABLE item (sentenceid integer, position smallint, wordid integer);

I have a simple query to find sentences with a single word:

SELECT DISTINCT sentence.sentenceid 
FROM item,word,sentence 
WHERE word.spelling = 'word1' 
  AND item.wordid = word.wordid 
  AND sentence.sentenceid = item.sentenceid
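As a sanity check, here is a minimal sketch that runs the single-word query above against a small, made-up dataset. It assumes Python's built-in sqlite3 module with an in-memory database is an acceptable stand-in for Postgres (the query uses only plain joins). The helper make_db, the SENTENCES data, and the SINGLE_WORD name are hypothetical, and the 'word1' literal is replaced by a ? parameter.

```python
import sqlite3

# Hypothetical sample data: sentenceid -> list of (position, spelling).
SENTENCES = {
    1: [(1, "word1"), (2, "word2")],
    2: [(1, "word2")],
    3: [(1, "other"), (2, "word1")],
}

def make_db(sentences):
    """Create the question's three tables in an in-memory SQLite database.

    INTEGER PRIMARY KEY stands in for the Postgres serial columns.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE word (spelling TEXT, wordid INTEGER PRIMARY KEY);
        CREATE TABLE sentence (sentenceid INTEGER PRIMARY KEY);
        CREATE TABLE item (sentenceid INTEGER, position INTEGER, wordid INTEGER);
    """)
    wordids = {}  # spelling -> wordid, so each spelling is stored once
    for sid, items in sentences.items():
        con.execute("INSERT INTO sentence (sentenceid) VALUES (?)", (sid,))
        for pos, spelling in items:
            if spelling not in wordids:
                cur = con.execute("INSERT INTO word (spelling) VALUES (?)", (spelling,))
                wordids[spelling] = cur.lastrowid
            con.execute("INSERT INTO item VALUES (?, ?, ?)", (sid, pos, wordids[spelling]))
    return con

# The question's single-word query, with the literal replaced by a parameter.
SINGLE_WORD = """
SELECT DISTINCT sentence.sentenceid
FROM item, word, sentence
WHERE word.spelling = ?
  AND item.wordid = word.wordid
  AND sentence.sentenceid = item.sentenceid
"""

con = make_db(SENTENCES)
print(sorted(row[0] for row in con.execute(SINGLE_WORD, ("word1",))))  # [1, 3]
```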

I want to narrow those results further by a second word (word2): keep a sentence only if it has an item for word2 with the same item.sentenceid as the matching word1 item and an item.position equal to that item's position + 1. How can I refine my query to do this, and do it efficiently?
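The condition being asked for can be pinned down with a small in-memory sketch before writing any SQL. The words and items values below are hypothetical stand-ins for the word and item tables, adjacent_sentences is an illustrative helper (not part of any library), and the sketch assumes each (sentenceid, position) pair holds exactly one word.

```python
# Hypothetical stand-ins for the word and item tables.
words = {1: "word1", 2: "word2", 3: "other"}  # wordid -> spelling
items = [  # (sentenceid, position, wordid)
    (10, 1, 1), (10, 2, 2),               # "word1 word2"       -> match
    (11, 1, 2), (11, 2, 1),               # "word2 word1"       -> wrong order
    (12, 1, 1), (12, 2, 3), (12, 3, 2),   # "word1 other word2" -> not adjacent
]

def adjacent_sentences(items, words, first, second):
    """Return sorted sentenceids with `first` at position p and `second` at p + 1."""
    # Index every item by (sentenceid, position) so the "next" slot is a lookup.
    by_slot = {(sid, pos): words[wid] for sid, pos, wid in items}
    return sorted({
        sid
        for (sid, pos), spelling in by_slot.items()
        if spelling == first and by_slot.get((sid, pos + 1)) == second
    })

print(adjacent_sentences(items, words, "word1", "word2"))  # [10]
```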

Answer 1:

A simpler solution, but it only works when there are no gaps in item.position:

SELECT DISTINCT sentence.sentenceid 
  FROM sentence 
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position = item.position + 1
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
 WHERE word.spelling = 'word1'
   AND next_word.spelling = 'word2'
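To see the no-gaps limitation concretely, here is a minimal sketch that runs this join against a hypothetical dataset in which sentence 4 has no item at position 2. It assumes Python's sqlite3 module with an in-memory database is a fair stand-in for Postgres for a join-only query; make_db, SENTENCES and find_adjacent are made-up names, and the literals are replaced by ? parameters.

```python
import sqlite3

# Hypothetical data: sentenceid -> list of (position, spelling).
SENTENCES = {
    1: [(1, "word1"), (2, "word2"), (3, "end")],
    2: [(1, "word2"), (2, "word1")],
    3: [(1, "word1"), (2, "other"), (3, "word2")],
    4: [(1, "word1"), (3, "word2")],  # gap: no item at position 2
}

def make_db(sentences):
    """Build the question's tables in memory (INTEGER PRIMARY KEY ~ serial)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE word (spelling TEXT, wordid INTEGER PRIMARY KEY);
        CREATE TABLE sentence (sentenceid INTEGER PRIMARY KEY);
        CREATE TABLE item (sentenceid INTEGER, position INTEGER, wordid INTEGER);
    """)
    wordids = {}
    for sid, items in sentences.items():
        con.execute("INSERT INTO sentence (sentenceid) VALUES (?)", (sid,))
        for pos, spelling in items:
            if spelling not in wordids:
                cur = con.execute("INSERT INTO word (spelling) VALUES (?)", (spelling,))
                wordids[spelling] = cur.lastrowid
            con.execute("INSERT INTO item VALUES (?, ?, ?)", (sid, pos, wordids[spelling]))
    return con

# Answer 1's first query: the next word must sit at exactly position + 1.
QUERY = """
SELECT DISTINCT sentence.sentenceid
  FROM sentence
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position = item.position + 1
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
 WHERE word.spelling = ? AND next_word.spelling = ?
"""

def find_adjacent(con, first, second):
    return sorted(row[0] for row in con.execute(QUERY, (first, second)))

print(find_adjacent(make_db(SENTENCES), "word1", "word2"))  # [1]
```

Sentence 4 is not returned: position + 1 points at the missing position 2, even though no other word separates word1 and word2.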

A more general solution, using window functions:

SELECT DISTINCT sentenceid
FROM (SELECT sentence.sentenceid,
             word.spelling,
             lead(word.spelling) OVER (PARTITION BY sentence.sentenceid
                                           ORDER BY item.position) AS next_spelling
        FROM sentence 
        JOIN item ON sentence.sentenceid = item.sentenceid
        JOIN word ON item.wordid = word.wordid) AS pairs
 WHERE spelling = 'word1'
   AND next_spelling = 'word2'
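Here is a minimal sketch of the window-function query on the same kind of hypothetical data, showing that the gap in sentence 4 no longer matters because lead() simply takes the next row by position. It assumes Python's sqlite3 module as a stand-in for Postgres (window functions need SQLite 3.25 or newer), and it gives the lead() column an explicit alias, next_spelling, because SQLite does not reliably name an unaliased column lead the way Postgres does. make_db, SENTENCES and find_adjacent are made-up names.

```python
import sqlite3

# Hypothetical data: sentenceid -> list of (position, spelling).
SENTENCES = {
    1: [(1, "word1"), (2, "word2"), (3, "end")],
    2: [(1, "word2"), (2, "word1")],
    3: [(1, "word1"), (2, "other"), (3, "word2")],
    4: [(1, "word1"), (3, "word2")],  # gap: no item at position 2
}

def make_db(sentences):
    """Build the question's tables in memory (INTEGER PRIMARY KEY ~ serial)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE word (spelling TEXT, wordid INTEGER PRIMARY KEY);
        CREATE TABLE sentence (sentenceid INTEGER PRIMARY KEY);
        CREATE TABLE item (sentenceid INTEGER, position INTEGER, wordid INTEGER);
    """)
    wordids = {}
    for sid, items in sentences.items():
        con.execute("INSERT INTO sentence (sentenceid) VALUES (?)", (sid,))
        for pos, spelling in items:
            if spelling not in wordids:
                cur = con.execute("INSERT INTO word (spelling) VALUES (?)", (spelling,))
                wordids[spelling] = cur.lastrowid
            con.execute("INSERT INTO item VALUES (?, ?, ?)", (sid, pos, wordids[spelling]))
    return con

# lead() pairs each word with the next word of the same sentence, by position.
QUERY = """
SELECT DISTINCT sentenceid
  FROM (SELECT sentence.sentenceid,
               word.spelling,
               lead(word.spelling) OVER (PARTITION BY sentence.sentenceid
                                         ORDER BY item.position) AS next_spelling
          FROM sentence
          JOIN item ON sentence.sentenceid = item.sentenceid
          JOIN word ON item.wordid = word.wordid) AS pairs
 WHERE spelling = ? AND next_spelling = ?
"""

def find_adjacent(con, first, second):
    return sorted(row[0] for row in con.execute(QUERY, (first, second)))

print(find_adjacent(make_db(SENTENCES), "word1", "word2"))  # [1, 4]
```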

Edit: Another general solution (gaps allowed), using joins only:

SELECT DISTINCT sentence.sentenceid
  FROM sentence 
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position > item.position
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
  LEFT JOIN item AS mediate_word ON sentence.sentenceid = mediate_word.sentenceid
                                AND mediate_word.position > item.position
                                AND mediate_word.position < next_item.position
 WHERE mediate_word.wordid IS NULL
   AND word.spelling = 'word1'
   AND next_word.spelling = 'word2'
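The LEFT JOIN ... IS NULL pattern here is an anti-join: a pair qualifies only if no item lies strictly between the two positions. Here is a minimal sketch that checks it against hypothetical data containing both a gap (sentence 4) and an intervening word (sentence 3), assuming Python's sqlite3 module with an in-memory database is an adequate stand-in for Postgres; make_db, SENTENCES and find_adjacent are made-up names.

```python
import sqlite3

# Hypothetical data: sentenceid -> list of (position, spelling).
SENTENCES = {
    1: [(1, "word1"), (2, "word2"), (3, "end")],
    2: [(1, "word2"), (2, "word1")],
    3: [(1, "word1"), (2, "other"), (3, "word2")],  # word in between
    4: [(1, "word1"), (3, "word2")],                # gap at position 2
}

def make_db(sentences):
    """Build the question's tables in memory (INTEGER PRIMARY KEY ~ serial)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE word (spelling TEXT, wordid INTEGER PRIMARY KEY);
        CREATE TABLE sentence (sentenceid INTEGER PRIMARY KEY);
        CREATE TABLE item (sentenceid INTEGER, position INTEGER, wordid INTEGER);
    """)
    wordids = {}
    for sid, items in sentences.items():
        con.execute("INSERT INTO sentence (sentenceid) VALUES (?)", (sid,))
        for pos, spelling in items:
            if spelling not in wordids:
                cur = con.execute("INSERT INTO word (spelling) VALUES (?)", (spelling,))
                wordids[spelling] = cur.lastrowid
            con.execute("INSERT INTO item VALUES (?, ?, ?)", (sid, pos, wordids[spelling]))
    return con

# Any later word2 qualifies unless some item sits strictly between the two.
QUERY = """
SELECT DISTINCT sentence.sentenceid
  FROM sentence
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position > item.position
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
  LEFT JOIN item AS mediate_word ON sentence.sentenceid = mediate_word.sentenceid
                                AND mediate_word.position > item.position
                                AND mediate_word.position < next_item.position
 WHERE mediate_word.wordid IS NULL
   AND word.spelling = ? AND next_word.spelling = ?
"""

def find_adjacent(con, first, second):
    return sorted(row[0] for row in con.execute(QUERY, (first, second)))

print(find_adjacent(make_db(SENTENCES), "word1", "word2"))  # [1, 4]
```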

Answer 2:

I think this will match your requirements. Sorry, but I can't remember right now how to write it without JOIN clauses. Basically, I included a self-join on the item and word tables to get the next item in the sentence for each item. If the query planner doesn't like my nested select, you can try left joining the word table directly as well.

SELECT distinct sentence.sentenceid 
FROM item inner join word 
        on item.wordid = word.wordid
    inner join sentence
        on sentence.sentenceid = item.sentenceid 
    left join (select subsequent_item.sentenceid,
                      subsequent_item.position,
                      subsequent_word.spelling
                 from item as subsequent_item
                inner join word as subsequent_word
                   on subsequent_item.wordid = subsequent_word.wordid) as subsequent
        on subsequent.sentenceid = item.sentenceid
            and subsequent.position = item.position + 1
where   word.spelling = 'word1' and subsequent.spelling = 'word2';
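For reference, here is a minimal sketch that runs a repaired version of this nested-select query, with the subquery's table aliases declared (item AS subsequent_item, word AS subsequent_word), on hypothetical data. It assumes Python's sqlite3 module as a stand-in for Postgres; make_db, SENTENCES and find_adjacent are made-up names. Because it matches on position + 1, it behaves like the first query in Answer 1 and misses the gapped sentence 4.

```python
import sqlite3

# Hypothetical data: sentenceid -> list of (position, spelling).
SENTENCES = {
    1: [(1, "word1"), (2, "word2"), (3, "end")],
    2: [(1, "word2"), (2, "word1")],
    3: [(1, "word1"), (2, "other"), (3, "word2")],
    4: [(1, "word1"), (3, "word2")],  # gap: no item at position 2
}

def make_db(sentences):
    """Build the question's tables in memory (INTEGER PRIMARY KEY ~ serial)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE word (spelling TEXT, wordid INTEGER PRIMARY KEY);
        CREATE TABLE sentence (sentenceid INTEGER PRIMARY KEY);
        CREATE TABLE item (sentenceid INTEGER, position INTEGER, wordid INTEGER);
    """)
    wordids = {}
    for sid, items in sentences.items():
        con.execute("INSERT INTO sentence (sentenceid) VALUES (?)", (sid,))
        for pos, spelling in items:
            if spelling not in wordids:
                cur = con.execute("INSERT INTO word (spelling) VALUES (?)", (spelling,))
                wordids[spelling] = cur.lastrowid
            con.execute("INSERT INTO item VALUES (?, ?, ?)", (sid, pos, wordids[spelling]))
    return con

# The subquery lists every (sentenceid, position, spelling); it is joined on
# the slot right after the current item.
QUERY = """
SELECT DISTINCT sentence.sentenceid
FROM item
    INNER JOIN word ON item.wordid = word.wordid
    INNER JOIN sentence ON sentence.sentenceid = item.sentenceid
    LEFT JOIN (SELECT subsequent_item.sentenceid,
                      subsequent_item.position,
                      subsequent_word.spelling
                 FROM item AS subsequent_item
                INNER JOIN word AS subsequent_word
                   ON subsequent_item.wordid = subsequent_word.wordid) AS subsequent
        ON subsequent.sentenceid = item.sentenceid
       AND subsequent.position = item.position + 1
WHERE word.spelling = ? AND subsequent.spelling = ?
"""

def find_adjacent(con, first, second):
    return sorted(row[0] for row in con.execute(QUERY, (first, second)))

print(find_adjacent(make_db(SENTENCES), "word1", "word2"))  # [1]
```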

Answer 3:

select
  *
from mytable
where
  round(0.1 / ts_rank_cd(to_tsvector(mycolumn), to_tsquery('word1 & word2'))) <= 1

This works as long as you're not using A-D weight labels (0.1 is the default weight ts_rank_cd gives to unlabeled lexemes); otherwise, you'll need to change the 0.1 accordingly.

You'll also want to add a tsvector @@ tsquery condition, for example to_tsvector(mycolumn) @@ to_tsquery('word1 & word2'), to the WHERE clause.

Source: https://stackoverflow.com/questions/23644944/find-sentences-with-two-words-adjacent-to-each-other-in-pg

Tags

sql

postgresql

full-text-search