Question
I need help crafting an advanced Postgres query. I am trying to find sentences with two words adjacent to each other, using Postgres directly, not some command language extension. My tables are:
TABLE word (spelling text, wordid serial)
TABLE sentence (sentenceid serial)
TABLE item (sentenceid integer, position smallint, wordid integer)
I have a simple query to find sentences with a single word:
SELECT DISTINCT sentence.sentenceid
FROM item,word,sentence
WHERE word.spelling = 'word1'
AND item.wordid = word.wordid
AND sentence.sentenceid = item.sentenceid
I want to filter the results of that query by a second word (word2): its item must have the same sentenceid as the item matched for word1, and its item.position must equal that item's item.position + 1. How can I refine my query to achieve this in a performant manner?
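For anyone who wants to try the answers against real data, here is a minimal setup sketch. The constraints and the sample rows are assumptions added for illustration; only the column names and types come from the question. The ids in the comments assume freshly created tables, so the serial columns start at 1.

```sql
-- Schema from the question, written as runnable DDL (constraints are assumed).
CREATE TABLE word (wordid serial PRIMARY KEY, spelling text NOT NULL);
CREATE TABLE sentence (sentenceid serial PRIMARY KEY);
CREATE TABLE item (
    sentenceid integer  NOT NULL REFERENCES sentence (sentenceid),
    position   smallint NOT NULL,
    wordid     integer  NOT NULL REFERENCES word (wordid)
);

-- Hypothetical sample data.
INSERT INTO word (spelling) VALUES ('word1'), ('word2'), ('other');  -- wordid 1, 2, 3
INSERT INTO sentence DEFAULT VALUES;  -- sentenceid 1
INSERT INTO sentence DEFAULT VALUES;  -- sentenceid 2
INSERT INTO item (sentenceid, position, wordid) VALUES
    (1, 1, 1), (1, 2, 2),             -- sentence 1: word1 word2 (adjacent)
    (2, 1, 1), (2, 2, 3), (2, 3, 2);  -- sentence 2: word1 other word2 (not adjacent)
```

With this data, a correct adjacency query for 'word1' followed by 'word2' should return only sentenceid 1.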
Answer 1:
Simpler solution, but it only gives results when there are no gaps in the item.position values:
SELECT DISTINCT sentence.sentenceid
FROM sentence
JOIN item ON sentence.sentenceid = item.sentenceid
JOIN word ON item.wordid = word.wordid
JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
AND next_item.position = item.position + 1
JOIN word AS next_word ON next_item.wordid = next_word.wordid
WHERE word.spelling = 'word1'
AND next_word.spelling = 'word2'
More general solution, using window functions:
SELECT DISTINCT sentenceid
FROM (SELECT sentence.sentenceid,
word.spelling,
lead(word.spelling) OVER (PARTITION BY sentence.sentenceid
ORDER BY item.position)
FROM sentence
JOIN item ON sentence.sentenceid = item.sentenceid
JOIN word ON item.wordid = word.wordid) AS pairs
WHERE spelling = 'word1'
AND lead = 'word2'
Edit: Another general solution (gaps allowed), using joins only:
SELECT DISTINCT sentence.sentenceid
FROM sentence
JOIN item ON sentence.sentenceid = item.sentenceid
JOIN word ON item.wordid = word.wordid
JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
AND next_item.position > item.position
JOIN word AS next_word ON next_item.wordid = next_word.wordid
LEFT JOIN item AS mediate_word ON sentence.sentenceid = mediate_word.sentenceid
AND mediate_word.position > item.position
AND mediate_word.position < next_item.position
WHERE mediate_word.wordid IS NULL
AND word.spelling = 'word1'
AND next_word.spelling = 'word2'
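On the performance part of the question: the queries above look up word rows by spelling and item rows by wordid and by (sentenceid, position), so indexes along the following lines may help. This is only a sketch. The index names are hypothetical, it assumes no equivalent indexes exist yet, and whether the planner actually uses them depends on the data, so it is worth checking with EXPLAIN ANALYZE.

```sql
-- Find the word row for a given spelling.
CREATE INDEX word_spelling_idx ON word (spelling);

-- Find all items that use a given word.
CREATE INDEX item_wordid_idx ON item (wordid);

-- Find the neighbouring item within the same sentence.
CREATE INDEX item_sentence_position_idx ON item (sentenceid, position);
```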
Answer 2:
I think this will match your requirements. Sorry, but I can't remember right now how to write it without join clauses. Basically, I added a self join on the item and word tables to get the next item in the sentence for each item. If the query planner does not handle my nested select well, you can try left joining the word table directly instead.
SELECT distinct sentence.sentenceid
FROM item inner join word
on item.wordid = word.wordid
inner join sentence
on sentence.sentenceid = item.sentenceid
-- subquery: every item together with the spelling of its word
left join (select subsequent_item.sentenceid,
subsequent_item.position,
subsequent_word.spelling
from item as subsequent_item
inner join word as subsequent_word
on subsequent_item.wordid = subsequent_word.wordid) subsequent
on subsequent.sentenceid = item.sentenceid
and subsequent.position = item.position + 1
where word.spelling = 'word1' and subsequent.spelling = 'word2';
Answer 3:
select
*
from mytable
where
round( 0.1 / ts_rank_cd( to_tsvector(mycolumn), to_tsquery('word1 & word2') ) ) <= 1
This will actually work, assuming you're not using A-D weight labels, else you'll need to change the 0.1 to something else.
You'll want to add a tsvector @@ tsquery where clause too.
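A minimal sketch of how the rank filter might be combined with the @@ match, using the same placeholder names (mytable, mycolumn). The NULLIF guard is an addition of mine, not part of the answer. It is there because PostgreSQL does not promise to evaluate WHERE conditions in the order written, so the division could otherwise run on a row whose rank is 0.

```sql
SELECT *
FROM mytable
WHERE to_tsvector(mycolumn) @@ to_tsquery('word1 & word2')
  -- NULLIF turns a zero rank into NULL, so the comparison yields NULL
  -- (row filtered out) instead of raising a division-by-zero error.
  AND round(0.1 / NULLIF(ts_rank_cd(to_tsvector(mycolumn),
                                    to_tsquery('word1 & word2')), 0)) <= 1;

-- Alternative, assuming PostgreSQL 9.6 or later: the <-> (followed-by)
-- tsquery operator matches word1 immediately followed by word2.
SELECT *
FROM mytable
WHERE to_tsvector(mycolumn) @@ to_tsquery('word1 <-> word2');
```

Note that 'word1 & word2' does not constrain word order, while 'word1 <-> word2' requires word1 to come first. Both forms go through the text search configuration, so stemming and stop-word removal apply to the words being matched.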
Source: https://stackoverflow.com/questions/23644944/find-sentences-with-two-words-adjacent-to-each-other-in-pg