SQL select rows containing substring in text field

Submitted by 淺唱寂寞╮ on 2019-12-12 01:03:02

Question


I have a CLIENTS_WORDS table with columns ID, CLIENT_ID and WORD in a PostgreSQL database:

ID|CLIENT_ID|WORD
1 |1242     |word1
2 |1242     |WordX.foo
3 |1372     |nextword
4 |1999     |word1

The table may hold about 100k-500k rows.
I have query strings like these:

'Some people tell word1 to someone'
'Another stringWordX.foo too possible'

I want to select all rows from the table whose WORD value is contained in the query string.
Currently I use this select:

select * from CLIENTS_WORDS
where strpos('Some people tell word1 to someone', WORD) > 0

My question: what is the best-performing (fastest) way to retrieve the matching rows?


Answer 1:


You get better performance with unnest() and JOIN. Like this:

SELECT DISTINCT c.client_id
FROM   unnest(string_to_array('Some people tell word1 ...', ' ')) AS t(word)
JOIN   clients_words c USING (word);

The details of the query depend on requirements your question leaves open. This version splits the string at space characters, so only whole, space-delimited words can match.
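To try the join against the sample rows from the question, a minimal setup might look like the following. The column types are an assumption (the question only names the columns), and a temporary table is used so nothing persists beyond the session.

```sql
-- Assumed column types; the question lists only the column names.
CREATE TEMP TABLE clients_words (
    id        integer PRIMARY KEY,
    client_id integer NOT NULL,
    word      text    NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO clients_words (id, client_id, word) VALUES
    (1, 1242, 'word1'),
    (2, 1242, 'WordX.foo'),
    (3, 1372, 'nextword'),
    (4, 1999, 'word1');

-- Split the input at spaces, then match whole words by equality.
SELECT DISTINCT c.client_id
FROM   unnest(string_to_array('Some people tell word1 to someone', ' ')) AS t(word)
JOIN   clients_words c USING (word);
```

With these four rows, the query returns client_id 1242 and 1999 (in no guaranteed order), since only 'word1' appears as a whole word in the input.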

A more flexible tool would be regexp_split_to_table(), where you can use character classes or shorthands for your delimiter characters. Like:

regexp_split_to_table('Some people tell word1 to someone', '\s') AS t(word)
regexp_split_to_table('Some people tell word1 to someone', '\W') AS t(word)

  • Related answer: Django. PostgreSQL. regexp_split_to_table not working
  • A search for more answers for regular expression class shorthands.
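The choice of delimiter pattern decides which stored words can still be found. As a rough illustration against the sample rows from the question (the input strings here are made up for the example): splitting on whitespace keeps WordX.foo as one token, while '\W' also splits at the dot, so that row can no longer match by equality.

```sql
-- Assumes standard_conforming_strings = on (the default in current PostgreSQL),
-- so the backslash in '\s+' reaches the regex engine unchanged.

-- Whitespace as delimiter: 'WordX.foo' survives as one token and joins to row 2.
SELECT DISTINCT c.client_id
FROM   regexp_split_to_table('Some people tell WordX.foo to someone', '\s+') AS t(word)
JOIN   clients_words c USING (word);

-- Non-word characters as delimiter: the dot splits the token into 'WordX' and 'foo'.
SELECT t.word
FROM   regexp_split_to_table('tell WordX.foo now', '\W+') AS t(word);
```

The first query finds client_id 1242. The second yields the tokens tell, WordX, foo and now, none of which equals the stored value 'WordX.foo'.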

Of course the column clients_words.word needs to be indexed for performance:

CREATE INDEX clients_words_word_idx ON clients_words (word);

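With the index in place, each word from the string can be resolved with an index lookup instead of a scan of the whole table, so the query would be very fast.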

Ignore word boundaries

If you want to ignore word boundaries altogether, the whole matter becomes much more expensive. LIKE / ILIKE in combination with a trigram GIN index would come to mind. Details here:
PostgreSQL LIKE query performance variations

Or other pattern-matching techniques - answer on dba.SE:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

However, your case is backwards: the pattern lives in the column and the text being searched is the constant, so such an index is not going to help. You'll have to inspect every single row for a partial match, which makes queries very expensive. The superior approach is to reverse the operation: split the string into words and then search for them.
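For the question's second example string, where the stored word sits inside a longer token (stringWordX.foo), no delimiter-based split isolates it, and the row-by-row test from the question is what remains. A sketch against the same sample rows:

```sql
-- Every stored word is tested as a substring of the input string.
-- strpos() compares literally and case-sensitively; an index on word
-- cannot help, because the column supplies the needle, not the haystack.
SELECT c.id, c.client_id, c.word
FROM   clients_words c
WHERE  strpos('Another stringWordX.foo too possible', c.word) > 0;
```

With the four sample rows this returns only row 2 ('WordX.foo'). The cost grows with the number of rows in the table, which is why it is worth avoiding whenever the words in the input are cleanly delimited.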



Source: https://stackoverflow.com/questions/21832375/sql-select-rows-containing-substring-in-text-field
