Wild Card Before and After a String - MySql, PSQL

南笙酒味 submitted on 2019-12-10 15:56:00

Question


I need to perform a Contains operation on a column. For a Contains operation, we need to use a wildcard before and after the word.

Example: personalized

Query -> like '%sonal%'

This type of query can't use indexes. Is there any way to speed up the search?

Note: I use MySQL (InnoDB) and PostgreSQL.


Answer 1:


PostgreSQL has a solution: the trigram index from the pg_trgm extension. There is an article about it, as well as documentation.

postgres=# create extension pg_trgm ;
CREATE EXTENSION
postgres=# create index on obce using gin (nazev gin_trgm_ops);
CREATE INDEX
postgres=# explain select * from obce where nazev like '%Bene%';
┌──────────────────────────────────────────────────────────────────────────────┐
│                                  QUERY PLAN                                  │
╞══════════════════════════════════════════════════════════════════════════════╡
│ Bitmap Heap Scan on obce  (cost=20.00..24.02 rows=1 width=41)                │
│   Recheck Cond: ((nazev)::text ~~ '%Bene%'::text)                            │
│   ->  Bitmap Index Scan on obce_nazev_idx  (cost=0.00..20.00 rows=1 width=0) │
│         Index Cond: ((nazev)::text ~~ '%Bene%'::text)                        │
└──────────────────────────────────────────────────────────────────────────────┘
(4 rows)

It works for regular-expression matches (~ and ~*) too.
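
To show why the plan above has both an Index Cond and a Recheck Cond, here is a toy Python model of a trigram index. It is a simplified sketch, not pg_trgm's implementation: real pg_trgm ignores non-alphanumeric characters, pads each word with blanks and keeps its posting lists in a GIN index. The names TrigramIndex and like_contains are hypothetical.

```python
from collections import defaultdict


def trigrams(text):
    """Every 3-character substring of the lowercased text.

    Simplified: pg_trgm also splits on non-alphanumeric characters and
    pads each word with blanks before extracting trigrams.
    """
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


class TrigramIndex:
    """Toy inverted index: trigram -> ids of the rows whose value contains it."""

    def __init__(self, rows):
        self.rows = rows  # row id -> column value
        self.postings = defaultdict(set)
        for row_id, value in rows.items():
            for tg in trigrams(value):
                self.postings[tg].add(row_id)

    def like_contains(self, needle):
        """Ids of rows matching LIKE '%needle%' (case-sensitive, as LIKE is)."""
        needle_tgs = trigrams(needle)
        if needle_tgs:
            # Index step: keep only rows that contain every trigram of the needle.
            candidates = set.intersection(
                *(self.postings.get(tg, set()) for tg in needle_tgs)
            )
        else:
            # Needle shorter than 3 characters: trigrams can't narrow anything down.
            candidates = set(self.rows)
        # Recheck step: sharing trigrams is necessary but not sufficient
        # (and the trigrams were lowercased), so verify each candidate.
        return sorted(r for r in candidates if needle in self.rows[r])


idx = TrigramIndex({1: "personalized", 2: "seasonal", 3: "person", 4: "PERSONAL"})
print(idx.like_contains("sonal"))  # [1, 2]
```

Because the trigrams are lowercased, the candidate set is a superset of the true matches: row 4 (PERSONAL) survives the index step but fails the case-sensitive recheck, which is why the recheck is always needed.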




Answer 2:


MySQL supports FULLTEXT indexes.

You might be interested in my presentation Full Text Search Throwdown, in which I compare different fulltext indexing tools. The presentation is a bit old now, but some of it is still relevant.


Re your comments:

MySQL's fulltext indexing doesn't support partial-word matches. It does support a limited wildcard, but only at the end of a pattern. And the InnoDB implementation of fulltext doesn't support it; only MyISAM does. See the mention of the * wildcard in https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html

SELECT ... WHERE MATCH(mycolumn) AGAINST ('stack*' IN BOOLEAN MODE)
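
One way to see why only a trailing wildcard is cheap: a word index is ordered by the beginning of each word. The toy Python model below (the class WordIndex is hypothetical and is not MySQL's actual structure) turns 'stack*' into a seek into the sorted vocabulary followed by a short range read, while '*stack*' has to scan every indexed word.

```python
import bisect
import re
from collections import defaultdict


class WordIndex:
    """Toy word-level inverted index (a simplified model, not MySQL's actual structure)."""

    def __init__(self, rows):
        postings = defaultdict(set)  # word -> ids of rows containing that word
        for row_id, text in rows.items():
            for word in re.findall(r"\w+", text.lower()):
                postings[word].add(row_id)
        self.postings = dict(postings)
        self.vocab = sorted(postings)  # sorted vocabulary, like the keys of a B-tree

    def prefix_search(self, prefix):
        """Models AGAINST ('prefix*' IN BOOLEAN MODE): seek, then read a contiguous range."""
        prefix = prefix.lower()
        hits = set()
        for word in self.vocab[bisect.bisect_left(self.vocab, prefix):]:
            if not word.startswith(prefix):
                break  # past the block of words that share the prefix
            hits |= self.postings[word]
        return sorted(hits)

    def infix_search(self, fragment):
        """Models '*fragment*': there is no range to seek to, so every word is checked."""
        fragment = fragment.lower()
        hits = set()
        for word in self.vocab:  # full scan of the vocabulary
            if fragment in word:
                hits |= self.postings[word]
        return sorted(hits)


words = WordIndex({1: "Stack Overflow", 2: "stacking boxes", 3: "a haystack", 4: "unrelated"})
print(words.prefix_search("stack"))  # [1, 2]
print(words.infix_search("stack"))   # [1, 2, 3]
```

The prefix search never looks at "haystack" at all, which is exactly why it cannot find it.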

Elasticsearch also supports wildcards, but, as with MySQL, they aren't efficient when the wildcard is at the start of the pattern. See https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html

Sphinx Search supports infix string indexing. If you set min_infix_len to a positive number, it indexes every infix substring of at least that length, as well as whole words. See http://sphinxsearch.com/docs/current.html#conf-min-infix-len
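
Infix indexing trades index size for query speed. The Python sketch below illustrates the idea only; the function names and the min_infix_len=3 default are assumptions for illustration, not Sphinx's API, defaults or dictionary format.

```python
from collections import defaultdict


def build_infix_index(rows, min_infix_len=3):
    """Index every substring (of length >= min_infix_len) of every word.

    Toy model of infix indexing, not Sphinx's real dictionary format.
    """
    index = defaultdict(set)  # substring -> ids of rows containing it inside some word
    for row_id, text in rows.items():
        for word in text.lower().split():
            for start in range(len(word)):
                for end in range(start + min_infix_len, len(word) + 1):
                    index[word[start:end]].add(row_id)
    return index


def infix_search(index, fragment):
    """A '*fragment*' query becomes a single exact lookup."""
    return sorted(index.get(fragment.lower(), set()))


infixes = build_infix_index({1: "personalized service", 2: "seasonal menu", 3: "person"})
print(infix_search(infixes, "sonal"))  # [1, 2]
print(infix_search(infixes, "son"))    # [1, 2, 3]
print(infix_search(infixes, "so"))     # [] (shorter than min_infix_len)
```

The cost grows quadratically with word length: a word of n characters contributes (n - min_infix_len + 1)(n - min_infix_len + 2)/2 substrings, which is 55 for personalized with min_infix_len=3.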




Answer 3:


I guess there is no logical optimization if you need to find an arbitrary sequence of characters at any position within the values. If the current lookup takes several seconds, you could perhaps benefit from an external, optimized index like this:

  1. Add two extra columns: offset, with an index, and length, without one.

  2. Concatenate all the values into a single text file, and store on each row the offset and length of its value within that file.

  3. Write an external tool that searches the whole file (using something like strstr()) and returns the offset of the match.

  4. Use the returned offset to identify the row with something like SELECT * FROM table WHERE offset <= @offset ORDER BY offset DESC LIMIT 1, i.e. the last row that starts at or before the match.

  5. Use the length field to make sure the matched fragment does not span two records: @offset + @length (the end of the searched string) must be <= offset + length (the end of the value on the found row).

You could also keep the full joined text in a global variable or dedicated table inside the database, to avoid spawning an external process or accessing the disk.
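
The steps above can be sketched in Python with the joined text kept in memory. This is a hypothetical illustration: the class name is made up, str.find stands in for strstr(), and a sorted list of offsets plays the role of the indexed offset column, with bisect_right finding the last row whose offset is not past the match, which is what the step-4 query does.

```python
import bisect


class JoinedTextIndex:
    """Hypothetical in-memory version of the offset/length scheme."""

    def __init__(self, values):
        self.row_ids = []  # row id of each value, in join order
        self.offsets = []  # steps 1-2: start of each value in the joined text (ascending)
        self.lengths = []  # steps 1-2: length of each value
        parts = []
        pos = 0
        for row_id, value in values:
            self.row_ids.append(row_id)
            self.offsets.append(pos)
            self.lengths.append(len(value))
            parts.append(value)
            pos += len(value)
        self.text = "".join(parts)

    def search(self, needle):
        """Ids of the rows whose value contains needle, in join order."""
        if not needle:
            return list(self.row_ids)  # every value contains the empty string
        hits = []
        found = self.text.find(needle)  # step 3: strstr()-style scan
        while found != -1:
            # Step 4: the row with the largest offset that is <= the match position.
            i = bisect.bisect_right(self.offsets, found) - 1
            # Step 5: reject matches that run past the end of that row's value.
            if found + len(needle) <= self.offsets[i] + self.lengths[i]:
                if not hits or hits[-1] != self.row_ids[i]:  # one hit per row
                    hits.append(self.row_ids[i])
            found = self.text.find(needle, found + 1)
        return hits


joined = JoinedTextIndex([(10, "personalized"), (11, "seasonal"), (12, "person"), (13, "alpha")])
print(joined.search("sonal"))  # [10, 11]
```

In this example "sonal" also occurs across the boundary between person and alpha in the joined text (pers|on + al|pha), and the step-5 length check is what rejects that match.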



Source: https://stackoverflow.com/questions/49809452/wild-card-before-and-after-a-string-mysql-psql
