Question
I need to perform a "contains" search on a column. A contains search requires a wildcard both before and after the search term.
Example: personalized
Query -> like '%sonal%'
This type of query can't use an index. Is there any way to speed up the search?
Note: I use MySQL (InnoDB) and PostgreSQL.
Answer 1:
PostgreSQL has a solution: the trigram index, provided by the pg_trgm extension (see its documentation).
postgres=# create extension pg_trgm ;
CREATE EXTENSION
postgres=# create index on obce using gin (nazev gin_trgm_ops);
CREATE INDEX
postgres=# explain select * from obce where nazev like '%Bene%';
┌──────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
╞══════════════════════════════════════════════════════════════════════════════╡
│ Bitmap Heap Scan on obce (cost=20.00..24.02 rows=1 width=41) │
│ Recheck Cond: ((nazev)::text ~~ '%Bene%'::text) │
│ -> Bitmap Index Scan on obce_nazev_idx (cost=0.00..20.00 rows=1 width=0) │
│ Index Cond: ((nazev)::text ~~ '%Bene%'::text) │
└──────────────────────────────────────────────────────────────────────────────┘
(4 rows)
This works for regular expressions too.
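To make the plan above easier to read, here is a minimal Python sketch of the idea behind a trigram index. It is not pg_trgm's actual implementation (pg_trgm also pads and normalizes words, for instance); the function names and sample rows are made up for illustration. The index narrows the search to rows that contain every trigram of the pattern, and a recheck step then discards false positives, which is roughly what the Recheck Cond line in the plan corresponds to.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(rows):
    """Map each trigram to the set of row ids whose value contains it."""
    index = defaultdict(set)
    for row_id, value in rows.items():
        for gram in trigrams(value):
            index[gram].add(row_id)
    return index


def candidate_ids(rows, index, needle):
    """Rows that contain every trigram of needle (may include false positives)."""
    grams = trigrams(needle)
    if not grams:
        # Needles shorter than 3 characters have no trigrams: fall back to all rows.
        return set(rows)
    return set.intersection(*(index.get(gram, set()) for gram in grams))


def contains_search(rows, index, needle):
    """Emulate LIKE '%needle%': index pre-filter, then recheck each candidate."""
    candidates = candidate_ids(rows, index, needle)
    return sorted(row_id for row_id in candidates if needle in rows[row_id])


# Hypothetical sample data; row 3 shares all trigrams of "sonal" without containing it.
rows = {1: "personalized", 2: "seasonal", 3: "sonata national", 4: "Benesov"}
index = build_index(rows)

print(sorted(candidate_ids(rows, index, "sonal")))  # candidates from the index
print(contains_search(rows, index, "sonal"))        # matches after the recheck
```

Row 3 survives the index lookup because "son", "ona" and "nal" all occur in it, but the recheck removes it. Patterns shorter than three characters give the index nothing to work with, so in this sketch they degrade to a full scan.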
Answer 2:
MySQL supports FULLTEXT indexes.
You might be interested in my presentation Full Text Search Throwdown, in which I compare different fulltext indexing tools. The presentation is a bit old now, but some of it is still relevant.
Re your comments:
MySQL's fulltext indexing doesn't support partial word matches. It does support a limited wildcard, but only at the end of a pattern, and the InnoDB implementation of fulltext doesn't support it; only MyISAM does. See the mention of the * wildcard in https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html
SELECT ... WHERE MATCH(mycolumn) AGAINST ('stack*' IN BOOLEAN MODE)
Elasticsearch also supports wildcards, but as with MySQL, they aren't efficient if the wildcard is at the start of the pattern. See https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html
Sphinx Search supports an option for infix string indexing. If you set min_infix_len to a nonzero positive number, it will index all infix substrings as well as whole words.
See http://sphinxsearch.com/docs/current.html#conf-min-infix-len
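As a rough illustration of what infix indexing means, the Python sketch below stores every substring of each word that is at least min_infix_len characters long, so a contains search becomes a plain dictionary lookup. This is only a toy model of the concept, not how Sphinx is implemented; the function names and sample documents are assumptions made for the example.

```python
from collections import defaultdict


def infixes(word, min_infix_len):
    """Yield every substring of word with at least min_infix_len characters."""
    for start in range(len(word)):
        for end in range(start + min_infix_len, len(word) + 1):
            yield word[start:end]


def build_infix_index(docs, min_infix_len):
    """Map each whole word and each infix to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)  # whole words are always indexed
            for piece in infixes(word, min_infix_len):
                index[piece].add(doc_id)
    return index


# Hypothetical sample documents.
docs = {1: "personalized offers", 2: "seasonal sale", 3: "sonar alarm"}
infix_index = build_infix_index(docs, min_infix_len=3)

# A contains search is now a single lookup instead of a scan.
print(sorted(infix_index.get("sonal", set())))
```

The cost is index size: the number of stored substrings grows roughly quadratically with word length, which is why the minimum infix length matters.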
Answer 3:
I guess there is no logical optimization if you need to find any sequence of characters at any position in the values. If the current lookup takes several seconds, then maybe you could benefit from using an external optimized index like this:
1. Add 2 extra columns: offset, with an index, and length, without an index.
2. Join all the values in a single text file and save on each row its offset and length.
3. Write an external tool to search the whole file (using something like strstr()) and return the offset of the match.
4. Use the returned offset to identify the row with something like SELECT * FROM table WHERE offset <= @offset ORDER BY offset DESC LIMIT 1.
5. Use the length field to make sure the matched fragment does not span two records: @offset + @length (the end of the searched string) must be <= offset + length (the end of the value on the found row).
You could also keep the full joined text in a global variable or dedicated table inside the database, to avoid spawning an external process or accessing the disk.
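The approach above can be sketched in Python, keeping the joined text in memory as the last paragraph suggests. This is a minimal sketch under stated assumptions: the function names and sample values are hypothetical, str.find stands in for strstr(), a binary search over the offsets list stands in for the indexed ORDER BY offset DESC lookup, and offsets are counted in characters rather than bytes.

```python
from bisect import bisect_right


def build_blob(values):
    """Join all values into one string and record each value's start offset."""
    offsets = []
    position = 0
    for value in values:
        offsets.append(position)
        position += len(value)
    return "".join(values), offsets


def find_rows(blob, offsets, values, needle):
    """Return the indexes of the values that contain needle."""
    matches = []
    if not needle:
        return matches
    start = 0
    while True:
        hit = blob.find(needle, start)  # plays the role of strstr()
        if hit == -1:
            return matches
        # Last row whose offset is <= hit (the ORDER BY offset DESC lookup).
        row = bisect_right(offsets, hit) - 1
        # Reject hits that run past the end of the row into the next record.
        fits = hit + len(needle) <= offsets[row] + len(values[row])
        if fits and (not matches or matches[-1] != row):
            matches.append(row)
        start = hit + 1


# Hypothetical sample values; "reason" + "alarm" creates a cross-record hit.
values = ["personalized", "seasonal", "reason", "alarm"]
blob, offsets = build_blob(values)
print(find_rows(blob, offsets, values, "sonal"))
```

In the joined text, "sonal" also appears where "reason" meets "alarm"; the length check is what filters that hit out. Inserting a separator character that cannot occur in the data between values would be another way to avoid such matches.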
Source: https://stackoverflow.com/questions/49809452/wild-card-before-and-after-a-string-mysql-psql