What is Full Text Search vs LIKE

南旧 2020-11-30 17:08

I just read a post mentioning "full text search" in SQL.

I was just wondering what the difference between FTS and LIKE is. I did read a couple of articles but

6 answers

我在风中等你 (OP)

2020-11-30 17:28
The real difference is the scanning methodology. In full-text search, the words (terms) are used as hash keys, each of which is associated with an array of the documents that term appears in. It looks like this:
```
Document sets = {d1, d2, d3, d4, ... dn}
Term sets = {t1, t2, t3, .. tn}
```
Now the term-document matrix (which term is a member of which document) can be represented as:
```
t1 -> {d1, d5, d9,.. dn}
t2 -> {d11, d50, d2,.. dn}
t3 -> {d23, d67, d34,.. dn}
:
tn -> {d90, d87, d57,.. dn}
```
When a request comes in asking "Get me all documents containing the word/term t1", the document set {d1, d5, d9,.. dn} is returned.
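The term-to-documents mapping above can be sketched in a few lines of Python. This is only an illustration of the data structure, not how MySQL implements its full-text index: the `build_inverted_index` and `search` names, the whitespace tokenizer, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Naive tokenizer (an assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, term):
    """Return the ids of documents containing the term via one hash lookup."""
    return index.get(term.lower(), set())


# Hypothetical sample documents, keyed by document id.
documents = {
    "d1": "full text search uses an inverted index",
    "d2": "LIKE scans every row",
    "d3": "an inverted index maps terms to documents",
}

index = build_inverted_index(documents)
print(sorted(search(index, "inverted")))  # ['d1', 'd3']
```

The index is built once, up front; after that, each query is a dictionary lookup on the term rather than a pass over the text of every document.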

You could hack a denormalized table schema to store documents: each row in a MySQL table is treated as a "document", and a TEXT column could contain a paragraph or similar. The inverted index then holds the terms as hash keys and the row ids as the document ids.

Remember that a full-text query against such an index has more or less O(1) lookup performance. The term lookup is independent of:

1. The number of words/terms in the TEXT column
2. The number of rows/documents matching the criteria
3. The length of the words/terms

For instance, this SQL could be run to extract all rows matching the given word XYZ:
```
-- Assumes a FULLTEXT index exists on my_text_column.
SELECT *
FROM   my_table
WHERE  MATCH (my_text_column) AGAINST ('XYZ' IN BOOLEAN MODE);
```
Caveat: if you add ORDER BY to this query, your runtime will vary with several parameters, one of which is the number of matching rows/documents. So beware.

LIKE, however, has none of this. It is forced to scan the sentence/string linearly and find all matching terms. Adding wildcards adds to the mess. It works fine for short strings, as you can imagine, but performs poorly on longer sentences, and it is certainly not comparable when the column holds a paragraph or a whole page of text.
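To make the linear scan concrete, here is a rough Python sketch of what a `LIKE '%xyz%'` predicate has to do conceptually: run a substring test against every row. The `like_contains` helper and the sample rows are hypothetical, and a real database engine is far more optimized, but the work still grows with the number of rows and the length of each string.

```python
def like_contains(rows, pattern):
    """Emulate WHERE col LIKE '%pattern%' by scanning every row."""
    matches = []
    needle = pattern.lower()
    for row_id, text in rows:
        # Substring search over the row's text. Case is folded here as an
        # assumption, mirroring MySQL's common case-insensitive collations.
        if needle in text.lower():
            matches.append(row_id)
    return matches


# Hypothetical rows of (row_id, text_column).
rows = [
    (1, "full text search uses an inverted index"),
    (2, "LIKE scans every row"),
    (3, "an inverted index maps terms to documents"),
]

print(like_contains(rows, "index"))  # [1, 3]
print(like_contains(rows, "ver"))    # [1, 2, 3]
```

The second call also shows a behavioral difference: a `%...%` pattern matches anywhere inside a string ("ver" occurs in "inverted" and "every"), whereas a term lookup matches whole indexed words.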