What is Full Text Search vs LIKE

南旧 2020-11-30 17:08

I just read a post mentioning "full text search" in SQL.

I was just wondering what the difference between FTS and LIKE is. I did read a couple of articles but

6 Answers
  •  我在风中等你
    2020-11-30 17:28

    The real difference is the scanning methodology. For full-text search, the words (terms) are used as hash keys, each of which is associated with an array of the documents that the key (term) appears in. It's like this:

    Document set = {d1, d2, d3, d4, ... dn}
    Term set = {t1, t2, t3, .. tn}
    

    Now the term-document matrix (which term is a member of which document) can be represented as:

    t1 -> {d1, d5, d9,.. dn}
    t2 -> {d11, d50, d2,.. dn}
    t3 -> {d23, d67, d34,.. dn}
    :
    tn -> {d90, d87, d57,.. dn}
    

    When a request comes in asking "Get me all documents containing the word/term t1", the document set {d1, d5, d9,.. dn} is returned.
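    A minimal Python sketch of this term-to-documents mapping. It assumes a deliberately naive tokenizer that lower-cases the text and splits on non-alphanumeric characters. Real engines do more, such as stopword handling. `build_inverted_index` and the sample documents are hypothetical:

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive tokenizer: lower-case, then split on anything non-alphanumeric."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids it appears in."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


documents = {
    "d1": "The quick brown fox",
    "d2": "The lazy dog",
    "d3": "A quick dog",
}
index = build_inverted_index(documents)

# "Get me all documents containing term t" is a single hash lookup;
# none of the document texts are scanned at query time.
print(sorted(index.get("quick", set())))  # ['d1', 'd3']
print(sorted(index.get("dog", set())))    # ['d2', 'd3']
```

    All the tokenizing work happens once, at indexing time. A query only touches the posting set for its term.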

    You could hack a de-normalized table schema to store documents: each row in a MySQL table is treated as a "document", and a TEXT column could hold a paragraph or similar. The inverted index then contains the terms as hash keys and the row ids as the document ids.
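    One way to sketch that hand-built index is to use Python's built-in sqlite3 module instead of MySQL, so the example runs anywhere. The table names (`documents`, `term_index`) and the helper functions are hypothetical. The composite primary key here is a B-tree, so each lookup costs O(log n) rather than a true O(1) hash probe. It still never reads the document text at query time:

```python
import re
import sqlite3

# Hypothetical schema: "documents" holds the text, "term_index" is a hand-built
# inverted index storing one (term, doc_id) row per distinct term in a document.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE term_index (
        term   TEXT    NOT NULL,
        doc_id INTEGER NOT NULL REFERENCES documents (id),
        PRIMARY KEY (term, doc_id)  -- B-tree keyed on term first
    );
    """
)


def tokenize(text):
    """Return the distinct lower-case alphanumeric terms in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def add_document(doc_id, body):
    """Store the row and add one index entry per distinct term."""
    conn.execute("INSERT INTO documents (id, body) VALUES (?, ?)", (doc_id, body))
    conn.executemany(
        "INSERT INTO term_index (term, doc_id) VALUES (?, ?)",
        [(term, doc_id) for term in tokenize(body)],
    )


def search(term):
    """Return ids of rows containing `term`; only term_index is consulted."""
    cur = conn.execute(
        "SELECT doc_id FROM term_index WHERE term = ? ORDER BY doc_id",
        (term.lower(),),
    )
    return [doc_id for (doc_id,) in cur]


add_document(1, "Full text search uses an inverted index.")
add_document(2, "LIKE scans every row.")
add_document(3, "An index maps each term to rows.")

print(search("index"))  # [1, 3]
print(search("LIKE"))   # [2]
```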

    Remember that such a full-text query has more or less O(1) lookup performance. Finding the rows for a term is independent of:

    1. The number of words/terms in the TEXT column
    2. The number of rows/documents matching the criteria (for the lookup itself; fetching the matched rows still costs time per row)
    3. The length of the words/terms

    For instance, this SQL could be fired to extract all rows containing the word XYZ:

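    -- Assumes my_text_column has a FULLTEXT index, e.g.:
    -- ALTER TABLE my_table ADD FULLTEXT INDEX ft_my_text (my_text_column);
    SELECT *
    FROM   my_table
    WHERE  MATCH (my_text_column) AGAINST ('XYZ' IN BOOLEAN MODE);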
    

    Caveat: If you add ORDER BY to this query, your runtimes will vary based on several parameters, one of which is the number of matching rows/documents. So beware.
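    This sketch shows why ORDER BY changes the picture. It ranks matches by a simple term-frequency score. That scoring is an assumption made only for illustration; MySQL's own relevance calculation is different. Finding the postings is still one lookup, but sorting them has to visit all k matching documents:

```python
from collections import Counter, defaultdict


def build_index_with_counts(documents):
    """Map term -> {doc_id: number of occurrences of the term in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def search_ranked(index, term):
    """Find matches with one lookup, then sort them by a toy relevance score."""
    postings = index.get(term.lower(), {})  # the cheap part: one hash lookup
    # The ORDER BY-like part: every one of the k matches is visited and sorted,
    # so this step costs O(k log k) and grows with the number of matches.
    return sorted(postings, key=lambda doc_id: (-postings[doc_id], doc_id))


documents = {
    "d1": "xyz abc",
    "d2": "xyz xyz xyz",
    "d3": "abc",
    "d4": "xyz xyz",
}
index = build_index_with_counts(documents)
print(search_ranked(index, "XYZ"))  # ['d2', 'd4', 'd1']
```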

    LIKE, however, has none of this. It is forced to scan each string linearly to find all matching terms. Wildcards make it worse: a pattern with a leading wildcard, such as LIKE '%XYZ%', cannot use an ordinary index, so every row has to be read. As you can imagine, this works fine for short strings but fails miserably for longer sentences. It is simply not comparable when each row holds a paragraph or a whole page of text.
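    For contrast, here is a Python emulation of what `LIKE '%XYZ%'` amounts to: every row is visited and searched as a substring. It assumes a case-insensitive comparison, as with MySQL's common default collations, and it uses hypothetical sample rows. It also shows a difference in meaning. The substring match finds 'XYZZY', while a word index only finds the whole term 'xyz':

```python
import re

rows = {
    1: "XYZ Corp annual report",
    2: "the XYZZY magic word",
    3: "nothing to see here",
}


def like_contains(rows, needle):
    """Emulate `WHERE col LIKE '%needle%'`: every row is read and searched."""
    needle = needle.lower()  # assumption: case-insensitive collation
    matches, scanned = [], 0
    for row_id, text in rows.items():
        scanned += 1
        if needle in text.lower():  # substring search over the whole string
            matches.append(row_id)
    return matches, scanned


# A whole-word inverted index over the same rows, for comparison.
index = {}
for row_id, text in rows.items():
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        index.setdefault(term, set()).add(row_id)

print(like_contains(rows, "XYZ"))       # ([1, 2], 3): all 3 rows scanned
print(sorted(index.get("xyz", set())))  # [1]: one lookup, whole words only
```

    The scan count always equals the table size, however many rows match, and each row's full text is searched. This is the linear behaviour described above.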
