I just read a post mentioning "full text search" in SQL.
I was just wondering what the difference between FTS and LIKE is. I did read a couple of articles but
The real difference is the scanning methodology. For full-text search, the words (terms) are used as hash keys, each of which is associated with an array of the documents that the key (term) appears in. It's like this:
Document set = {d1, d2, d3, d4, ... dn}
Term set = {t1, t2, t3, ... tn}
Now the term-document matrix (which term is a member of which document) can be represented as:
t1 -> {d1, d5, d9,.. dn}
t2 -> {d11, d50, d2,.. dn}
t3 -> {d23, d67, d34,.. dn}
:
tn -> {d90, d87, d57,.. dn}
When a request comes in asking "Get me all documents containing the word/term t1", the document set {d1, d5, d9,.. dn} is returned.
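To make the term-to-documents mapping concrete, here is a minimal Python sketch of an inverted index. It is only an illustration of the idea, not how MySQL implements it: the function names, the sample documents, and the whitespace tokenizer are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Naive tokenization (an assumption): lowercase, split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, term):
    """Return the ids of documents containing the term via one hash lookup."""
    return index.get(term.lower(), set())


# Hypothetical sample documents, keyed by document id.
documents = {
    "d1": "the quick brown fox",
    "d2": "the lazy dog",
    "d3": "quick thinking saves the day",
}

index = build_inverted_index(documents)
print(sorted(search(index, "quick")))  # ['d1', 'd3']
```

The cost of building the index is paid once, up front. After that, a term lookup does not need to touch the document text at all.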
You could hack a de-normalized table schema to store documents: each row in a MySQL table is treated as a "document", and a TEXT column could contain a paragraph, etc. The inverted index will contain the terms as hash keys and the row ids as the document ids.
Remember that a SQL query against such an index will have more or less O(1) performance for the term lookup. The query will be independent of
For instance, this SQL could be run to extract all rows matching the given word XYZ (my_text_column needs a FULLTEXT index for the lookup to be fast):
SELECT *
FROM my_table
WHERE MATCH (my_text_column) AGAINST ('XYZ' IN BOOLEAN MODE);
Caveat: If you add ORDER BY to this query, your runtime will vary based on several parameters, one of which is the number of matching rows/documents. So beware.
LIKE, however, has none of this. It is forced to scan the string linearly and find all matching terms. Adding a leading wildcard adds to the mess, since an ordinary index can then no longer be used. It works great for short strings, as you can imagine, but will fail miserably for longer sentences. And it is definitely not comparable when you have a paragraph or a whole page of text.
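For contrast, here is a rough Python sketch of what a LIKE '%pattern%' style match amounts to. The like_contains function and the sample rows are made up for illustration, and a real database engine is of course more sophisticated, but the shape of the work is the same: every row's full text gets inspected.

```python
def like_contains(rows, pattern):
    """Emulate WHERE col LIKE '%pattern%' by scanning the text of every row."""
    needle = pattern.lower()
    # Each row's string is searched character by character for the substring.
    return [row_id for row_id, text in rows.items() if needle in text.lower()]


# Hypothetical sample rows, keyed by row id.
rows = {
    1: "the cat sat on the mat",
    2: "concatenate the two strings",
    3: "a dog barked",
}

matches = like_contains(rows, "cat")
print(matches)  # [1, 2]
```

Note that row 2 matches because "cat" occurs inside "concatenate": this kind of scan matches substrings, not terms, and its cost grows with both the number of rows and the length of the text in each row.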