Workaround for Android SQLite full-text search on Asian text

Submitted by 无人久伴 on 2019-12-11 21:33:05

Question


I have read many posts asking whether SQLite-based full-text search can be done on Android, and all the answers point out that Android's built-in SQLite does not allow a custom tokenizer. The default tokenizer splits words on spaces and other separators, but Asian languages such as Chinese need a special tokenizer, and Android does not allow adding one.

The posts I read are several years old. Is there any update in recent Android versions? I searched but did not find an answer.

I am also thinking of a workaround. Is it feasible to artificially add spaces between each Asian "word" (in Chinese, each character) before I INSERT rows into the FTS3/FTS4 virtual table for indexing, so that the default tokenizer treats each one like an English word? When querying, the same artificial spaces are added to the query string.
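The spacing step can be sketched as follows. Python is used here only as a runnable stand-in; on Android the same logic would live in Java or Kotlin. The helper name `space_cjk` and the Unicode ranges it covers (kana, the main CJK ideograph blocks, Hangul syllables) are assumptions, not an exhaustive definition of "Asian" characters. The same function has to be applied both to the text before `INSERT` and to the search string before `MATCH`.

```python
import re

# Hypothetical helper: put a space around every CJK character so that SQLite's
# default "simple" tokenizer sees each one as a separate term.
# Latin words such as "Bug" are left intact. Ranges are an assumption.
CJK_CHAR = re.compile(
    "([\u3040-\u30ff"   # Hiragana and Katakana
    "\u3400-\u4dbf"     # CJK Unified Ideographs Extension A
    "\u4e00-\u9fff"     # CJK Unified Ideographs
    "\uac00-\ud7af"     # Hangul syllables
    "])"
)

def space_cjk(text: str) -> str:
    """Return text with each CJK character separated by single spaces."""
    spaced = CJK_CHAR.sub(r" \1 ", text)
    # Collapse the double spaces created between adjacent characters
    # (this also normalizes other whitespace, which is harmless for indexing).
    return " ".join(spaced.split())

print(space_cjk("这个Bug还没有解决"))  # 这 个 Bug 还 没 有 解 决
```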

I tried it on Linux and it seems to work. For example, if I do the following, full-text search works for Asian text:

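```sql
-- FTS3 table using the default "simple" tokenizer
CREATE VIRTUAL TABLE mail USING fts3(subject, body);
-- Chinese text stored with an artificial space between characters
INSERT INTO mail(docid, subject, body) VALUES(4, 'software feedback', '这 个 Bug 还 没 有 解 决');
-- Space-separated terms are ANDed: matches if all four characters occur anywhere
SELECT * FROM mail WHERE body MATCH '没 有 解 决';
-- A quoted phrase also requires the characters to be adjacent and in order
SELECT * FROM mail WHERE body MATCH '"没 有 解 决"';
```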
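The experiment can be replayed with Python's built-in `sqlite3` module (a desktop stand-in, not Android code, assuming its SQLite build includes FTS3, as most do). One unspaced row is added for contrast: the simple tokenizer treats every character at or above U+0080 as part of a word and splits only on ASCII characters that are not letters or digits, so an unspaced Chinese run is indexed as a single term and only matches a query for the whole run.

```python
import sqlite3

# In-memory copy of the FTS3 experiment; assumes this SQLite build has FTS3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE mail USING fts3(subject, body)")
rows = [
    (4, "software feedback", "这 个 Bug 还 没 有 解 决"),  # spaced by hand
    (5, "software feedback", "这个Bug还没有解决"),          # left unspaced
]
conn.executemany("INSERT INTO mail(docid, subject, body) VALUES (?, ?, ?)", rows)

def search(query):
    """Return the docids whose body matches an FTS query string."""
    cur = conn.execute(
        "SELECT docid FROM mail WHERE body MATCH ? ORDER BY docid", (query,)
    )
    return [docid for (docid,) in cur]

print(search("没 有 解 决"))        # [4]  all four terms present (implicit AND)
print(search('"没 有 解 决"'))      # [4]  quoted phrase: adjacent, in order
print(search('"有 没"'))            # []   phrase in the wrong order
print(search("没有解决"))           # []   unspaced query is one unknown term
print(search("这个Bug还没有解决"))  # [5]  the unspaced row is one long term
```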

One concern is whether this takes much more storage in the database file, since the spaces roughly double the number of characters. It appears that the so-called "virtual table" stores not only the generated index but also the original text.
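Character count is not the same as storage size, so a quick byte comparison helps. This sketch assumes the database uses UTF-8, SQLite's default text encoding; UTF-16 is shown only for comparison. In UTF-8 each Chinese character takes 3 bytes and each added space 1 byte, so pure Chinese text of n characters grows from 3n to 4n - 1 bytes, about a third more rather than double. This covers only the stored copy of the text, not the size of the full-text index itself.

```python
# Compare the stored size of the example text with and without artificial spaces.
original = "这个Bug还没有解决"
spaced = "这 个 Bug 还 没 有 解 决"

print(len(original), len(spaced))  # 10 17 (characters)
for encoding in ("utf-8", "utf-16-le"):
    before = len(original.encode(encoding))
    after = len(spaced.encode(encoding))
    print(f"{encoding}: {before} -> {after} bytes ({after / before:.2f}x)")
# utf-8: 24 -> 31 bytes (1.29x)
# utf-16-le: 20 -> 34 bytes (1.70x)
```

If the duplicated text is the main worry, FTS4 also supports contentless tables (`content=""`) that store only the index; the trade-off is that column values cannot be read back and rows cannot be updated or deleted, so the original text would have to be kept in an ordinary table and joined by docid.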


Answer 1:


Use the NDK to compile your own copy of SQLite; you can then configure it however you want.




Answer 2:


On API level 21 and higher, I tested and found that the ICU tokenizer is already available.
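An app that supports both newer and older devices could probe for the ICU tokenizer at runtime and fall back to the spacing workaround from the question. The sketch below does this with Python's `sqlite3` as a stand-in; on Android the equivalent would be `SQLiteDatabase.execSQL` inside a try/catch for `SQLException`. The helper names `space_cjk` and `create_mail_table` and the `zh_CN` locale are illustrative assumptions. Desktop Python builds usually lack ICU, so expect the fallback path there.

```python
import re
import sqlite3

# Hypothetical helper: surround each CJK character with spaces (the question's workaround).
_CJK = re.compile("([\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af])")

def space_cjk(text):
    return " ".join(_CJK.sub(r" \1 ", text).split())

def create_mail_table(conn):
    """Create the FTS4 table and return (mode, prepare), where prepare() must be
    applied to text before INSERT and to search strings before MATCH."""
    try:
        # Succeeds only if this SQLite build was compiled with the ICU tokenizer.
        conn.execute(
            "CREATE VIRTUAL TABLE mail USING fts4(subject, body, tokenize=icu zh_CN)"
        )
        return "icu", lambda text: text  # ICU does its own word segmentation
    except sqlite3.OperationalError:
        # Typically "unknown tokenizer: icu" when ICU support is missing.
        conn.execute("CREATE VIRTUAL TABLE mail USING fts4(subject, body)")
        return "spaced", space_cjk

conn = sqlite3.connect(":memory:")
mode, prepare = create_mail_table(conn)
conn.execute(
    "INSERT INTO mail(docid, subject, body) VALUES (?, ?, ?)",
    (4, "software feedback", prepare("这个Bug还没有解决")),
)
hits = [
    docid
    for (docid,) in conn.execute(
        "SELECT docid FROM mail WHERE body MATCH ?", (prepare("解决"),)
    )
]
print(mode, hits)
```

Keeping the choice behind a single `prepare` function means the insert and query code does not need to know which tokenizer is in use.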

For older devices, I found a workaround in another question: "Is SQLite on Android built with the ICU tokenizer enabled for FTS?"



Source: https://stackoverflow.com/questions/33252049/work-around-of-android-sqlite-full-text-search-for-asian-text
