How to ignore accent in SQLite query (Android)

旧时模样 提交于 2019-11-26 08:46:30

Generally, string comparisons in SQL are controlled by column or expression COLLATE rules. In Android, only three collation sequences are pre-defined: BINARY (default), LOCALIZED and UNICODE. None of them is ideal for your use case, and the C API for installing new collation functions is unfortunately not exposed in the Java API.

To work around this:

  1. Add another column to your table, for example MOVIE_NAME_ASCII
  2. Store values into this column with the accent marks removed. You can remove accents by normalizing your strings to Unicode Normal Form D (NFD) and removing non-ASCII code points since NFD represents accented characters roughly as plain ASCII + combining accent markers:

    String asciiName = Normalizer.normalize(unicodeName, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");
    
  3. Do your text searches on this ASCII-normalized column but display data from the original unicode column.

In Android sqlite, LIKE and GLOB ignore both COLLATE LOCALIZED and COLLATE UNICODE (they only work for ORDER BY). However, there is a solution without having to add extra columns to your table. As @asat explains in this answer, you can use GLOB with a pattern that will replace each letter with all the available alternatives of that letter. In Java:

public static String addTildeOptions(String searchText) {
    return searchText.toLowerCase()
                     .replaceAll("[aáàäâã]", "\\[aáàäâã\\]")
                     .replaceAll("[eéèëê]", "\\[eéèëê\\]")
                     .replaceAll("[iíìî]", "\\[iíìî\\]")
                     .replaceAll("[oóòöôõ]", "\\[oóòöôõ\\]")
                     .replaceAll("[uúùüû]", "\\[uúùüû\\]")
                     .replace("*", "[*]")
                     .replace("?", "[?]");
}

And then (not literally like this, of course):

SELECT * from table WHERE lower(column) GLOB "*addTildeOptions(searchText)*"

This way, for example in Spanish, a user searching for either mas or más will get the search converted into m[aáàäâã]s, returning both results.

It is important to notice that GLOB ignores COLLATE NOCASE, that's why I converted everything to lower case both in the function and in the query. Notice also that the lower() function in sqlite doesn't work on non-ASCII characters - but again those are probably the ones that you are already replacing!

The function also replaces both GLOB wildcards, * and ?, with "escaped" versions.

BCS Software

You can use Android NDK to recompile the SQLite source including the desired ICU (International Components for Unicode). Explained in russian here: http://habrahabr.ru/post/122408/

The process of compiling the SQLilte with source with ICU explained here:

How to compile sqlite with ICU?

Unfortunately you will end up with different APKs for different CPUs.

You need to look at these, not as accented characters, but as entirely different characters. You might as well be looking for a, b, or c. That being said, I would try using a regex for it. It would look something like:

SELECT * from TB_MOVIE WHERE MOVIE_NAME REGEXP '.*[aAàÀ].*' ORDER BY MOVIE_NAME;
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!