Lucene query: bla~* (match words that start with something fuzzy), how?

旧时难觅i 2020-12-05 15:59

In the Lucene query syntax I'd like to combine * and ~ in a valid query similar to: bla~* //invalid query

Meaning: match words that begin with "bla" or something close to it.

4 answers
  •  醉话见心
    2020-12-05 16:51

    It's for an address search service, where I want to suggest addresses based on partially typed and possibly mistyped street names, city names, etc. (in any combination). Think Ajax: users typing partial street addresses into a text field.

    For this case the suggested query expansion is perhaps not so feasible, as the partial string (street address) may become longer than "short" :)

    Normalization

    One possibility I can think of is to use string "normalization", instead of fuzzy searches, and simply combine that with wildcard queries. A street address of

    "miklabraut 42, 101 reykjavík", would become "miklabrat 42 101 rekavik" when normalized.
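    As a rough illustration of what such a normalizer could look like, here is a minimal Python sketch. The function name `normalize` and its rules (lowercasing, stripping accents, dropping punctuation, collapsing doubled letters) are assumptions made for this example. The rules behind the example above are not spelled out and are clearly more aggressive, so this sketch does not reproduce "miklabrat 42 101 rekavik".

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalizer: conservative rules only, not the post's exact scheme."""
    # Lowercase, then decompose accented letters so the accent becomes
    # a separate combining mark that can be dropped (í -> i).
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation (anything that is not a word character or whitespace) with a space.
    cleaned = re.sub(r"[^\w\s]+", " ", no_accents)
    # Collapse runs of the same ASCII letter, a common typing slip ("mikkl" -> "mikl").
    collapsed = re.sub(r"([a-z])\1+", r"\1", cleaned)
    # Squeeze whitespace so tokens are separated by exactly one space.
    return " ".join(collapsed.split())


if __name__ == "__main__":
    print(normalize("Miklabraut 42, 101 Reykjavík"))
```

    The same function would have to be applied both when indexing and when querying, otherwise the two sides stop lining up.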

    So, build the index like this:

    1) Build the index with records containing "normalized" versions of street names, city names, etc., with one street address per document (one or several fields).

    And search the index like this:

    2) Normalize the input strings (e.g. mikl reyk) used to form the queries (i.e. mik rek).

    3) Use the wildcard operator to perform the search (i.e. mik* AND rek*), leaving the fuzzy part out.

    That would fly, provided the normalization algorithm is good enough :)
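
    The search side of the steps above can be sketched as follows. Both helper names (`to_wildcard_query`, `prefix_match`) are hypothetical. The sketch assumes the strings passed in are already normalized and contain no characters that are special in the Lucene query syntax. `prefix_match` is only an in-memory stand-in for what the wildcard query would do against a real index.

```python
def to_wildcard_query(normalized_input: str) -> str:
    """Turn 'mik rek' into the query string 'mik* AND rek*'.

    Assumes each term is plain letters/digits, so nothing needs escaping.
    """
    terms = normalized_input.split()
    return " AND ".join(term + "*" for term in terms)


def prefix_match(normalized_doc: str, normalized_input: str) -> bool:
    """Stand-in for the wildcard search: every input term must be
    a prefix of at least one token in the normalized document."""
    doc_tokens = normalized_doc.split()
    terms = normalized_input.split()
    # An empty input matches nothing rather than everything.
    return bool(terms) and all(
        any(token.startswith(term) for token in doc_tokens) for term in terms
    )


if __name__ == "__main__":
    doc = "miklabrat 42 101 rekavik"  # normalized address from the example
    print(to_wildcard_query("mik rek"))
    print(prefix_match(doc, "mik rek"))
```

    With the normalized address from the example, the input "mik rek" matches, while an input such as "lau rek" does not, because no token in the document starts with "lau".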
