In the Lucene query syntax I'd like to combine * and ~ in a valid query similar to: bla~* //invalid query
Meaning: please match words that begin with "bla" or something close to it.
It's for an address search service, where I want to suggest addresses based on partially typed and possibly mistyped street names, city names, etc. (in any combination). Think Ajax: users typing partial street addresses into a text field.
For this case the suggested query expansion is perhaps not so feasible, as the partial string (street address) may become longer than "short" :)
Normalization
One possibility I can think of is to use string "normalization", instead of fuzzy searches, and simply combine that with wildcard queries. A street address of
"miklabraut 42, 101 reykjavík", would become "miklabrat 42 101 rekavik" when normalized.
So, build the index like this:
1) Build the index with records containing "normalized" versions of street names, city names etc., with one street address per document (one or several fields).
And search the index like this:
2) Normalize the input strings used to form the queries (e.g. "mikl reyk" becomes "mik rek").
3) Use the wildcard operator to perform the search (i.e. mik* AND rek*), leaving the fuzzy part out.
That would fly, provided the normalization algorithm is good enough :)
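
To make steps 2 and 3 concrete, here is a rough Python sketch of how the query string could be assembled. The names `normalize` and `build_prefix_query` are made up for illustration, and the normalization shown (lowercasing, stripping accents, dropping punctuation) is only a stand-in: it is much milder than the phonetic-style normalization in my example above, so it yields "miklabraut" rather than "miklabrat". Whatever function is used, the same one would have to be applied when building the index.

```python
import re
import unicodedata


def normalize(text):
    """Hypothetical stand-in normalizer: lowercase, strip accents, drop punctuation.

    This is NOT the aggressive normalization from the post; it only shows
    where such a function would plug in.
    """
    # Decompose accented characters (e.g. "í" becomes "i" + combining accent).
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop the combining marks, keeping the base letters.
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace runs of punctuation/whitespace with a single space.
    return re.sub(r"[\W_]+", " ", no_accents).strip()


def build_prefix_query(user_input):
    """Turn partial user input into a wildcard query such as 'mikl* AND reyk*'."""
    terms = normalize(user_input).split()
    # Each normalized term becomes a prefix (trailing-wildcard) clause.
    return " AND ".join(term + "*" for term in terms)


if __name__ == "__main__":
    print(normalize("Miklabraut 42, 101 Reykjavík"))
    print(build_prefix_query("mikl reyk"))
```

The resulting string would then be handed to the Lucene query parser as-is. Input containing Lucene special characters is not an issue here only because the stand-in normalizer strips punctuation; a different normalizer might need explicit escaping.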