Lucene query: bla~* (match words that start with something fuzzy), how?


旧时难觅i 2020-12-05 15:59

In the Lucene query syntax I'd like to combine * and ~ in a valid query similar to: bla~* //invalid query

Meaning: please match words that begin with "bla" or something close to it.

4 answers

醉话见心 (original poster)

2020-12-05 16:51

It's for an address search service, where I want to suggest addresses based on partially typed and possibly mistyped street names, city names, etc. (in any combination). Think Ajax: users typing partial street addresses into a text field.

For this case the suggested query expansion is perhaps not so feasible, as the partial string (street address) may become longer than "short" :)

Normalization

One possibility I can think of is to use string "normalization", instead of fuzzy searches, and simply combine that with wildcard queries. A street address of

"miklabraut 42, 101 reykjavík", would become "miklabrat 42 101 rekavik" when normalized.

So, build the index like this:

1) Build the index from records containing "normalized" versions of street names, city names, etc., with one street address per document (in one or several fields).

And search the index like this:

2) Normalize the input strings used to form the queries (e.g. mikl reyk becomes mik rek).

3) Use the wildcard operator to perform the search (i.e. mik* AND rek*), leaving the fuzzy part out.

That would fly, provided the normalization algorithm is good enough :)
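
To make the idea concrete, here is a minimal sketch of the normalize-then-wildcard approach in Python. The function names (normalize, build_prefix_query) and the replacement rules are my own assumptions, picked only so that the example address above comes out the same way; they are not a tested normalization algorithm, so a partial input may normalize differently from the hand-written examples. The resulting string is meant to be passed to a Lucene query parser, and the index would be built from fields run through the same normalize function.

```python
import re
import unicodedata

# Hypothetical simplification rules (an assumption, not a vetted algorithm):
# chosen only so the example address from the post normalizes as shown there.
_REPLACEMENTS = (("au", "a"), ("y", ""), ("j", ""))


def normalize(text: str) -> str:
    """Reduce an address string to a crude, typo-tolerant form."""
    # Lowercase, then split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop the combining marks, e.g. "í" becomes "i".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Turn punctuation and underscores into spaces; letters and digits stay.
    cleaned = re.sub(r"[\W_]+", " ", stripped)
    # Apply the simplification rules in order.
    for old, new in _REPLACEMENTS:
        cleaned = cleaned.replace(old, new)
    # Collapse repeated whitespace and trim the ends.
    return " ".join(cleaned.split())


def build_prefix_query(user_input: str) -> str:
    """Build a Lucene-style query string: one prefix term per token, ANDed."""
    tokens = normalize(user_input).split()
    return " AND ".join(f"{token}*" for token in tokens)


if __name__ == "__main__":
    print(normalize("miklabraut 42, 101 reykjavík"))  # miklabrat 42 101 rekavik
    print(build_prefix_query("mikl reyk"))            # mikl* AND rek*
```

The important part is that the same normalize function is applied both when indexing and when querying; otherwise the prefix terms will not line up with the indexed terms.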
