Lucene query: bla~* (match words that start with something fuzzy), how?

旧时难觅i 2020-12-05 15:59

In the Lucene query syntax, I'd like to combine * and ~ in a valid query, something like: bla~* //invalid query

Meaning: match words that begin with "bla" or something close to it.

4 answers
  •  渐次进展
    2020-12-05 16:45

    In the development trunk of Lucene (not yet released), there is code to support use cases like this via AutomatonQuery. Warning: the APIs may well change before it is released, but this gives you the idea.

    Here is an example for your case:

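    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.AutomatonQuery;
    import org.apache.lucene.util.automaton.Automaton;
    import org.apache.lucene.util.automaton.BasicAutomata;
    import org.apache.lucene.util.automaton.BasicOperations;
    import org.apache.lucene.util.automaton.LevenshteinAutomata;

    // Written against the trunk APIs; later releases changed some class names and signatures.

    // A term representative of the query, containing the field.
    // The term text is not so important; it is only used for toString() and such.
    Term term = new Term("yourfield", "bla~*");

    // Build a DFA that accepts all strings within an edit distance of 2 from "bla".
    Automaton fuzzy = new LevenshteinAutomata("bla").toAutomaton(2);

    // Concatenate this DFA with another DFA equivalent to the "*" operator.
    Automaton fuzzyPrefix = BasicOperations.concatenate(fuzzy, BasicAutomata.makeAnyString());

    // Build a query; search with it to get results.
    AutomatonQuery query = new AutomatonQuery(term, fuzzyPrefix);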
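
To make the matching rule concrete: concatenating the Levenshtein DFA with "any string" accepts exactly those terms that have some prefix within the given edit distance of "bla". The following is a minimal plain-Java sketch of that rule, with no Lucene dependency. The class name `FuzzyPrefixSketch` and the method `matchesFuzzyPrefix` are hypothetical names made up for illustration, and the per-term dynamic-programming check only illustrates the semantics; it is not how AutomatonQuery evaluates the automaton against the index.

```java
final class FuzzyPrefixSketch {

    /**
     * Returns true if some prefix of term is within maxEdits Levenshtein
     * edits (insert, delete, substitute) of pattern.
     * Hypothetical helper for illustration; not part of Lucene.
     */
    static boolean matchesFuzzyPrefix(String pattern, String term, int maxEdits) {
        int m = pattern.length();

        // prev[i] = edit distance between pattern[0..i) and the term prefix read so far.
        int[] prev = new int[m + 1];
        for (int i = 0; i <= m; i++) {
            prev[i] = i; // empty term prefix: delete i pattern characters
        }
        if (prev[m] <= maxEdits) {
            return true; // the empty prefix is already close enough
        }

        for (int j = 1; j <= term.length(); j++) {
            int[] cur = new int[m + 1];
            cur[0] = j; // empty pattern: insert j term characters
            for (int i = 1; i <= m; i++) {
                int cost = pattern.charAt(i - 1) == term.charAt(j - 1) ? 0 : 1;
                cur[i] = Math.min(prev[i - 1] + cost,      // match or substitute
                         Math.min(prev[i] + 1,             // extra character in term
                                  cur[i - 1] + 1));        // missing character in term
            }
            if (cur[m] <= maxEdits) {
                return true; // this prefix of term is a fuzzy match; the rest is the "*"
            }
            prev = cur;
        }
        return false;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"blahblah", "black", "bra", "b", "xyz"}) {
            System.out.println(t + " -> " + matchesFuzzyPrefix("bla", t, 2));
        }
    }
}
```

Note that with a three-letter pattern an edit distance of 2 is very permissive: even the single-character term "b" qualifies, since it is two deletions away from "bla". A distance of 1 may be closer to what "begins with bla or so" intends.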
    
