How do I make the QueryParser in Lucene handle numeric ranges?

Submitted by 吃可爱长大的小学妹 on 2019-12-18 03:42:19

Question


new QueryParser(....).parse(somequery);

This works only for fields indexed as strings. Say I have a field called count, which is an integer field (I indexed it with the numeric data type).

new QueryParser(....).parse("count:[1 TO 10]");

The query above does not work. If I use NumericRangeQuery.newIntRange instead, it works, but I need the range to be handled by the QueryParser.


Answer 1:


I had the same issue and solved it, so here is my solution.

To create a custom query parser that parses queries such as "INTFIELD_NAME:1203" or "INTFIELD_NAME:[1 TO 10]" and treats the field INTFIELD_NAME as an IntField, I overrode newRangeQuery and newTermQuery as follows:

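// Imports assume the Lucene 5.x package layout.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.BytesRefBuilder;
import org.apache.lucene.util.NumericUtils;

public class CustomQueryParser extends QueryParser {

    // Placeholder: replace with the name of the field you indexed as an IntField.
    private static final String INTFIELD_NAME = "INTFIELD_NAME";

    public CustomQueryParser(String f, Analyzer a) {
        super(f, a);
    }

    @Override
    protected Query newRangeQuery(String field, String part1, String part2, boolean startInclusive,
            boolean endInclusive) {
        if (INTFIELD_NAME.equals(field)) {
            // A null bound stands for an open end, e.g. [* TO 10].
            Integer min = part1 == null ? null : Integer.valueOf(part1);
            Integer max = part2 == null ? null : Integer.valueOf(part2);
            return NumericRangeQuery.newIntRange(field, min, max, startInclusive, endInclusive);
        }
        return super.newRangeQuery(field, part1, part2, startInclusive, endInclusive);
    }

    @Override
    protected Query newTermQuery(Term term) {
        if (INTFIELD_NAME.equals(term.field())) {
            // Encode the int with shift 0 so it matches the indexed numeric term.
            BytesRefBuilder byteRefBuilder = new BytesRefBuilder();
            NumericUtils.intToPrefixCoded(Integer.parseInt(term.text()), 0, byteRefBuilder);
            return new TermQuery(new Term(term.field(), byteRefBuilder.get()));
        }
        return super.newTermQuery(term);
    }
}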

I took the code quoted in this thread: http://www.mail-archive.com/search?l=java-user@lucene.apache.org&q=subject:%22Re%3A+How+do+you+properly+use+NumericField%22&o=newest&f=1 and made 3 modifications:

  • rewrote newRangeQuery a little more cleanly

  • in newTermQuery, replaced

    NumericUtils.intToPrefixCoded(Integer.parseInt(term.text()),NumericUtils.PRECISION_STEP_DEFAULT)));

    by

    NumericUtils.intToPrefixCoded(Integer.parseInt(term.text()), 0, byteRefBuilder);

    When I first used this method in a filter on the same numeric field, I passed 0 because I had seen it used as a default value in a Lucene class, and it just worked.

  • in newTermQuery, replaced

    TermQuery tq = new TermQuery(new Term(field,

    by

    TermQuery tq = new TermQuery(new Term(term.field(),

    Using "field" is wrong: if your query has several clauses (FIELD:text OR INTFIELD:100), it picks up the field of the first or previous clause.




Answer 2:


You need to inherit from QueryParser and override GetRangeQuery(string field, ...). If field is one of your numeric field names, return an instance of NumericRangeQuery, otherwise return base.GetRangeQuery(...).

There is an example of such an implementation in this thread: http://www.mail-archive.com/java-user@lucene.apache.org/msg29062.html




Answer 3:


QueryParser won't create a NumericRangeQuery as it has no way to know whether a field was indexed with NumericField. Just extend the QueryParser to handle this case.
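
To see why a plain text range cannot stand in for a numeric one, here is a minimal JDK-only sketch (no Lucene involved; the class and method names are made up for illustration). It checks membership in [1 TO 10] once with string ordering, which is roughly how a text-based range orders its terms, and once with int ordering.

```java
final class RangeOrderDemo {
    // Inclusive range check using string (lexicographic) ordering.
    static boolean inTextRange(String value, String low, String high) {
        return value.compareTo(low) >= 0 && value.compareTo(high) <= 0;
    }

    // Inclusive range check using numeric ordering.
    static boolean inIntRange(String value, String low, String high) {
        int v = Integer.parseInt(value);
        return v >= Integer.parseInt(low) && v <= Integer.parseInt(high);
    }
}
```

With these definitions, inTextRange("5", "1", "10") is false while inIntRange("5", "1", "10") is true, because "5" sorts after "10" as text. The parser cannot pick the numeric interpretation on its own, so the subclass has to do it per field.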




Answer 4:


In Lucene 6, the protected method QueryParser#getRangeQuery still exists with the argument list (String fieldName, String low, String high, boolean startInclusive, boolean endInclusive), and overriding it to interpret the range as a numeric range is indeed possible, as long as that information is indexed using one of the new Point fields.

When indexing your field:

document.add(new FloatPoint("_point_count", value)); // index for efficient range based retrieval
document.add(new StoredField("count", value)); // if you need to store the value itself

In your custom query parser (extending queryparser.classic.QueryParser), override the method with something like this:

@Override
protected Query getRangeQuery(String field, String low, String high, boolean startInclusive, boolean endInclusive) throws ParseException
{
    if (isNumericField(field)) // hypothetical helper; how you recognize numeric fields is context dependent
    {
        final String pointField = "_point_" + field;
        // FloatPoint ranges are always inclusive, so startInclusive/endInclusive are ignored here.
        return FloatPoint.newRangeQuery(pointField,
                Float.parseFloat(low),
                Float.parseFloat(high));
    }

    return super.getRangeQuery(field, low, high, startInclusive, endInclusive);
}
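
The override above does not use startInclusive and endInclusive. Since FloatPoint.newRangeQuery treats both bounds as inclusive, one way to honor an exclusive bound is to move it to the adjacent float before building the query. The sketch below shows only that adjustment, using the JDK's Math.nextUp and Math.nextDown. The class and method names are hypothetical, and Lucene's own FloatPoint.nextUp/nextDown helpers (where your version provides them) may be preferable because they follow Lucene's ordering of -0.0 and +0.0.

```java
final class FloatBounds {
    // Lower bound to pass to an inclusive range so that an exclusive
    // start still excludes `low` itself.
    static float inclusiveLower(float low, boolean startInclusive) {
        return startInclusive ? low : Math.nextUp(low);
    }

    // Upper bound to pass to an inclusive range so that an exclusive
    // end still excludes `high` itself.
    static float inclusiveUpper(float high, boolean endInclusive) {
        return endInclusive ? high : Math.nextDown(high);
    }
}
```

The adjusted values would then be passed as the lower and upper arguments of FloatPoint.newRangeQuery.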



Answer 5:


I adapted Jeremie's answer for C# and Lucene.Net 3.0.3. I also needed the type double instead of int. This is my code:

using System.Globalization;
using Lucene.Net.Analysis;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Util;
using Version = Lucene.Net.Util.Version;

namespace SearchServer.SearchEngine
{
    internal class SearchQueryParser : QueryParser
    {
        public SearchQueryParser(Analyzer analyzer)
            : base(Version.LUCENE_30, null, analyzer)
        {
        }

        private const NumberStyles DblNumberStyles = NumberStyles.AllowLeadingWhite | NumberStyles.AllowTrailingWhite | NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint;

        protected override Query NewRangeQuery(string field, string part1, string part2, bool inclusive)
        {
            if (field == "p")
            {
                double part1Dbl;
                if (!double.TryParse(part1, DblNumberStyles, CultureInfo.InvariantCulture, out part1Dbl))
                    throw new ParseException($"Error parsing value {part1} for field {field} as double.");
                double part2Dbl;
                if (!double.TryParse(part2, DblNumberStyles, CultureInfo.InvariantCulture, out part2Dbl))
                    throw new ParseException($"Error parsing value {part2} for field {field} as double.");
                return NumericRangeQuery.NewDoubleRange(field, part1Dbl, part2Dbl, inclusive, inclusive);
            }
            return base.NewRangeQuery(field, part1, part2, inclusive);
        }

        protected override Query NewTermQuery(Term term)
        {
            if (term.Field == "p")
            {
                double dblParsed;
                if (!double.TryParse(term.Text, DblNumberStyles, CultureInfo.InvariantCulture, out dblParsed))
                    throw new ParseException($"Error parsing value {term.Text} for field {term.Field} as double.");
                return new TermQuery(new Term(term.Field, NumericUtils.DoubleToPrefixCoded(dblParsed)));
            }
            return base.NewTermQuery(term);
        }
    }
}

I later improved the code to support open-ended (greater-than or less-than) queries when an asterisk is passed as a bound, e.g. p:[* TO 5]:

...
    double? part1Dbl = null;
    double tmpDbl;
    if (part1 != "*")
    {
        if (!double.TryParse(part1, DblNumberStyles, CultureInfo.InvariantCulture, out tmpDbl))
            throw new ParseException($"Error parsing value {part1} for field {field} as double.");
        part1Dbl = tmpDbl;
    }
    double? part2Dbl = null;
    if (part2 != "*")
    {
        if (!double.TryParse(part2, DblNumberStyles, CultureInfo.InvariantCulture, out tmpDbl))
            throw new ParseException($"Error parsing value {part2} for field {field} as double.");
        part2Dbl = tmpDbl;
    }
    return NumericRangeQuery.NewDoubleRange(field, part1Dbl, part2Dbl, inclusive, inclusive);
...
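
The same open-bound handling can be sketched in Java for readers using the Java parsers from the earlier answers. This JDK-only helper (a hypothetical name, not part of Lucene) maps "*" or null to null and anything else to a parsed Double. A null bound is what the NumericRangeQuery factory methods take to mean "unbounded on this side".

```java
final class OpenBounds {
    // Returns null for an open bound ("*" or null), otherwise the parsed value.
    // Throws NumberFormatException if the text is not a valid double.
    static Double parseBound(String text) {
        if (text == null || "*".equals(text.trim())) {
            return null;
        }
        return Double.valueOf(text.trim());
    }
}
```

The two results of parseBound(part1) and parseBound(part2) could then be handed to NumericRangeQuery.newDoubleRange in place of the raw strings.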


Source: https://stackoverflow.com/questions/5026185/how-do-i-make-the-queryparser-in-lucene-handle-numeric-ranges
