Lucene RangeQuery doesn't filter appropriately

Posted by 南笙酒味 on 2019-11-30 05:39:13

Question


I'm using RangeQuery to get all documents whose amount is between, say, 0 and 2. When I execute the query, Lucene also returns documents with an amount greater than 2. What am I missing here?

Here is my code:

Term lowerTerm = new Term("amount", minAmount);
Term upperTerm = new Term("amount", maxAmount);

RangeQuery amountQuery = new RangeQuery(lowerTerm, upperTerm, true);

finalQuery.Add(amountQuery, BooleanClause.Occur.MUST);

and here is what goes into my index:

doc.Add(new Field("amount", amount.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES));

Answer 1:


UPDATE: As @basZero said in his comment, starting with Lucene 2.9 you can add numeric fields to your documents. Just remember to use NumericRangeQuery instead of RangeQuery when you search.

Original answer

Lucene treats numbers as words, so their order is alphabetic:

0
1
12
123
2
22

That means that, for Lucene, 12 is between 0 and 2. If you want a proper numerical range, you need to index the numbers zero-padded and then do a range search of [0000 TO 0002]. (The amount of padding you need depends on the expected range of values.)
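The effect described above can be reproduced without Lucene at all. The following is a minimal sketch in plain Python that compares strings the way a term range compares terms; `pad` and `in_range` are hypothetical helpers introduced for illustration, and the width of 4 is an assumption matching the [0000 TO 0002] example.

```python
values = [0, 1, 12, 123, 2, 22]

# Sorting the raw strings reproduces the alphabetic term order listed above.
as_text = sorted(str(v) for v in values)


def pad(value, width=4):
    """Left-pad a non-negative integer with zeros to a fixed width."""
    return str(value).zfill(width)


def in_range(term, lower, upper):
    """Inclusive string comparison, as a term-based range check would do."""
    return lower <= term <= upper


# Unpadded terms: "12" and "123" fall between "0" and "2".
raw_hits = [t for t in as_text if in_range(t, "0", "2")]

# Zero-padded terms: only 0, 1 and 2 fall between "0000" and "0002".
padded = sorted(pad(v) for v in values)
padded_hits = [t for t in padded if in_range(t, pad(0), pad(2))]

print(raw_hits)
print(padded_hits)
```

Both the indexed values and the query bounds have to go through the same padding, otherwise the comparison mixes two different string formats.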

If you have negative numbers, just add another zero for non-negative numbers. (EDIT: WRONG WRONG WRONG. See update)

If your numbers include a fraction part, leave it as is, and zero-pad the integer part only.

Example:

-00002.12
-00001

000000
000001
000003.1415
000022

UPDATE: Negative numbers are a bit tricky, since -1 comes before -2 alphabetically. This article gives a complete explanation about dealing with negative numbers and numbers in general in Lucene. Basically, you have to "encode" numbers into something that makes the order of the items make sense.
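As an illustration of what such an encoding can look like, here is a minimal sketch of one simple approach for integers: shift every value by a fixed offset so that nothing is negative, then zero-pad. This is not necessarily the method from the article mentioned above; `OFFSET`, `WIDTH` and `encode` are hypothetical names, and the bound of one million is an assumption about the data.

```python
OFFSET = 10**6                     # assumption: all values lie in [-OFFSET, OFFSET)
WIDTH = len(str(2 * OFFSET - 1))   # digits needed for the largest shifted value


def encode(n):
    """Map an integer to a fixed-width string whose text order matches numeric order."""
    if not -OFFSET <= n < OFFSET:
        raise ValueError("value outside the assumed range")
    return str(n + OFFSET).zfill(WIDTH)


values = [-2, -1, 0, 1, 3, 22]

# Plain strings put -1 before -2 and 22 before 3.
naive = sorted(str(v) for v in values)

# Encoded strings sort in the same order as the numbers themselves.
by_encoding = sorted(values, key=encode)

print(naive)
print(by_encoding)
```

The offset has to be chosen once and used for both indexing and querying; changing it later means reindexing.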




Answer 2:


I created a PHP function that converts numeric values into strings that Lucene/Solr range searches can handle.

0.5 is converted to 10000000000.5
-0.5 is converted to 09999999999.5

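function luceneNumeric($numeric)
{
    // Negative values are shifted up by 10^10 so that less negative numbers sort later.
    $negative = $numeric < 0;
    $numeric = $negative ? 10000000000 + $numeric : $numeric;

    // Replace a decimal comma with a point, then split integer and fraction parts.
    $parts = explode('.', str_replace(',', '.', $numeric));

    // Sign prefix: 0 for negative, 1 for non-negative, so negatives sort first.
    $lucene = $negative ? 0 : 1;
    // Zero-pad the integer part to 10 digits; the fraction part is appended unchanged.
    $lucene .= str_pad($parts[0], 10, '0', STR_PAD_LEFT);
    $lucene .= isset($parts[1]) ? '.' . $parts[1] : '';

    return $lucene;
}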

It seems to work, hope this helps someone!
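To check that this scheme keeps numeric order, the function can be ported and run over a few sample values. Below is a rough Python port, offered as a sketch only: `lucene_numeric` is a hypothetical name, `Decimal` is used instead of floats to avoid rounding surprises, and values are assumed to be smaller than 10^10 in magnitude.

```python
from decimal import Decimal

SHIFT = Decimal(10**10)  # same shift the PHP function applies to negative values


def lucene_numeric(value):
    """Encode a number as sign prefix + 10-digit integer part + optional fraction."""
    d = Decimal(str(value))
    negative = d < 0
    if negative:
        d = SHIFT + d
    # Fixed-point formatting avoids exponent notation before splitting.
    int_part, _, frac = format(d, "f").partition(".")
    prefix = "0" if negative else "1"
    return prefix + int_part.zfill(10) + ("." + frac if frac else "")


samples = [-2.12, -1, -0.5, 0, 0.5, 1, 3.1415, 22]
encoded = [lucene_numeric(v) for v in samples]

for value, term in zip(samples, encoded):
    print(value, term)
```

For these samples the encoded strings come out in the same order as the numbers, including the two conversions quoted above.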



Source: https://stackoverflow.com/questions/708075/lucene-rangequery-doesnt-filter-appropriately
