Lucene RangeQuery doesn't filter appropriately

二次信任 提交于 2019-11-30 20:28:15

UPDATE: Like @basZero said in his comment, starting with Lucene 2.9, you can add numeric fields to your documents. Just remember to use NumericRangeQuery instead of RangeQuery when you search.

Original answer

Lucene treats numbers as words, so their order is alphabetic:

0
1
12
123
2
22

That means that for Lucene, 12 is between 0 and 2. If you want to do a proper numerical range, you need to index the numbers zero-padded, then do a range search of [0000 TO 0002]. (The amount of padding you need depends on the expected range of values).

If you have negative numbers, just add another zero for non-negative numbers. (EDIT: WRONG WRONG WRONG. See update)

If your numbers include a fraction part, leave it as is, and zero-pad the integer part only.

Example:

-00002.12
-00001

000000
000001
000003.1415
000022

UPDATE: Negative numbers are a bit tricky, since -1 comes before -2 alphabetically. This article gives a complete explanation about dealing with negative numbers and numbers in general in Lucene. Basically, you have to "encode" numbers into something that makes the order of the items make sense.

I created a PHP function that convert numerics to lucene/solr range searchables.

0.5 is converted to 10000000000.5
-0.5 is converted to 09999999999.5

function luceneNumeric($numeric)
{
    $negative = $numeric < 0;
    $numeric = $negative ? 10000000000 + $numeric : $numeric;

    $parts = explode('.', str_replace(',', '.', $numeric));

    $lucene = $negative ? 0 : 1;
    $lucene .= str_pad($parts[0], 10, '0', STR_PAD_LEFT);
    $lucene .= isset($parts[1]) ? '.' . $parts[1] : '';

    return $lucene;
}

It seems to work, hope this helps someone!

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!