I'm trying to understand how fieldNorm is calculated (at index time) and then used (and apparently re-calculated) at query time.
In all the examples I
The documentation of encodeNormValue describes the encoding step (which is where the precision is lost) and, in particular, the final representation of the value:
The encoding uses a three-bit mantissa, a five-bit exponent, and the zero-exponent point at 15, thus representing values from around 7x10^9 to 2x10^-9 with about one significant decimal digit of accuracy. Zero is also represented. Negative numbers are rounded up to zero. Values too large to represent are rounded down to the largest representable value. Positive values too small to represent are rounded up to the smallest positive representable value.
The most relevant piece to understand is that the mantissa is only 3 bits, which means the precision is about one significant decimal digit.
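To make the precision loss concrete, here is a minimal, self-contained sketch (not the actual Lucene source) that mirrors the 3-bit-mantissa / 5-bit-exponent scheme used by org.apache.lucene.util.SmallFloat (floatToByte315 / byte315ToFloat in classic, pre-7 Lucene versions), applied to length norms computed the way DefaultSimilarity does (1 / sqrt(numTerms), with any boost assumed to be 1). The class name NormEncodingDemo is mine, purely for illustration.

```java
// Sketch: re-implements the single-byte norm encoding (3 mantissa bits,
// 5 exponent bits, zero-exponent point at 15) to show where precision is lost.
public class NormEncodingDemo {

  // Encode a float into one byte. Mirrors SmallFloat.floatToByte315.
  static byte floatToByte315(float f) {
    int bits = Float.floatToRawIntBits(f);
    int smallfloat = bits >> (24 - 3);            // keep exponent + top mantissa bits
    if (smallfloat <= ((63 - 15) << 3)) {
      return (bits <= 0) ? (byte) 0               // zero and negatives map to 0
                         : (byte) 1;              // positive underflow -> smallest nonzero value
    }
    if (smallfloat >= ((63 - 15) << 3) + 0x100) {
      return -1;                                  // overflow -> largest representable value
    }
    return (byte) (smallfloat - ((63 - 15) << 3));
  }

  // Decode the byte back to a float. Mirrors SmallFloat.byte315ToFloat.
  static float byte315ToFloat(byte b) {
    if (b == 0) return 0.0f;
    int bits = (b & 0xff) << (24 - 3);
    bits += (63 - 15) << 24;
    return Float.intBitsToFloat(bits);
  }

  public static void main(String[] args) {
    // Length norms as DefaultSimilarity computes them: 1 / sqrt(numTerms).
    for (int numTerms : new int[] {1, 2, 3, 4, 10, 100}) {
      float norm = (float) (1.0 / Math.sqrt(numTerms));
      byte encoded = floatToByte315(norm);
      float decoded = byte315ToFloat(encoded);
      System.out.printf("terms=%3d  norm=%.5f  byte=%4d  decoded fieldNorm=%.5f%n",
                        numTerms, norm, encoded, decoded);
    }
  }
}
```

Running this shows, for example, that 1/sqrt(3) ≈ 0.57735 and 1/sqrt(4) = 0.5 both encode to the same byte and both come back as a fieldNorm of 0.5 at query time, while 1/sqrt(2) ≈ 0.70711 comes back as 0.625. Small differences in field length are simply collapsed by the encoding.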
An important note on the rationale comes a few sentences after the end of your quote, where the Lucene docs say:
The rationale supporting such lossy compression of norm values is that given the difficulty (and inaccuracy) of users to express their true information need by a query, only big differences matter.