Stronger boosting by date in Solr

萌比男神i 2020-12-13 07:44

Boosting by a date field in Solr is usually written as:

{!boost b=recip(ms(NOW,datefield),3.16e-11,1,1)}

I looked everywhere (examples: Solr Dismax C

3 Answers
  •  伪装坚强ぢ
    2020-12-13 07:48

    recip(x, m, a, b) implements f(x) = a/(xm+b), where:

    • x : the document age in ms, defined as ms(NOW,datefield).

    • m : a constant that defines the time scale over which the boost is applied. It should be chosen relative to what you consider an old document age (a reference_time), in milliseconds. For example, choosing a reference_time of 1 year (3.16e10 ms) means using its inverse: 3.16e-11 (1/3.16e10, rounded).

    • a and b are constants (defined arbitrarily).

    • xm = 1 when the document is 1 reference_time old (multiplier = a/(1+b)).
      xm ≈ 0 when the document is new, resulting in a value close to a/b.

    • Using the same value for a and b ensures the multiplier doesn't exceed 1 with recent documents.

    • With a = b = 1, a 1 reference_time old document has a multiplier of about 1/2, a 2 reference_time old document has a multiplier of about 1/3, and so on.
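
As a rough illustration of the bullets above, the formula can be evaluated outside Solr. This is a minimal Python sketch, not Solr code: the recip helper only mirrors the a/(xm+b) formula, and MS_PER_YEAR is the rounded 3.16e10 ms figure from the text.

```python
MS_PER_YEAR = 3.16e10  # rounded milliseconds per year, as used in the text

def recip(x, m, a, b):
    """Mirror of the formula a / (x*m + b); not Solr's own implementation."""
    return a / (x * m + b)

m = 3.16e-11  # inverse of a 1-year reference_time, rounded

# Print the multiplier for documents that are 0, 1, 2 and 5 years old.
for years in (0, 1, 2, 5):
    age_ms = years * MS_PER_YEAR
    print(years, "year(s):", round(recip(age_ms, m, 1, 1), 3))
```

With a = b = 1 the multiplier starts at 1 for a brand-new document and is close to 1/2 and 1/3 at one and two reference_times.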

    How to make date boosting stronger?

    • Increase m : choose a shorter reference_time, for example 6 months (about 1.58e10 ms), which gives m = 6.33e-11. Compared to a 1-year reference, the multiplier then decays twice as fast as the document age increases.

    • Decrease a and b : this stretches the response curve of the function, so the multiplier falls off faster with age (with a = b, dividing both by 10 has the same effect as multiplying m by 10). This can be very aggressive, see this example (page 8).

    • Apply a boost to the boost function itself with the bf (Boost Functions) parameter. This is a dismax parameter, so it requires the DisMax or eDisMax query parser, e.g.:

      bf=recip(ms(NOW,datefield),3.16e-11,1,1)^2.0
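
To get a feel for the first two options, the multipliers can be compared side by side. The following is only a sketch in Python: the recip helper mirrors the a/(xm+b) formula, and the three parameter sets (including a = b = 0.1) are hypothetical values picked for illustration.

```python
MS_PER_YEAR = 3.16e10  # rounded milliseconds per year, as used in the text

def recip(x, m, a, b):
    # Mirrors the formula a / (x*m + b); not Solr's own code.
    return a / (x * m + b)

# Hypothetical parameter sets (m, a, b) chosen for comparison only.
SETTINGS = {
    "baseline (1 year, a=b=1)": (3.16e-11, 1.0, 1.0),
    "6-month reference_time": (6.33e-11, 1.0, 1.0),
    "smaller a=b=0.1": (3.16e-11, 0.1, 0.1),
}

def multiplier(name, age_years):
    m, a, b = SETTINGS[name]
    return recip(age_years * MS_PER_YEAR, m, a, b)

# One row per setting: multipliers at 0, 0.5, 1 and 2 years of age.
for name in SETTINGS:
    row = [round(multiplier(name, years), 3) for years in (0, 0.5, 1, 2)]
    print(f"{name:26s} {row}")
```

All three settings give a multiplier of 1 for a brand-new document; they differ only in how quickly the multiplier drops as the document ages.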
      

    A few things are important to note:

    • bf is an additive boost and acts as a bonus added to the score of newer documents.

    • {!boost b} is a multiplicative boost and acts more as a penalty applied to the score of older documents.

    • A bf score (the "bonus" added to the global score) is calculated independently of the relevancy score (the global score). This means a result set with high scores may not be affected as much as a result set with low scores. In contrast, multiplicative boosts affect scores the same way regardless of the relevancy of the result set, which is why they are usually preferred.

    • Do not use recip() for dates more than one reference_time in the future (with b = 1; in general, once xm < -b) or it will yield negative values.
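
The difference between the two kinds of boost can be sketched with a toy model in Python. This is deliberately simplified and not how Solr computes scores internally: it treats bf as "score + weight * f" and {!boost} as "score * f", and the scores, ages, and weight are made-up values.

```python
MS_PER_YEAR = 3.16e10  # rounded milliseconds per year, as used in the text
M = 3.16e-11           # inverse of a 1-year reference_time

def recip(x, m, a, b):
    # Mirrors the formula a / (x*m + b); not Solr's own code.
    return a / (x * m + b)

def additive(score, age_years, weight=2.0):
    # Toy model of bf=recip(...)^2.0: a bonus added to the relevancy score.
    return score + weight * recip(age_years * MS_PER_YEAR, M, 1, 1)

def multiplicative(score, age_years):
    # Toy model of {!boost b=recip(...)}: the score is scaled by the multiplier.
    return score * recip(age_years * MS_PER_YEAR, M, 1, 1)

# Each pair: (score of a 3-year-old document, score of a brand-new document).
# Hypothetical result sets with the same score ratio but different magnitudes.
for label, (old_score, new_score) in {"high scores": (12.0, 8.0),
                                      "low scores": (1.2, 0.8)}.items():
    add_new_wins = additive(new_score, 0) > additive(old_score, 3)
    mul_new_wins = multiplicative(new_score, 0) > multiplicative(old_score, 3)
    print(label, "| additive: new doc first =", add_new_wins,
          "| multiplicative: new doc first =", mul_new_wins)

# Future-date caveat: two reference_times ahead gives a negative multiplier.
print("2 years in the future:", recip(-2 * MS_PER_YEAR, M, 1, 1))
```

In this toy model the additive bonus reorders the low-scoring result set but not the high-scoring one, while the multiplicative boost reorders both in the same way.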

    See also this very insightful post by Nolan Lawson on Comparing boost methods in Solr.
