Assume that I have a field called price for the documents in Solr and I have that field faceted. I want to get the facets as ranges of values (eg: 0-100, 100-500, 5
I have worked out how to calculate sensible dynamic facets for product price ranges. The solution involves some pre-processing of documents and some post-processing of the query results, but it requires only one query to Solr, and should even work on old versions of Solr like 1.4.
First, before submitting the document, round the price up to the nearest "nice round facet boundary" and store it in a "rounded_price" field. Users like their facets to look like "250-500", not "247-483", and rounding also means you get back hundreds of price facets rather than millions of them. With some effort the following code can be generalised to round nicely at any price scale:
    public static decimal RoundPrice(decimal price)
    {
        if (price < 25)
            return Math.Ceiling(price);
        else if (price < 100)
            return Math.Ceiling(price / 5) * 5;
        else if (price < 250)
            return Math.Ceiling(price / 10) * 10;
        else if (price < 1000)
            return Math.Ceiling(price / 25) * 25;
        else if (price < 2500)
            return Math.Ceiling(price / 100) * 100;
        else if (price < 10000)
            return Math.Ceiling(price / 250) * 250;
        else if (price < 25000)
            return Math.Ceiling(price / 1000) * 1000;
        else if (price < 100000)
            return Math.Ceiling(price / 2500) * 2500;
        else
            return Math.Ceiling(price / 5000) * 5000;
    }
Permissible prices go 1,2,3,...,24,25,30,35,...,95,100,110,...,240,250,275,300,325,...,975,1000 and so forth.
Second, when submitting the query, request all facets on rounded prices sorted by price: facet.field=rounded_price. Thanks to the rounding, you'll get at most a few hundred facets back.
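For reference, the faceting part of the request might look something like the lines below. Only facet.field=rounded_price comes from the step above; facet.sort, facet.limit and facet.mincount are standard Solr faceting parameters that I am assuming you would want here, so adjust them to your setup:

```
facet=true
facet.field=rounded_price
facet.sort=index
facet.limit=-1
facet.mincount=1
```

facet.limit=-1 lifts the default cap of 100 facet values, and facet.mincount=1 drops boundaries that have no products. facet.sort=index returns values in index order, which matches price order only if rounded_price uses a field type whose terms sort numerically; with a plain string field, sort the values numerically on the client instead.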
Third, after you have the results, the user wants to see only 3 to 7 facets, not hundreds. So, combine adjacent facets into a few large facets (called "segments"), trying to get a roughly equal number of documents in each segment. The following rather more complicated code does this, returning tuples of (start, end, count) suitable for performing range queries. The counts returned will be correct provided prices were rounded up to the nearest boundary:
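    // Requires: using System; using System.Collections.Generic; using System.Linq;
    // "prices" holds (rounded price, document count) pairs in ascending price order.
    public static List<Tuple<string, string, int>> CombinePriceFacets(
        int nSegments, ICollection<KeyValuePair<string, int>> prices)
    {
        var ranges = new List<Tuple<string, string, int>>();
        int productCount = prices.Sum(p => p.Value);
        int productsRemaining = productCount;
        if (nSegments < 2)
            return ranges;
        // Target number of documents per segment.
        int segmentSize = productCount / nSegments;
        string start = "*";
        string end = "0";
        int count = 0;
        int totalCount = 0;
        int segmentIdx = 1;
        foreach (KeyValuePair<string, int> price in prices)
        {
            end = price.Key;
            count += price.Value;
            totalCount += price.Value;
            productsRemaining -= price.Value;
            // Close the current segment once the running total reaches its share.
            if (totalCount >= segmentSize * segmentIdx)
            {
                ranges.Add(new Tuple<string, string, int>(start, end, count));
                start = end;
                count = 0;
                segmentIdx += 1;
            }
            // The last segment is open-ended and takes everything that is left.
            if (segmentIdx == nSegments)
            {
                // Skip it when nothing is left, to avoid an empty trailing segment.
                if (count + productsRemaining > 0)
                    ranges.Add(new Tuple<string, string, int>(start, "*", count + productsRemaining));
                break;
            }
        }
        return ranges;
    }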
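CombinePriceFacets needs the facets as (price, count) pairs in ascending price order. Here is a minimal sketch of getting them out of the response, assuming you asked for wt=json and that Solr returns the facet as its default flat list of alternating values and counts. SolrFacetParser and ParseFacetCounts are names I made up for illustration, and System.Text.Json is just one possible parser:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.Json;

public static class SolrFacetParser
{
    // Hypothetical helper: reads facet_counts.facet_fields.<field> from a
    // Solr JSON response and returns (value, count) pairs sorted by price.
    public static List<KeyValuePair<string, int>> ParseFacetCounts(string json, string field)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement[] items = doc.RootElement
                .GetProperty("facet_counts")
                .GetProperty("facet_fields")
                .GetProperty(field)
                .EnumerateArray()
                .ToArray();

            var facets = new List<KeyValuePair<string, int>>();
            // Assumed layout: value, count, value, count, ...
            for (int i = 0; i + 1 < items.Length; i += 2)
            {
                string value = items[i].ToString();
                int count = items[i + 1].GetInt32();
                if (count > 0)
                    facets.Add(new KeyValuePair<string, int>(value, count));
            }

            // Sort numerically on the client so the result does not depend
            // on how the field type orders its indexed terms.
            return facets
                .OrderBy(f => decimal.Parse(f.Key, CultureInfo.InvariantCulture))
                .ToList();
        }
    }
}
```

As a worked example, the facets ("10",4), ("20",3), ("50",5), ("100",2), ("250",6) passed to CombinePriceFacets with nSegments = 3 give a segment size of 20 / 3 = 6 and the segments ("*","20",7), ("20","50",5) and ("50","*",8).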
Fourth, suppose ("250","500",38) was one of the resulting segments. If the user selects "$250 to $500" as a filter, simply do a filter query on the unrounded price field: fq=price:[250 TO 500].
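The open-ended first and last segments map to range queries the same way, since "*" is also Solr's wildcard for an unbounded end. For illustrative segments ("*","250",41), ("250","500",38) and ("500","*",12) (the first and last are made-up numbers), the filters would be:

```
fq=price:[* TO 250]
fq=price:[250 TO 500]
fq=price:[500 TO *]
```

One caveat, as far as I can tell: because prices were rounded up, the count for ("250","500",38) covers prices above 250 and up to 500, while [250 TO 500] also matches a product priced at exactly 250. If that off-by-a-few difference matters, Solr 4.0 and later accept mixed bounds such as price:{250 TO 500], which excludes the lower boundary.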