I have a Lucene index where every document has several fields that contain numeric values. Now I would like to sort the search results by a weighted sum of these fields. For e
You could try implementing a custom ScoreDocComparator. For example:
public class ScaledScoreDocComparator implements ScoreDocComparator {

    private int[][] values;
    private float[] scalars;

    public ScaledScoreDocComparator(IndexReader reader, String[] fields, float[] scalars) throws IOException {
        this.scalars = scalars;
        this.values = new int[fields.length][];
        for (int i = 0; i < values.length; i++) {
            this.values[i] = FieldCache.DEFAULT.getInts(reader, fields[i]);
        }
    }

    protected float score(ScoreDoc scoreDoc) {
        int doc = scoreDoc.doc;
        float score = 0;
        for (int i = 0; i < values.length; i++) {
            int value = values[i][doc];
            float scalar = scalars[i];
            score += (value * scalar);
        }
        return score;
    }

    @Override
    public int compare(ScoreDoc i, ScoreDoc j) {
        float iScore = score(i);
        float jScore = score(j);
        return Float.compare(iScore, jScore);
    }

    @Override
    public int sortType() {
        return SortField.CUSTOM;
    }

    @Override
    public Comparable<?> sortValue(ScoreDoc i) {
        float score = score(i);
        return Float.valueOf(score);
    }
}
Here is an example of ScaledScoreDocComparator in action. I believe it works in my test, but I encourage you to verify it against your own data.
final String[] fields = new String[]{ "field1", "field2", "field3" };
final float[] scalars = new float[]{ 0.5f, 1.4f, 1.8f };

Sort sort = new Sort(
    new SortField(
        "",
        new SortComparatorSource() {
            public ScoreDocComparator newComparator(IndexReader reader, String fieldName) throws IOException {
                return new ScaledScoreDocComparator(reader, fields, scalars);
            }
        }
    )
);

IndexSearcher indexSearcher = ...;
Query query = ...;
Filter filter = ...; // can be null
int nDocs = 100;

TopFieldDocs topFieldDocs = indexSearcher.search(query, filter, nDocs, sort);
ScoreDoc[] scoreDocs = topFieldDocs.scoreDocs;
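To check the arithmetic without a Lucene index, here is a minimal plain-Java sketch of the weighted sum that ScaledScoreDocComparator computes. It assumes the per-field arrays are indexed by document id, as in the comparator above; the class name WeightedSumSketch and the sample values are made up for illustration.

```java
// Hypothetical stand-alone sketch; no Lucene classes are used here.
class WeightedSumSketch {

    // values[f][doc] is the value of field f for document doc,
    // mirroring the array layout used by the comparator above.
    static float score(int[][] values, float[] scalars, int doc) {
        float score = 0;
        for (int f = 0; f < values.length; f++) {
            score += values[f][doc] * scalars[f];
        }
        return score;
    }

    public static void main(String[] args) {
        // Made-up sample data: three fields, three documents.
        int[][] values = {
            { 10, 2, 4 },  // field1 for docs 0, 1, 2
            { 0, 5, 5 },   // field2
            { 1, 0, 5 }    // field3
        };
        float[] scalars = { 0.5f, 1.4f, 1.8f };
        for (int doc = 0; doc < 3; doc++) {
            System.out.println("doc " + doc + " -> " + score(values, scalars, doc));
        }
    }
}
```

Because compare uses Float.compare on these sums, documents with the smallest weighted sum sort first. If you want the largest sum first, you would need to reverse the sort.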
It appears that the Lucene developers are deprecating the ScoreDocComparator interface (it's currently deprecated in the Subversion repository). Here is an example of the ScaledScoreDocComparator modified to adhere to ScoreDocComparator's successor, FieldComparator:
public class ScaledComparator extends FieldComparator {

    private String[] fields;
    private float[] scalars;
    private int[][] slotValues;
    private int[][] currentReaderValues;
    private int bottomSlot;

    public ScaledComparator(int numHits, String[] fields, float[] scalars) {
        this.fields = fields;
        this.scalars = scalars;
        this.slotValues = new int[this.fields.length][];
        for (int fieldIndex = 0; fieldIndex < this.fields.length; fieldIndex++) {
            this.slotValues[fieldIndex] = new int[numHits];
        }
        this.currentReaderValues = new int[this.fields.length][];
    }

    protected float score(int[][] values, int secondaryIndex) {
        float score = 0;
        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            int value = values[fieldIndex][secondaryIndex];
            float scalar = scalars[fieldIndex];
            score += (value * scalar);
        }
        return score;
    }

    protected float scoreSlot(int slot) {
        return score(slotValues, slot);
    }

    protected float scoreDoc(int doc) {
        return score(currentReaderValues, doc);
    }

    @Override
    public int compare(int slot1, int slot2) {
        float score1 = scoreSlot(slot1);
        float score2 = scoreSlot(slot2);
        return Float.compare(score1, score2);
    }

    @Override
    public int compareBottom(int doc) throws IOException {
        float bottomScore = scoreSlot(bottomSlot);
        float docScore = scoreDoc(doc);
        return Float.compare(bottomScore, docScore);
    }

    @Override
    public void copy(int slot, int doc) throws IOException {
        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            slotValues[fieldIndex][slot] = currentReaderValues[fieldIndex][doc];
        }
    }

    @Override
    public void setBottom(int slot) {
        bottomSlot = slot;
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase, int numSlotsFull) throws IOException {
        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            String field = fields[fieldIndex];
            currentReaderValues[fieldIndex] = FieldCache.DEFAULT.getInts(reader, field);
        }
    }

    @Override
    public int sortType() {
        return SortField.CUSTOM;
    }

    @Override
    public Comparable<?> value(int slot) {
        float score = scoreSlot(slot);
        return Float.valueOf(score);
    }
}
Using this new class is very similar to the original, except that the definition of the sort object is a bit different:
final String[] fields = new String[]{ "field1", "field2", "field3" };
final float[] scalars = new float[]{ 0.5f, 1.4f, 1.8f };

Sort sort = new Sort(
    new SortField(
        "",
        new FieldComparatorSource() {
            public FieldComparator newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
                return new ScaledComparator(numHits, fields, scalars);
            }
        }
    )
);
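The slot-based methods (copy, setBottom, compareBottom) can be hard to picture, so here is a rough plain-Java model of how a collector might drive them to keep only the best numHits documents. This is a simplified sketch and not Lucene's actual collector: the class SlotProtocolSketch, the updateBottom helper, and the linear scan for the bottom slot are all assumptions made for illustration, and it stores the weighted sum per slot rather than the raw field values. It also assumes numHits is at least 1 and that there is at least one field.

```java
import java.util.Arrays;

// Hypothetical, simplified model of the slot protocol; this is not Lucene code.
class SlotProtocolSketch {
    private final int[][] values;      // values[field][doc], indexed by doc id
    private final float[] scalars;     // one weight per field
    private final float[] slotScores;  // weighted sum stored for each slot
    private final int[] slotDocs;      // doc id currently held by each slot
    private int bottomSlot;            // slot holding the weakest kept hit

    SlotProtocolSketch(int numHits, int[][] values, float[] scalars) {
        this.values = values;
        this.scalars = scalars;
        this.slotScores = new float[numHits];
        this.slotDocs = new int[numHits];
    }

    float scoreDoc(int doc) {
        float score = 0;
        for (int f = 0; f < values.length; f++) {
            score += values[f][doc] * scalars[f];
        }
        return score;
    }

    // Same role as copy(slot, doc): remember a competitive hit in a slot.
    void copy(int slot, int doc) {
        slotScores[slot] = scoreDoc(doc);
        slotDocs[slot] = doc;
    }

    // Same role as compareBottom(doc): positive when doc sorts before the bottom.
    int compareBottom(int doc) {
        return Float.compare(slotScores[bottomSlot], scoreDoc(doc));
    }

    // Stand-in for the collector's queue: the bottom is the slot with the largest sum.
    void updateBottom(int filled) {
        bottomSlot = 0;
        for (int slot = 1; slot < filled; slot++) {
            if (slotScores[slot] > slotScores[bottomSlot]) {
                bottomSlot = slot;
            }
        }
    }

    // Returns the doc ids of the numHits smallest weighted sums, in ascending order.
    static int[] topDocs(int numHits, int[][] values, float[] scalars) {
        SlotProtocolSketch c = new SlotProtocolSketch(numHits, values, scalars);
        int numDocs = values[0].length;
        int filled = 0;
        for (int doc = 0; doc < numDocs; doc++) {
            if (filled < numHits) {
                // Slots are still free: every document is kept.
                c.copy(filled++, doc);
                if (filled == numHits) {
                    c.updateBottom(filled);
                }
            } else if (c.compareBottom(doc) > 0) {
                // The new document beats the weakest kept hit, so it replaces it.
                c.copy(c.bottomSlot, doc);
                c.updateBottom(filled);
            }
        }
        Integer[] slots = new Integer[filled];
        for (int i = 0; i < filled; i++) {
            slots[i] = i;
        }
        Arrays.sort(slots, (a, b) -> Float.compare(c.slotScores[a], c.slotScores[b]));
        int[] docs = new int[filled];
        for (int i = 0; i < filled; i++) {
            docs[i] = c.slotDocs[slots[i]];
        }
        return docs;
    }

    public static void main(String[] args) {
        // Made-up sample data: three fields, four documents.
        int[][] values = { { 10, 2, 4, 0 }, { 0, 5, 5, 1 }, { 1, 0, 5, 0 } };
        float[] scalars = { 0.5f, 1.4f, 1.8f };
        System.out.println(Arrays.toString(topDocs(2, values, scalars)));
    }
}
```

The point of the slots is that only numHits values are ever stored, no matter how many documents match the query.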
I'm thinking one way to do this would be to accept these as parameters to your sorting function: the number of fields, the array of documents, and a list of weight factors (one per field).
Calculate the weighting function f(d) for each document, storing the results in a separate array in the same order as the document array. Then perform any sort you wish (quick sort would probably be best), making sure you reorder not just the f(d) array but the document array along with it. Return the sorted document array and you're done.
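The steps above can be sketched roughly as follows. This is only an illustration under assumptions: documents are represented as plain strings, fieldValues[d][f] is taken to hold the value of field f for document d, and all names are hypothetical. Instead of a hand-written quick sort it sorts an array of indices with Arrays.sort, which keeps the score array and the document array in step.

```java
import java.util.Arrays;

// Hypothetical sketch of the approach described above; all names are made up.
class ParallelSortSketch {

    // fieldValues[d][f] is the value of field f for documents[d];
    // weights holds one weight factor per field.
    static String[] sortByWeightedSum(String[] documents, int[][] fieldValues, double[] weights) {
        int n = documents.length;

        // Step 1: compute f(d) for every document, in document order.
        double[] scores = new double[n];
        for (int d = 0; d < n; d++) {
            for (int f = 0; f < weights.length; f++) {
                scores[d] += fieldValues[d][f] * weights[f];
            }
        }

        // Step 2: sort document positions by their score, smallest first.
        Integer[] order = new Integer[n];
        for (int d = 0; d < n; d++) {
            order[d] = d;
        }
        Arrays.sort(order, (a, b) -> Double.compare(scores[a], scores[b]));

        // Step 3: build the document array in sorted order.
        String[] sorted = new String[n];
        for (int i = 0; i < n; i++) {
            sorted[i] = documents[order[i]];
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample data: three documents with three fields each.
        String[] documents = { "c", "a", "b" };
        int[][] fieldValues = { { 4, 5, 5 }, { 10, 0, 1 }, { 2, 5, 0 } };
        double[] weights = { 0.5, 1.4, 1.8 };
        System.out.println(Arrays.toString(sortByWeightedSum(documents, fieldValues, weights)));
    }
}
```

Swapping the arguments to Double.compare would give the largest weighted sum first.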
Create a wrapper which holds the rating and is comparable. Something like:
public void sort(Datum[] data) {
    Rating[] ratings = new Rating[data.length];
    for (int i = 0; i < data.length; i++)
        ratings[i] = new Rating(data[i]);
    Arrays.sort(ratings);
    for (int i = 0; i < data.length; i++)
        data[i] = ratings[i].datum;
}

class Rating implements Comparable<Rating> {
    final double rating;
    final Datum datum;

    public Rating(Datum datum) {
        this.datum = datum;
        // Weighted sum of the document's numeric fields.
        rating = datum.field1 * 0.5 + datum.field2 * 1.4 + datum.field3 * 1.8;
    }

    public int compareTo(Rating other) {
        return Double.compare(rating, other.rating);
    }
}
Implement your own Similarity class and override the idf(Term, Searcher) method. In this method, you can return the score as follows:
float score = 0f;
if (term.field().equals("field1")) {
    score = 0.5f * Integer.parseInt(term.text());
} else if (term.field().equals("field2")) {
    score = 1.4f * Integer.parseInt(term.text());
} // and so on
return score;
When you execute the query, make sure it covers all the fields. That is, the query should look like:
field1:term field2:term field3:term
The final score will also include some weighting from query normalization, but that will not affect the relative ranking of the documents under the equation you gave.