How can I use Lucene's PriorityQueue when I don't know the max size at create time?

▼魔方 西西 提交于 2019-12-04 11:48:53

PriorityQueue is not SortedList or SortedDictionary. It is a kind of sorting implementation where it returns the top M results(your PriorityQueue's size) of N elements. You can add with InsertWithOverflow as many items as you want, but it will only hold only the top M elements.

Suppose your search resulted in 1000000 hits. Would you return all of the results to user? A better way would be to return the top 10 elements to the user(using PriorityQueue(10)) and if the user requests for the next 10 result, you can make a new search with PriorityQueue(20) and return the next 10 elements and so on. This is the trick most search engines like google uses.

Everytime Commit gets called, I can add the result to an internal PriorityQueue.

I can not undestand the relationship between Commit and search, Therefore I will append a sample usage of PriorityQueue:

public class CustomQueue : Lucene.Net.Util.PriorityQueue<Document>
{
    public CustomQueue(int maxSize): base()
    {
        Initialize(maxSize);
    }

    public override bool LessThan(Document a, Document b)
    {
        //a.GetField("field1")
        //b.GetField("field2");
        return  //compare a & b
    }
}

public class MyCollector : Lucene.Net.Search.Collector
{
    CustomQueue _queue = null;
    IndexReader _currentReader;

    public MyCollector(int maxSize)
    {
        _queue = new CustomQueue(maxSize);
    }

    public override bool AcceptsDocsOutOfOrder()
    {
        return true;
    }

    public override void Collect(int doc)
    {
        _queue.InsertWithOverflow(_currentReader.Document(doc));
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        _currentReader = reader;
    }

    public override void SetScorer(Scorer scorer)
    {
    }
}

searcher.Search(query,new MyCollector(10)) //First page.
searcher.Search(query,new MyCollector(20)) //2nd page.
searcher.Search(query,new MyCollector(30)) //3rd page.

EDIT for @nokturnal

public class MyPriorityQueue<TObj, TComp> : Lucene.Net.Util.PriorityQueue<TObj>
                                where TComp : IComparable<TComp>
{
    Func<TObj, TComp> _KeySelector;

    public MyPriorityQueue(int size, Func<TObj, TComp> keySelector) : base()
    {
        _KeySelector = keySelector;
        Initialize(size);
    }

    public override bool LessThan(TObj a, TObj b)
    {
        return _KeySelector(a).CompareTo(_KeySelector(b)) < 0;
    }

    public IEnumerable<TObj> Items
    {
        get
        {
            int size = Size();
            for (int i = 0; i < size; i++)
                yield return Pop();
        }
    }
}

var pq = new MyPriorityQueue<Document, string>(3, doc => doc.GetField("SomeField").StringValue);
foreach (var item in pq.Items)
{
}

The reason Lucene's Priority Queue is size limited is because it uses a fixed size implementation that is very fast.

Think about what is the reasonable maximum number of results to get back at a time and use that number, the "waste" for when the results are few is not that bad for the benefit it gains.

On the other hand, if you have such a huge number of results that you cannot hold them, then how are you going to be serving/displaying them? Keep in mind that this is for "top" hits so as you iterate through the results you will be hitting less and less relevant ones anyway.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!