In a Lucene / Lucene.net search, how do I count the number of hits per document?

Submitted by 狂风中的少年 on 2019-11-27 18:30:34

Question


When searching a bunch of documents, I can easily find the number of documents which match my search criteria:

Hits hits = Searcher.Search(query);
int DocumentCount = hits.Length();

How do I determine the total number of hits (term occurrences) within those documents? For example, say I search for "congress" and get 2 documents back. How can I get the number of times "congress" occurs in each document? If "congress" occurs 2 times in document #1 and 3 times in document #2, the result I'm looking for is 5.
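To pin down what "5" means, here is a brute-force baseline that needs no index at all: split each document's text into tokens, count the tokens equal to the search term, and sum the per-document counts. This is only a sketch; the class and method names are hypothetical, and the crude lowercase, split-on-non-letters tokenizer is an assumption standing in for a real Lucene analyzer (which may also stem words or drop stop words, changing the counts).

```java
// Hypothetical brute-force baseline, not Lucene code.
// Assumption: a lowercase + split-on-non-letters/digits tokenizer approximates the analyzer.
final class OccurrenceCount {
    // Number of tokens in `text` equal to `term` (term is expected in lowercase).
    static int countInDoc(String text, String term) {
        int count = 0;
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (token.equals(term)) count++;
        }
        return count;
    }

    // Sum of the per-document counts: the "total number of hits" asked about above.
    static int total(String[] docs, String term) {
        int sum = 0;
        for (String doc : docs) sum += countInDoc(doc, term);
        return sum;
    }

    public static void main(String[] args) {
        String[] docs = {
            "Congress met today. Congress adjourned.",            // 2 occurrences
            "The congress, the congress and again the congress."  // 3 occurrences
        };
        System.out.println(total(docs, "congress")); // prints 5
    }
}
```

Rescanning raw text like this does not scale; the answers below read the same counts from data the index already stores.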


Answer 1:


This is Lucene Java, but it should work for Lucene.NET as well:

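List<Integer> docIds = ...; // doc ids for documents that matched the query,
                            // sorted in ascending order

int totalFreq = 0;
TermDocs termDocs = reader.termDocs();
termDocs.seek(new Term("my_field", "congress"));
boolean positioned = false;
for (int id : docIds) {
    // skipTo() always moves past the current entry, so only call it
    // when we are not already at or beyond this doc id
    if (!positioned || termDocs.doc() < id) {
        if (!termDocs.skipTo(id)) break; // no more docs contain the term
        positioned = true;
    }
    // only count when the cursor landed exactly on this doc
    if (termDocs.doc() == id) totalFreq += termDocs.freq();
}
termDocs.close();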
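The skip-and-sum logic can be tried without an index. The sketch below models one term's postings list as parallel arrays of ascending doc ids and per-doc frequencies; `ToyTermDocs` and `TotalFreq` are hypothetical names, not Lucene classes, though `skipTo` follows the documented `TermDocs.skipTo` contract (advance at least once, then until `doc() >= target`). Because of that contract, the loop only skips when the cursor is not already at or past the requested id, so a matched doc that lacks the term cannot make it jump over the next one.

```java
// Hypothetical toy model of a TermDocs cursor over one term's postings (not Lucene API).
final class ToyTermDocs {
    private final int[] docs;   // ascending doc ids containing the term
    private final int[] freqs;  // term frequency in each of those docs
    private int pos = -1;       // before the first entry, like a freshly seek()ed TermDocs

    ToyTermDocs(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    boolean next() { return ++pos < docs.length; }
    int doc() { return docs[pos]; }
    int freq() { return freqs[pos]; }

    // Same behaviour as the documented contract: move at least once, then until doc() >= target.
    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (target > doc());
        return true;
    }
}

final class TotalFreq {
    // matchedDocs must be sorted ascending, as in the answer.
    static int sum(ToyTermDocs termDocs, int[] matchedDocs) {
        int total = 0;
        boolean positioned = false;
        for (int id : matchedDocs) {
            if (!positioned || termDocs.doc() < id) {
                if (!termDocs.skipTo(id)) break; // postings exhausted
                positioned = true;
            }
            if (termDocs.doc() == id) total += termDocs.freq();
        }
        return total;
    }

    public static void main(String[] args) {
        // "congress" appears twice in doc 1, 3 times in doc 4, once in doc 7.
        ToyTermDocs congress = new ToyTermDocs(new int[]{1, 4, 7}, new int[]{2, 3, 1});
        System.out.println(sum(congress, new int[]{1, 4})); // prints 5
    }
}
```

Since both the postings and the matched ids are sorted, the cursor only ever moves forward, so the whole pass is a single merge over the two lists.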



Answer 2:


This is also Lucene Java. If your query/search criteria can be written as a SpanQuery, then you can do something like this:

IndexReader indexReader = // define your index reader here
SpanQuery spanQuery = // define your span query here
Spans spans = spanQuery.getSpans(indexReader);
int occurrenceCount = 0;
while (spans.next()) {
    occurrenceCount++;
}
// now occurrenceCount contains the total number of occurrences of the word/phrase/etc across all documents in the index
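The loop works because a `Spans` enumeration yields one entry per match, ordered by document, so counting `next()` calls counts occurrences. In the Lucene 2.x/3.x API, `Spans` also exposes `doc()`, so the same loop could bucket counts per document. The sketch below imitates that with plain Java: `ToySpans` is a hypothetical stand-in (not Lucene's API) that records each exact, in-order phrase match as `{doc, start, end}` over pre-tokenized documents, then counts spans overall and per doc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a Spans enumeration over pre-tokenized docs (not Lucene API).
final class ToySpans {
    // One {doc, start, end} entry per exact, in-order match of `phrase`,
    // ordered by doc id and then by start position.
    static List<int[]> phraseSpans(String[][] docs, String[] phrase) {
        List<int[]> spans = new ArrayList<>();
        for (int doc = 0; doc < docs.length; doc++) {
            String[] tokens = docs[doc];
            for (int start = 0; start + phrase.length <= tokens.length; start++) {
                boolean match = true;
                for (int k = 0; k < phrase.length && match; k++) {
                    match = tokens[start + k].equals(phrase[k]);
                }
                if (match) spans.add(new int[]{doc, start, start + phrase.length});
            }
        }
        return spans;
    }

    // Like the while (spans.next()) loop, but keyed by each span's doc id.
    static Map<Integer, Integer> countPerDoc(List<int[]> spans) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int[] span : spans) {
            counts.merge(span[0], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"the", "congress", "met", "congress"},
            {"no", "match"},
            {"congress", "congress", "congress"}
        };
        List<int[]> spans = phraseSpans(docs, new String[]{"congress"});
        System.out.println(spans.size());        // prints 5
        System.out.println(countPerDoc(spans));  // prints {0=2, 2=3}
    }
}
```

Unlike a per-term frequency, a span count also works for phrases and proximity queries, which is why this answer is limited to criteria expressible as a SpanQuery.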


Source: https://stackoverflow.com/questions/2249364/in-a-lucene-lucene-net-search-how-do-i-count-the-number-of-hits-per-document
