Get search word hits (number of occurrences) per document in Lucene

ⅰ亾dé卋堺 submitted on 2020-01-11 11:48:25

Question


Can anyone suggest the best way to get the hits (number of occurrences) of a word per document in Lucene?


Answer 1:


Lucene uses a field-based, rather than document-based, index. To get term counts per document:

  1. Iterate over the documents using IndexReader.document(), skipping deleted ones with isDeleted().
  2. In document d, iterate over the fields using Document.getFields().
  3. For each field f, get its terms using getTermFreqVector().
  4. Go over each term vector and sum the frequencies per term.
  5. Summing the term frequencies across all fields gives you the document's term frequency vector.
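
Steps 4 and 5 can be sketched without any Lucene dependency. The following is only an illustration of the aggregation: each field's term vector is assumed to have already been copied into a plain `Map<String, Integer>` (term to frequency), and the class name `DocTermCounts` and method `sumFields` are hypothetical, not part of Lucene.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Lucene class: merges per-field term
// frequencies into a single per-document term frequency map.
class DocTermCounts {
    // fieldVectors: one map (term -> frequency) per field of the document.
    // Assumption: these maps were filled from each field's term vector.
    static Map<String, Integer> sumFields(List<Map<String, Integer>> fieldVectors) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> field : fieldVectors) {
            for (Map.Entry<String, Integer> e : field.entrySet()) {
                // Add this field's count to the running total for the term.
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }
}
```

Looking up the search word in the returned map then gives its number of occurrences in that document; a term missing from the map did not occur in any field that has a term vector.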



Answer 2:


SpanTermQuery.getSpans will give an enumeration of the documents and the positions where the term appears. The documents are sorted, so you can just count the number of times each document appears, ignoring the position information.
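
The counting step can be sketched in plain Java. This is a minimal illustration, not Lucene code: it assumes the document ID of every match has already been collected, in enumeration order, into an `int[]`, and the names `SpanHitCounter` and `countPerDoc` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Lucene class: counts how often each
// document ID occurs in a sequence of span matches.
class SpanHitCounter {
    // docIds: one entry per match, in the order the enumeration yields them.
    // Assumption: positions were already discarded and only doc IDs kept.
    static Map<Integer, Integer> countPerDoc(int[] docIds) {
        // LinkedHashMap keeps documents in first-seen order, which is
        // ascending doc order when the input is sorted.
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (int doc : docIds) {
            counts.merge(doc, 1, Integer::sum);
        }
        return counts;
    }
}
```

For example, the match sequence 0, 0, 2, 5, 5, 5 yields two hits in document 0, one in document 2, and three in document 5. How the IDs are read from the enumeration depends on the Lucene version in use.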



Source: https://stackoverflow.com/questions/1920726/get-search-word-hits-number-of-occurences-per-document-in-lucene
