Find list of terms indexed by Lucene

若如初见, submitted on 2019-12-18 12:00:32

Question


Is it possible to extract the list of all the terms in a Lucene index as a list of strings? I couldn't find that functionality in the docs. Thanks!


Answer 1:


Lucene 3:

  • C#: see the question "C# Lucene get all the index"

  • Java:

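    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;
    import org.apache.lucene.store.FSDirectory;

    // Lucene 3.x: open() takes a Directory, not a path string
    IndexReader indexReader = IndexReader.open(FSDirectory.open(new File(path)));
    TermEnum termEnum = indexReader.terms();  // positioned before the first term
    while (termEnum.next()) {
        Term term = termEnum.term();
        System.out.println(term.text());  // text only; term.field() names the field
    }
    termEnum.close();
    indexReader.close();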
    
  • Java (all terms for a specific field): see the question "How can I get the list of unique terms from a specific field in Lucene?"

  • Python: see the question "Finding a single fields terms with Lucene (PyLucene)"
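The Lucene 3 loop above walks terms from every field, not just one. In Lucene 3 a term is a (field, text) pair and the enumeration is ordered first by field, then by text, so the same word indexed in two fields is returned twice. The plain-Java model below (no Lucene involved; `TermListModel`, `Term` and `allTerms` are hypothetical illustration names, and analysis/tokenization is assumed to have already happened) is a sketch of that ordering, not Lucene's implementation. It uses a record, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Toy model of a Lucene 3 term enumeration: (field, text) pairs, sorted and de-duplicated. */
public class TermListModel {

    /** Hypothetical stand-in for a term: the field it was indexed in plus its text. */
    record Term(String field, String text) {}

    /** Ordering as described for Lucene 3 terms: field first, then text. */
    static final Comparator<Term> ORDER =
            Comparator.comparing(Term::field).thenComparing(Term::text);

    /** Builds the sorted term list from documents given as field -> already-analyzed tokens. */
    static List<Term> allTerms(List<Map<String, List<String>>> docs) {
        TreeSet<Term> terms = new TreeSet<>(ORDER);  // sorts and drops duplicates
        for (Map<String, List<String>> doc : docs) {
            for (Map.Entry<String, List<String>> e : doc.entrySet()) {
                for (String token : e.getValue()) {
                    terms.add(new Term(e.getKey(), token));
                }
            }
        }
        return new ArrayList<>(terms);
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> docs = List.of(
                Map.of("title", List.of("lucene", "terms"),
                       "body", List.of("list", "terms")),
                Map.of("title", List.of("index")));
        for (Term t : allTerms(docs)) {
            // Printing only t.text() would show "terms" twice, once per field.
            System.out.println(t.field() + ":" + t.text());
        }
    }
}
```

Running `main` prints `body:list`, `body:terms`, `title:index`, `title:lucene`, `title:terms`, one per line. If you only want one field, filter on `field()` or use one of the field-specific approaches linked above.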




Answer 2:


In Lucene 4 (and 5):

 Terms terms = SlowCompositeReaderWrapper.wrap(directoryReader).terms("field"); 

Edit:

This seems to be the 'correct' way now (Lucene 6 and up):

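import org.apache.lucene.search.spell.LuceneDictionary;  // lucene-suggest module
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.BytesRefIterator;

LuceneDictionary ld = new LuceneDictionary(indexReader, "field");
// Lucene 6+ Dictionary API: getEntryIterator() returns an InputIterator (a BytesRefIterator)
BytesRefIterator iterator = ld.getEntryIterator();
BytesRef byteRef;
while ((byteRef = iterator.next()) != null) {
    String term = byteRef.utf8ToString();  // one distinct term of "field"
}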
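The loop above converts each `BytesRef` to a `String` immediately. That matters if you collect the terms into a list: a `BytesRefIterator` may reuse the same `BytesRef` object across calls to `next()`, so storing the `BytesRef` itself can leave every list entry pointing at the last term. The plain-Java sketch below models that behaviour without Lucene; `ReuseDemo`, `MutableSlice` and `ReusingIterator` are hypothetical stand-ins for `BytesRef` and `BytesRefIterator`, not Lucene classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy model of an iterator that reuses one byte-slice object for every term. */
public class ReuseDemo {

    /** Stand-in for BytesRef: a view of `length` bytes starting at `offset`. */
    static final class MutableSlice {
        byte[] bytes = new byte[0];
        int offset;
        int length;

        /** Same idea as BytesRef.utf8ToString(): decode the viewed bytes as UTF-8. */
        String utf8ToString() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Stand-in for BytesRefIterator: refills ONE shared slice on every call. */
    static final class ReusingIterator {
        private final List<String> terms;
        private final MutableSlice shared = new MutableSlice();
        private int pos;

        ReusingIterator(List<String> terms) { this.terms = terms; }

        /** Returns the shared slice holding the next term, or null when exhausted. */
        MutableSlice next() {
            if (pos == terms.size()) return null;
            shared.bytes = terms.get(pos++).getBytes(StandardCharsets.UTF_8);
            shared.offset = 0;
            shared.length = shared.bytes.length;
            return shared;
        }
    }

    /** Safe: copy each term out as a String before advancing. */
    static List<String> collectAsStrings(ReusingIterator it) {
        List<String> out = new ArrayList<>();
        MutableSlice ref;
        while ((ref = it.next()) != null) {
            out.add(ref.utf8ToString());
        }
        return out;
    }

    /** Buggy: keeps references to the shared slice, so all entries end up identical. */
    static List<MutableSlice> collectRefs(ReusingIterator it) {
        List<MutableSlice> out = new ArrayList<>();
        MutableSlice ref;
        while ((ref = it.next()) != null) {
            out.add(ref);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("apple", "banana", "cherry");
        System.out.println(collectAsStrings(new ReusingIterator(terms)));
        for (MutableSlice s : collectRefs(new ReusingIterator(terms))) {
            System.out.println(s.utf8ToString());  // the last term, three times
        }
    }
}
```

With real Lucene, either convert with `utf8ToString()` as the answer does, or take a copy (for example `BytesRef.deepCopyOf`) before storing the bytes.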


Source: https://stackoverflow.com/questions/11148036/find-list-of-terms-indexed-by-lucene
