How to get DocValue by document ID in Lucene 7+?

Submitted by 送分小仙女 on 2020-01-05 08:01:17

Question


I'm adding a DocValue to a document with

doc.add(new BinaryDocValuesField("foo",new BytesRef("bar")));

To retrieve that value for a specific document with ID docId, I call

DocValues.getBinary(reader,"foo").get(docId).utf8ToString();

The get function in BinaryDocValues is supported up to Lucene 6.6, but for Lucene 7.0 and up it does not seem to be available anymore.

So, how do I get the DocValue by document ID in Lucene 7+, without having to iterate over BinaryDocValues / DocIdSetIterator and without having to re-get BinaryDocValues and call advanceExact every time?


Answer 1:


Theory

Doc values are Lucene's column-stride field value storage. They were designed to be fast for random access at query time, mainly for faceting and sorting. The issue LUCENE-7407 switched the access pattern from random access to an iterator. Because an iterator API is a much more restrictive access pattern than an arbitrary random-access API, this change gives Lucene more freedom to use aggressive compression and other optimizations:

  • reduced disk space usage for sparse data
  • better compression ratio and faster decoding of doc values, even in the non-sparse case
  • removal of the special column for missing values (getDocsWithField) and of the thread-local codec readers
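To make the trade-off concrete, here is a toy model of a forward-only doc values iterator over a sparse column. It is only a sketch and not Lucene code: the class names SparseColumnSketch and ToyDocValues and the method value() are invented for illustration. The point is that only documents that actually have a value are stored, and a lookup is a forward scan rather than an array index.

```java
public class SparseColumnSketch {

    /** Hypothetical forward-only view over a sparse column (not a Lucene class). */
    static final class ToyDocValues {
        private final int[] docIds;    // sorted ascending, one entry per doc that has a value
        private final String[] values; // values[i] belongs to docIds[i]
        private int pos = 0;           // cursor into docIds; never moves backwards
        private int current = -1;      // last target passed to advanceExact

        ToyDocValues(int[] docIds, String[] values) {
            this.docIds = docIds;
            this.values = values;
        }

        /** Moves to target and reports whether that document has a value. */
        boolean advanceExact(int target) {
            if (target < current) {
                throw new IllegalArgumentException("iterator is forward-only");
            }
            current = target;
            // Skip stored entries that belong to smaller doc IDs.
            while (pos < docIds.length && docIds[pos] < target) {
                pos++;
            }
            return pos < docIds.length && docIds[pos] == target;
        }

        /** Only meaningful right after advanceExact returned true. */
        String value() {
            return values[pos];
        }
    }

    public static void main(String[] args) {
        // Nine documents, but only docs 2 and 7 carry a value, so only two entries are stored.
        ToyDocValues it = new ToyDocValues(new int[] {2, 7}, new String[] {"bar", "baz"});
        for (int doc = 0; doc < 9; doc++) {
            if (it.advanceExact(doc)) {
                System.out.println("doc " + doc + " -> " + it.value());
            }
        }
    }
}
```

Visiting documents in ascending order costs one pass over the stored entries in total. Going back to a smaller doc ID is not possible with the same iterator, which is exactly the restriction the question runs into.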

You can read about this change in the following blogs:

  • Doc values as iterators
  • Sparse versus dense document values with Apache Lucene

Practice

In practice this change causes performance degradation in some cases, for example SOLR-9599. In the main use cases (faceting and sorting) an iterator API is fine when used properly and even allows some optimizations. However, there are many cases where this API is not a good solution. All of these cases were dismissed as incorrect usage (the same problem we had in the Java world with sun.misc.Unsafe).

In fact, org.apache.lucene.index.DocValuesIterator#advanceExact is quite fast, and for some implementations its performance and complexity are similar to those of the old random-access lookup.
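For reference, a minimal sketch of a per-document lookup built on advanceExact in Lucene 7+. The class name BinaryDocValueLookup and the method lookup are hypothetical helpers, not part of Lucene. The sketch assumes docId is an index-wide document ID and that the field was indexed as a BinaryDocValuesField, as in the question. It does re-fetch the BinaryDocValues on each call, which the question hoped to avoid.

```java
import java.io.IOException;
import java.util.List;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.ReaderUtil;

public final class BinaryDocValueLookup {

    /** Hypothetical helper: returns the value as a string, or null if the document has none. */
    static String lookup(IndexReader reader, String field, int docId) throws IOException {
        // Find the segment (leaf) that contains the index-wide doc ID.
        List<LeafReaderContext> leaves = reader.leaves();
        LeafReaderContext leaf = leaves.get(ReaderUtil.subIndex(docId, leaves));

        // Doc values are per segment, so obtain a fresh iterator from the leaf reader.
        BinaryDocValues values = DocValues.getBinary(leaf.reader(), field);

        // advanceExact expects a segment-local doc ID and reports whether a value exists.
        if (values.advanceExact(docId - leaf.docBase)) {
            return values.binaryValue().utf8ToString();
        }
        return null;
    }
}
```

A call such as lookup(reader, "foo", docId) then takes the place of the Lucene 6 get(docId) call. Because advanceExact requires the target to be greater than or equal to the iterator's current doc ID, a single BinaryDocValues instance can be reused only when documents within a segment are visited in ascending order. For arbitrary order, a new instance has to be obtained.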



Source: https://stackoverflow.com/questions/48474506/how-to-get-docvalue-by-document-id-in-lucene-7
