Lucene 4.0 IndexWriter updateDocument for Numeric Term

Posted by 删除回忆录丶 on 2019-12-24 02:10:01

Question


I just wanted to know how it is possible to update (delete/insert) a document based on a numeric field. So far I have been doing this:

LuceneManager.updateDocument(writer, new Term("id",  NumericUtils.intToPrefixCoded(sentenceId)), newDoc);

But in Lucene 4.0 the NumericUtils class has changed in a way that I don't really understand. Any help?


Answer 1:


If possible, I would recommend storing the ID as a keyword string rather than a number. If it is simply a unique identifier, indexing it as a keyword makes much more sense and removes any need to deal with numeric encoding.

If it is actually being used as a number, then you might need to perform the update manually. That is, search for and fetch the document you wish to update, delete the old document with tryDeleteDocument, and then add the updated version with addDocument. This is basically what updateDocument does anyway, to my knowledge.

The first option would certainly be the better way, though. A non-numeric field to use as an update ID would make life easier.
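As a rough sketch of the keyword approach, the numeric ID can be turned into a string once and that same string used both when indexing and when building the update Term. The helper name toKeywordId and the fixed width of 10 are my own choices for illustration, not anything Lucene requires, and the sketch assumes the IDs are non-negative. The zero padding only matters if the string order should also follow the numeric order.

```java
class KeywordId {
    // Hypothetical helper (not part of Lucene): formats a non-negative ID
    // as a fixed-width keyword string.
    static String toKeywordId(int id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative: " + id);
        }
        // Integer.MAX_VALUE has 10 digits, so width 10 covers every non-negative int.
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(java.util.Locale.ROOT, "%010d", id);
    }

    public static void main(String[] args) {
        String key = toKeywordId(42);
        System.out.println(key);
        // With Lucene 4.x on the classpath, the same string would be used twice:
        //   doc.add(new StringField("id", key, Field.Store.YES));
        //   writer.updateDocument(new Term("id", key), doc);
    }
}
```

Because the field is indexed as a single untokenized token, the Term built from the same string matches it exactly and no NumericUtils call is involved.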




Answer 2:


With Lucene 4, you can now create IntField, LongField, FloatField or DoubleField like this:

document.add(new IntField("id", 6, Field.Store.NO));

To write the document once you have modified it, the call is still:

indexWriter.updateDocument(new Term("pk", "<pk value>"), document);

EDIT: And here is a way to make a query including this numeric field:

// Query <=> id <= 7
Query query = NumericRangeQuery.newIntRange("id", Integer.MIN_VALUE, 7, true, true);
TopDocs topDocs = indexSearcher.search(query, 10);



Answer 3:


With Lucene 5.x, this can be solved with the code below:

    int id = 1;
    BytesRefBuilder brb = new BytesRefBuilder();
    NumericUtils.intToPrefixCodedBytes(id, 0, brb);
    Term term = new Term("id", brb.get());
    indexWriter.updateDocument(term, doc); // or indexWriter.deleteDocuments(term);



Answer 4:


You can use it this way:

First you must set the FieldType's numeric type:

FieldType TYPE_ID = new FieldType();
...
TYPE_ID.setNumericType(NumericType.INT);
TYPE_ID.freeze();

and then:

int id = 10;
BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT);
// shift 0 encodes the value at full precision
NumericUtils.intToPrefixCoded(id, 0, bytes);
Term idTerm = new Term("id", bytes);

and now you'll be able to use idTerm to update the doc.
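To make it clearer why the int cannot simply be passed to Term as a plain string, here is a plain-Java sketch of the kind of prefix-coded encoding that NumericUtils produces at shift 0. It is a re-implementation written from my understanding of the Lucene 4.x layout, so treat the class name, the method name encodeInt and the constant 0x60 as assumptions; real code should keep calling NumericUtils.

```java
class PrefixCodedSketch {
    // Assumed value, modeled on NumericUtils.SHIFT_START_INT in Lucene 4.x.
    static final int SHIFT_START_INT = 0x60;

    // Hypothetical helper for illustration only; the shift is fixed at 0 (full precision).
    static byte[] encodeInt(int val) {
        int nChars = 31 / 7 + 1;             // five 7-bit groups cover all 32 bits
        byte[] out = new byte[nChars + 1];   // plus one header byte
        out[0] = (byte) SHIFT_START_INT;     // header: start marker plus shift (0 here)
        int sortableBits = val ^ 0x80000000; // flip the sign bit so byte order follows numeric order
        for (int i = nChars; i > 0; i--) {
            out[i] = (byte) (sortableBits & 0x7f); // lowest 7 bits go into the last byte
            sortableBits >>>= 7;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(encodeInt(1)));
    }
}
```

For id = 1 the sketch yields the six bytes [96, 8, 0, 0, 0, 1], which is nothing like the bytes of the string "1". That is why the Term has to be built from the encoded bytes rather than from the decimal text of the number.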




Answer 5:


According to the Lucene 4.0.0 documentation, an ID field should be indexed using the StringField class:

"A field that is indexed but not tokenized: the entire String value is indexed as a single token. For example this might be used for a 'country' field or an 'id' field, or any field that you intend to use for sorting or access through the field cache."

I had the same problem as you and I solved it by making this change. After that, my update and delete worked perfectly.



Source: https://stackoverflow.com/questions/13958431/lucene-4-0-indexwriter-updatedocument-for-numeric-term
