I read some document about Lucene; also I read the document in this link (http://lucene.sourceforge.net/talks/pisa).
I don\'t really understand how Lucene indexes do
It seems your question more about index merging than about indexing itself.
Indexing process is quite simple if you ignore low-level details. Lucene form what is called "inverted index" from documents. So if document with text "To be or not to be" and id=1 comes in, inverted index would look like:
[to] → 1
[be] → 1
[or] → 1
[not] → 1
This is basically it – the index from the word to the list of documents containing given word. Each line of this index (word) is called posting list. This index is persisted on long-term storage then.
In reality of course things are more complicated:
There are many more complications which are not so important for basic understanding.
It's important to understand though, that Lucene index is append only. In some point in time application decide to commit (publish) all the changes in the index. Lucene finish all service operations with index and close it, so it's available for searching. After commit index basically immutable. This index (or index part) is called segment. When Lucene execute search for a query it search in all available segments.
So the question arise – how can we change already indexed document?
New documents or new versions of already indexed documents are indexed in new segments and old versions invalidated in previous segments using so called kill list. Kill list is the only part of committed index which can change. As you might guess, index efficiency drops with time, because old indexes might contain mostly removed documents.
This is where merging comes in. Merging – is the process of combining several indexes to make more efficient index overall. What is basically happens during merge is live documents copied to the new segment and old segments removed entirely.
Using this simple process Lucene is able to maintain index in good shape in terms of a search performance.
Hope it'll helps.