Say instead of documents I have small trees that I need to store in a Lucene index. How do I go about doing that?
An example node in the tree:
class
Another approach is to store a representation of the current node's location in the tree. For example, the 17th leaf of the 3rd 2nd-level node of the 1st 1st-level node of the 14th tree would be represented as 014.001.003.017.
Assuming 'treepath' is the field name of the tree location, you would query on 'treepath:014*' to find all nodes and leaves in the 14th tree, including its root. Similarly, to find everything below the root of the 14th tree (all of its descendants, not just its direct children) you would query on 'treepath:014.*'.
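To make the encoding concrete, here is a minimal sketch in plain Python. It does not use Lucene at all: `encode_path` and `matches_prefix` are hypothetical helper names, and the `startswith` check only stands in for what a trailing-wildcard query on 'treepath' would match. The fixed width of 3 digits is an assumption taken from the example above.

```python
def encode_path(indices, width=3):
    """Turn a list of positions such as [14, 1, 3, 17] into '014.001.003.017'.

    Zero-padding to a fixed width keeps string order equal to numeric order
    and stops a prefix like '014' from also matching a tree numbered 140.
    """
    return ".".join(f"{i:0{width}d}" for i in indices)


def matches_prefix(treepath, prefix):
    # Stand-in for a trailing-wildcard query such as treepath:014*
    return treepath.startswith(prefix)


# A few nodes from trees 14 and 15 (illustrative data only).
paths = [
    encode_path(p)
    for p in ([14], [14, 1], [14, 1, 3], [14, 1, 3, 17], [15], [15, 2])
]

# Equivalent of treepath:014*  -> the root of tree 14 and everything under it.
whole_tree = [p for p in paths if matches_prefix(p, "014")]

# Equivalent of treepath:014.* -> everything under the root, root excluded.
below_root = [p for p in paths if matches_prefix(p, "014.")]
```

In a real index the 'treepath' value would have to be stored as a single untokenized term, otherwise the dots may be split by the analyzer and the prefix match will not behave like this.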
The major problem with this approach is that moving branches around requires re-ordering every branch after the branch that was moved. If your trees are relatively static, that may only be a minor problem in practice.
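A rough illustration of that cost, again in plain Python with no Lucene involved. `remove_branch` is a hypothetical helper that covers only the "take the branch out" half of a move: it drops one subtree and returns an old-to-new mapping for the remaining paths. Every entry whose path changes would have to be re-indexed.

```python
def remove_branch(paths, branch, width=3):
    """Drop `branch` and its subtree, then shift later siblings down by one.

    Returns a dict mapping each surviving old path to its new path.
    """
    parent, _, last = branch.rpartition(".")
    removed_pos = int(last)
    depth = branch.count(".")  # index of the path segment that gets renumbered

    result = {}
    for old in paths:
        if old == branch or old.startswith(branch + "."):
            continue  # part of the removed subtree
        segs = old.split(".")
        under_same_parent = len(segs) > depth and ".".join(segs[:depth]) == parent
        if under_same_parent and int(segs[depth]) > removed_pos:
            # A later sibling (or something inside one): close the gap.
            segs[depth] = f"{int(segs[depth]) - 1:0{width}d}"
        result[old] = ".".join(segs)
    return result


paths = [
    "014", "014.001", "014.001.001",
    "014.001.002", "014.001.002.005",
    "014.001.003", "014.001.003.017",
    "014.002",
]
mapping = remove_branch(paths, "014.001.002")
changed = {old: new for old, new in mapping.items() if old != new}
```

Here `changed` holds the later sibling and its leaf, so removing one small branch already touches every node that came after it under the same parent. Leaving gaps in the numbering instead of closing them would avoid the rewrite, at the price of positions that no longer mean "the Nth child".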
(I've seen this approach called either a 'path enumeration' or a 'Dewey Decimal' representation.)
There is a project, SIREn (http://rdelbru.github.io/SIREn), which deals with 'in-depth' trees and how to address their nodes. Internally it uses Dewey numbering (http://www.ipl.org/div/farq/deweyFARQ.html).
I suggest Neo4j. A tree is, after all, just a special, restricted kind of graph.
Check out this great discussion on whether you should store a tree in Neo4j:
http://www.mail-archive.com/user@lists.neo4j.org/msg03256.html
This requirement and its solution are captured here: Proposal for nested docs
This design was subsequently implemented by both core Lucene and Elasticsearch. BlockJoinQuery is the core Lucene implementation, and Elasticsearch appears to have an implementation as outlined here: Elastic search nested docs