I am looking into indexing engines, specifically Apache Lucene Solr. We are willing to use it for our searches, yet one of the problems solved by our frameworks search is ro
I am afraid there's no easy solution here. You will have to sacrifice something to get ACLs working together with the search.
If your corpus size is small (I'd say up to 10K documents), you could create a cached bit set of forbidden (or allowed, whichever less verbose) documents and send relevant filter query (+*:* -DocumentNr:1 ... -DocumentNr:X)
. Needless to say, this doesn't scale. Sending large queries will make the search a bit slower, but this is manageable (up to a point of course). Query parsing is cheap.
If you can somehow group these documents and apply ACLs on document groups, this would allow cutting on query length and the above approach would fit perfectly. This is pretty much what we are using - our solution implements taxonomy and has taxonomy permissions done via fq
query.
If you don't need to show the overall result set count, you can run your query and filter the result set on the client side. Again, not perfect.
You can also denormalize your data structures and store both tables flattened in a single document like this:
DocumentNr: 1
Name: Foo
Allowed_users: u1, u2, u3 (or Forbidden_users: ...)
The rest is as easy as sending user id with your query.
Above is only viable if the ACLs are rarely changing and you can afford reindexing the entire corpus when they do.
You could write a custom query filter which would have cached BitSet
s of allowed or forbidden documents by user(group?) retrieved from the database. This would require not only providing DB access for Solr webapp but also extending/repackaging the .war which comes with Solr. While this is relatively easy, the harder part would be cache invalidation: main app should somehow signal Solr app when ACL data gets changed.
Options 1 and 2 are probably more reasonable if you can put Solr and your app onto the same JVM and use javabin driver.
It's hard to advice more without knowing the specifics of the corpus/ACLs.
I am agree with mindas, what he has suggested (sol-4), i have implemented my solution the same way,but the difference is i have few different type of ACLs. At usergroup level,user level and even document level too (private access).
The solution is working fine. But the main concern in my case is that ACLs gets changed frequently and that needs to be updated in the index,mean while search performance should not get affected too.
I am trying to manage this with load balancing and adding few more nodes into the cluster.
mindas,unicron can you please put your thoughts on this?