I'm looking for a tool to get a decent estimate of how large a MongoDB index will be, based on a few signals like:
I just spoke with some of the 10gen engineers. There isn't such a tool, but you can do a back-of-the-envelope calculation based on this formula:
2 * [ n * ( 18 bytes overhead + avg size of indexed field + 5 or so bytes of conversion fudge factor ) ]
Where n is the number of documents you have.
The overhead and conversion padding are Mongo-specific, but the 2x comes from the B-tree data structure being roughly half full (while having allocated 100% of the space a full tree would require) in the worst case.
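As a quick sanity check, here's a minimal mongo-shell sketch of that formula. The 18-byte overhead and 5-byte fudge factor are the rough approximations quoted above, not exact internals, and the sample numbers are hypothetical:

// Rough worst-case index size estimate from the formula above.
// n: number of documents; avgFieldBytes: average size of the indexed field.
function estimateIndexBytes(n, avgFieldBytes) {
    var overhead = 18;   // approximate per-entry overhead
    var fudge = 5;       // approximate conversion fudge factor
    return 2 * (n * (overhead + avgFieldBytes + fudge));
}

// e.g. 10 million documents indexed on a field averaging 12 bytes:
estimateIndexBytes(10 * 1000 * 1000, 12);  // 700,000,000 bytes, ~700 MB worst case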
I'd explain more, but I'm learning about it myself at the moment. This presentation will have more details: http://www.10gen.com/presentations/mongosp-2011/mongodb-internals
You can check the sizes of the indexes on a collection by using the command:
db.collection.stats()
More details here: http://docs.mongodb.org/manual/reference/method/db.collection.stats/#db.collection.stats
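For example, the stats output includes per-index sizes, which you can pull out directly in the mongo shell (collection stands in for your collection name):

// Per-index sizes in bytes, keyed by index name:
db.collection.stats().indexSizes

// Or the combined size of all indexes on the collection:
db.collection.totalIndexSize()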
Another way to calculate it is to ingest ~1,000 or so documents into every collection (in other words, build a small-scale model of what you're going to end up with in production), create your indexes, and extrapolate the final numbers from the per-document averages reported by db.collection.stats(), as sketched below.
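A minimal shell sketch of that extrapolation, assuming a hypothetical sample collection named mycoll and a hypothetical projected production count of 50 million documents:

// Extrapolate index size from a small sample collection.
var stats = db.mycoll.stats();         // mycoll: hypothetical sample collection
var perDocIndexBytes = stats.totalIndexSize / stats.count;
var projectedDocs = 50 * 1000 * 1000;  // hypothetical production document count
var projectedIndexBytes = perDocIndexBytes * projectedDocs;
print("Projected total index size: " + (projectedIndexBytes / 1024 / 1024).toFixed(1) + " MB");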
Does this make sense? :)