问题
I want to implement Sharding for my MongoDb and need some of your suggestions.
Insight
- We have lots of cron-job which collects various information about a machine & writes them to it's own collection.
- Collections are created dynamically.
- Each collection has millions of data.
- Structure1 for each collection is Name, Category, Subcategory, NodeId, Process-Start-Time, Process-End-Time, Value.
- Structure2 for each collection is Name, Category, Subcategory, Subtype, Date, Value.
- Structure3 for each collection is Name, Category, Subcategory, NodeId, Process-Start-Time, Process-End-Time, Value, Flag1, Flag2, Flag3.
After a research we found we will use sharding and make it useful with multiple servers which guarantees two things:
- Need not worry on running out of space.
- Balanced performance across servers
Question 1: My problem is to find a correct shard-key to partition the data. I don't see a unique-key in the collection other than the default ObjectId. After further reading I have found that it is possible to use a composite key, does it make sense to have a composite key or custom ObjectId as a key where the value might look like ObjectId: _. This is very key with respect to performance of returning the results of a query & moving the chunks.
Question 2: Since we have large collections, it will become difficult to set the shard each time in Mongo console when a collection is created dynamically. Is there any way to make it automatic in mongo so that whenever a collection is created for a shard-database, it will define the shard-key for that collection?
Question 3: Is it really necessary to pass shard-key to the query expression? I don't think we have used ObjectId in any of our query-expression, I doubt I can come with a unique ID due to fact that the data is not structured like a traditional DB. If yes, how is it going to help for a query like this:
Example:
{ category: "Energy", subcategory: "Watt", Process-Start-Time: {$gte: 132234234}}
Thanks in advance for stepping in and helping me fix this problem.
回答1:
The easiest way to do this might be to shard the database, but leave the collections unsharded. Benefits:
- Collections will be distributed across the shards (but each collection will only live on one shard). EDIT: I was wrong about this, this isn't implemented yet. See the related Jira ticket to track. For now, you can use tags to distribute collections, but not automatically.
- No need to call shardCollection on each new collection
The downside is that all traffic for a collection will go to its shard, which might be impractical for what you're trying to do.
As far as your questions:
Question 1: Shard key does not have to be unique. What are you generally querying for? You might be better of with something like {category:1}
or {category:1,subcategory:1}
.
Question 2: No built-in way to do it automatically, the best way to get that behavior is probably to set up a cron job.
Question 3: No. Queries containing the shard key can be sent to specific shards and queries without the shard key must be sent to all shards, see http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-OperationTypes.
来源:https://stackoverflow.com/questions/10329484/defining-a-shard-key-automatically-for-a-dynamic-collection-and-suggestion-on-th