Defining a shard-key automatically for a dynamic collection and suggestion on the design

时光毁灭记忆、已成空白 提交于 2019-12-10 10:54:33

问题


I want to implement Sharding for my MongoDb and need some of your suggestions.

Insight

  1. We have lots of cron-job which collects various information about a machine & writes them to it's own collection.
  2. Collections are created dynamically.
  3. Each collection has millions of data.
  4. Structure1 for each collection is Name, Category, Subcategory, NodeId, Process-Start-Time, Process-End-Time, Value.
  5. Structure2 for each collection is Name, Category, Subcategory, Subtype, Date, Value.
  6. Structure3 for each collection is Name, Category, Subcategory, NodeId, Process-Start-Time, Process-End-Time, Value, Flag1, Flag2, Flag3.

After a research we found we will use sharding and make it useful with multiple servers which guarantees two things:

  1. Need not worry on running out of space.
  2. Balanced performance across servers

Question 1: My problem is to find a correct shard-key to partition the data. I don't see a unique-key in the collection other than the default ObjectId. After further reading I have found that it is possible to use a composite key, does it make sense to have a composite key or custom ObjectId as a key where the value might look like ObjectId: _. This is very key with respect to performance of returning the results of a query & moving the chunks.

Question 2: Since we have large collections, it will become difficult to set the shard each time in Mongo console when a collection is created dynamically. Is there any way to make it automatic in mongo so that whenever a collection is created for a shard-database, it will define the shard-key for that collection?

Question 3: Is it really necessary to pass shard-key to the query expression? I don't think we have used ObjectId in any of our query-expression, I doubt I can come with a unique ID due to fact that the data is not structured like a traditional DB. If yes, how is it going to help for a query like this:

Example:

{ category: "Energy", subcategory: "Watt", Process-Start-Time: {$gte: 132234234}}

Thanks in advance for stepping in and helping me fix this problem.


回答1:


The easiest way to do this might be to shard the database, but leave the collections unsharded. Benefits:

  • Collections will be distributed across the shards (but each collection will only live on one shard). EDIT: I was wrong about this, this isn't implemented yet. See the related Jira ticket to track. For now, you can use tags to distribute collections, but not automatically.
  • No need to call shardCollection on each new collection

The downside is that all traffic for a collection will go to its shard, which might be impractical for what you're trying to do.

As far as your questions:

Question 1: Shard key does not have to be unique. What are you generally querying for? You might be better of with something like {category:1} or {category:1,subcategory:1}.

Question 2: No built-in way to do it automatically, the best way to get that behavior is probably to set up a cron job.

Question 3: No. Queries containing the shard key can be sent to specific shards and queries without the shard key must be sent to all shards, see http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-OperationTypes.



来源:https://stackoverflow.com/questions/10329484/defining-a-shard-key-automatically-for-a-dynamic-collection-and-suggestion-on-th

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!