In Mongo what is the difference between sharding and replication?

后端 未结 8 1651
渐次进展
渐次进展 2020-12-12 14:43

Replication seems to be a lot simpler than sharding, unless I am missing the benefits of what sharding is actually trying to achieve. Don\'t they both provide horizontal sca

相关标签:
8条回答
  • 2020-12-12 15:15

    Sharding

    Sharding is a technique of splitting up a large collection amongst multiple servers. When we shard, we deploy multiple mongod servers. And in the front, mongos which is a router. The application talks to this router. This router then talks to various servers, the mongods. The application and the mongos are usually co-located on the same server. We can have multiple mongos services running on the same machine. It's also recommended to keep set of multiple mongods (together called replica set), instead of one single mongod on each server. A replica set keeps the data in sync across several different instances so that if one of them goes down, we won't lose any data. Logically, each replica set can be seen as a shard. It's transparent to the application, the way MongoDB chooses to shard is we choose a shard key.

    Assume, for student collection we have stdt_id as the shard key or it could be a compound key. And the mongos server, it's a range based system. So based on the stdt_id that we send as the shard key, it'll send the request to the right mongod instance.

    So, what do we need to really know as a developer?

    • insert must include a shard key, so if it's a multi-parted shard key, we must include the entire shard key
    • we've to understand what the shard key is on collection itself
    • for an update, remove, find - if mongos is not given a shard key - then it's going to have to broadcast the request to all the different shards that cover the collection.
    • for an update - if we don't specify the entire shard key, we have to make it a multi update so that it knows that it needs to broadcast it
    0 讨论(0)
  • 2020-12-12 15:16

    Just to put this somewhere...

    The most basic way to run mongo is as standalone server.

    • You write a config (file or cli options)
    • initiate the server using mongod

    For this picture, I didn't include the "client". Check the next one.

    • A replica set is a set of servers initialized exactly as above with a different config file.
    • To link them, we connect to one of them, and initialize the replica set mode.
    • They will mirror each other (in the most common configuration). This system guarantees high availability of data.

    The initialization of the replica set is represented in the red border box.

    • Sharding is not about replicating data, but about fragmenting data.
    • Each fragment of data is called chunk and goes to a different shard. shard = each replica set.
    • "main" server, running mongos instead of mongod. This is a router for queries from the client.

    Obvious: The trade-off is a more complex architecture. Novelty: configuration server (again, a different config file).

    There is much more to add, but apart from the words the pictures hold much the same.


    Even mongoDB recommends to study your case carefully before going sharding. Vertical scaling (vs) is probably a good idea at least once before horizontal scaling (hs).

    vs is done upgrading hardware (cpu, ram, etc). hs is needs more computers (but could be cheap computers).

    0 讨论(0)
  • 2020-12-12 15:24

    Replication is a mostly traditional master/slave setup, data is synced to backup members and if the primary fails one of them can take its place. It is a reasonably simple tool. It's primarily meant for redundancy, although you can scale reads by adding replica set members. That's a little complicated, but works very well for some apps.

    Sharding sits on top of replication, usually. "Shards" in MongoDB are just replica sets with something called a "router" in front of them. Your application will connect to the router, issue queries, and it will decide which replica set (shard) to forward things on to. It's significantly more complex than a single replica set because you have the router and config servers to deal with (these keep track of what data is stored where).

    If you want to scale Mongo horizontally, you'd shard. 10gen likes to call the router/config server setup auto-sharding. It's possible to do a more ghetto form of sharding where you have the app decide which DB to write to as well.

    0 讨论(0)
  • 2020-12-12 15:27

    MongoDB Atlas is a Database as a service in could. It support three major cloud providers such as Azure , AWS and GCP. In cloud environment , we usually talk about high availability and scalability. In Atlas “clusters”, can be either a replica set or a sharded cluster. These two address high availability and scalability features of our cloud environment.

    In general Cluster is a group of servers used to achieve a specific task. So sharded clusters are used to store data in across multiple machines to meet the demand of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharded clusters supports the horizontal scalability of the underling cloud environment.

    A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.In a replica, one node is a primary node that receives all write operations. All other instances, such as secondaries, apply operations from the primary so that they have the same data set. Replica set mainly focus on the availability of data.

    Please check the documentation

    Thank You.

    0 讨论(0)
  • 2020-12-12 15:29

    Whenever you're thinking about sharding or replication, you need to think in the context of writers/update operations. If you don't need to scale writes then replications, as it fairly simpler, is a good choice for you.

    On the other hand, if you workload mostly updates/writes then at some point you'll hit a write bottleneck. If write request comes Mongo blocks other writes request. Those write request blocks until the first request will be done. If you want to scale this writes and want parallelize it then you need to implement sharding.

    0 讨论(0)
  • 2020-12-12 15:32

    In the context of scaling MongoDB:

    • replication creates additional copies of the data and allows for automatic failover to another node. Replication may help with horizontal scaling of reads if you are OK to read data that potentially isn't the latest.

    • sharding allows for horizontal scaling of data writes by partitioning data across multiple servers using a shard key. It's important to choose a good shard key. For example, a poor choice of shard key could lead to "hot spots" of data only being written on a single shard.

    A sharded environment does add more complexity because MongoDB now has to manage distributing data and requests between shards -- additional configuration and routing processes are added to manage those aspects.

    Replication and sharding are typically combined to created a sharded cluster where each shard is supported by a replica set.

    From a client application point of view you also have some control in relation to the replication/sharding interaction, in particular:

    • Read preferences
    • Write concerns
    0 讨论(0)
提交回复
热议问题