MongoDB Sharding Tutorial



MongoDB is a document-oriented NoSQL database used for high-volume data storage. Traditional relational databases store data in tables and rows; MongoDB instead uses collections and documents. A document is a set of key-value pairs and is the basic unit of data in MongoDB.

This tutorial covers what sharding is, why it is needed, the architecture of a sharded cluster in MongoDB, and a practical example with Docker.

 

Sharding

Sharding is a method for distributing data across multiple machines.

Why Sharding?

There are two ways to address system growth:

  1. Vertical Scaling.
  2. Horizontal Scaling.

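Vertical Scaling

Vertical scaling means increasing the capacity of a single server, for example with a faster CPU, more RAM, or more storage.

Horizontal Scaling

Horizontal scaling means dividing the data set and the load across multiple servers, and adding servers as capacity needs grow.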

MongoDB supports horizontal scaling through sharding.

Sharding cluster

  1. Shard: Each shard contains a subset of the sharded data. Each shard can be deployed as a replica set to provide redundancy and high availability. Together, the cluster’s shards hold the entire data set for the cluster.
  2. Mongos: The mongos acts as a query router, providing an interface between client applications and the sharded cluster.
  3. Config Servers: Config servers store metadata and configuration settings for the cluster. They are also deployed as a replica set.
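To make the three roles above more concrete, here is a minimal sketch in plain Python of how a query router might use chunk metadata to pick a shard. It does not talk to MongoDB: the `Chunk` class, the `CHUNKS` table, and the `route` and `shards_for_range` functions are hypothetical names invented for illustration, and small integers stand in for real shard key values.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    lower: int  # inclusive lower bound of the shard key range
    upper: int  # exclusive upper bound of the shard key range
    shard: str  # name of the shard that owns this range


# Hypothetical metadata of the kind the config servers keep for a collection.
# The list is sorted by lower bound and the ranges do not overlap.
CHUNKS = [
    Chunk(0, 100, "shardA"),
    Chunk(100, 200, "shardB"),
    Chunk(200, 300, "shardA"),
]


def route(key: int, chunks=CHUNKS) -> str:
    """Return the shard that owns a single shard key value."""
    lowers = [c.lower for c in chunks]
    i = bisect_right(lowers, key) - 1  # last chunk whose lower bound <= key
    if i < 0 or key >= chunks[i].upper:
        raise KeyError(f"no chunk covers key {key}")
    return chunks[i].shard


def shards_for_range(lo: int, hi: int, chunks=CHUNKS) -> set:
    """Return every shard owning part of the key range [lo, hi)."""
    return {c.shard for c in chunks if c.lower < hi and lo < c.upper}
```

In this toy model a lookup on a single key touches exactly one shard, while a range that crosses a chunk boundary has to be sent to every shard that owns part of it.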

[Figure: Sharded cluster]

Shard Keys

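You choose the shard key when sharding a collection. In older MongoDB versions the shard key cannot be changed after sharding; MongoDB 4.4 added the ability to refine a shard key by appending fields to it, and MongoDB 5.0 added resharding a collection under a new key. A sharded collection can have only one shard key.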

To shard a non-empty collection, the collection must have an index that starts with the shard key. For empty collections, MongoDB creates the index if the collection does not already have an appropriate index for the specified shard key. See Shard Key Indexes.
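As a rough sketch of the steps described above, the following Python function creates the supporting index and then issues the `enableSharding` and `shardCollection` admin commands. It assumes a PyMongo-style client connected to a mongos; the function name `shard_collection` and the database, collection, and field names in the usage comment are hypothetical. On recent MongoDB versions the `enableSharding` step may no longer be required, so treat it as optional.

```python
def shard_collection(client, db_name, coll_name, key_field):
    """Shard db_name.coll_name on a single ascending field (illustrative only).

    `client` is expected to behave like pymongo.MongoClient connected to a mongos.
    """
    # A non-empty collection needs an index that starts with the shard key.
    # Creating it first is harmless when the collection is empty.
    client[db_name][coll_name].create_index([(key_field, 1)])

    # Enable sharding for the database, then shard the collection.
    client.admin.command("enableSharding", db_name)
    client.admin.command(
        "shardCollection",
        f"{db_name}.{coll_name}",
        key={key_field: 1},
    )


# Usage against a real cluster (needs the pymongo package and a running mongos;
# the address and names below are placeholders):
#   from pymongo import MongoClient
#   shard_collection(MongoClient("mongodb://localhost:27017"),
#                    "shop", "orders", "customer_id")
```

Passing `{key_field: "hashed"}` as the key instead of `{key_field: 1}` would request hashed rather than ranged sharding; the sketch keeps to the ranged case for brevity.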

Note: The choice of shard key affects the performance, efficiency, and scalability of a sharded cluster. A cluster with the best possible hardware and infrastructure can be bottlenecked by the choice of the shard key.
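A small simulation can show why the key matters. The sketch below is plain Python, not MongoDB code: it places 300 new documents with a steadily increasing key (such as a counter or timestamp) on three made-up shards, once by key range and once by a hash of the key. The shard names, the range boundaries, and the use of MD5 with a modulo are all assumptions chosen for illustration; MongoDB itself partitions the hashed value space into ranges rather than taking a modulo.

```python
import hashlib
from collections import Counter

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def ranged_shard(key: int, boundaries=(1000, 2000)) -> str:
    """Range placement: key < 1000 -> shard0, key < 2000 -> shard1, else shard2."""
    for i, bound in enumerate(boundaries):
        if key < bound:
            return SHARDS[i]
    return SHARDS[-1]


def hashed_shard(key: int) -> str:
    """Hash placement (simplified): hash the key, then pick a shard by modulo."""
    digest = hashlib.md5(str(key).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# New documents arrive with ever-increasing keys.
new_keys = range(5000, 5300)

ranged = Counter(ranged_shard(k) for k in new_keys)
hashed = Counter(hashed_shard(k) for k in new_keys)

print("ranged:", dict(ranged))  # every insert lands on the last shard
print("hashed:", dict(hashed))  # inserts are spread over all shards
```

With the ranged key, every new write goes to the shard holding the highest range, so one machine absorbs the whole insert load. The hashed key spreads the same writes across all shards, at the cost of making range queries on that key touch more shards.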


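Chunks

MongoDB partitions sharded data into chunks. Each chunk covers a contiguous range of shard key values, with an inclusive lower bound and an exclusive upper bound.

Balancer and Even Chunk Distribution

The balancer is a background process that migrates chunks between shards so that data stays evenly distributed across the cluster.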

Advantages of Sharding

  1. Storage Capacity: Sharding distributes data across the shards in the cluster, allowing each shard to contain a subset of the total cluster data. As the data set grows, additional shards increase the storage capacity of the cluster.
  2. High Availability: A sharded cluster can continue to perform partial read/write operations even if one or more shards are unavailable. While the subset of data on the unavailable shards cannot be accessed during the downtime, reads or writes directed at the available shards can still succeed.

In production environments, individual shards should be deployed as replica sets, providing increased redundancy and availability.

Sharded and Non-Sharded Collections


[Figure: Sharded and unsharded collections]

Connecting to a Sharded Cluster


[Figure: MongoDB sharding with docker-compose]

You can connect to a mongos the same way you connect to a mongod, for example through the mongo shell (mongosh in current releases) or a MongoDB driver.
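For a driver, connecting usually comes down to a connection string that lists one or more mongos routers. The helper below is a minimal sketch that builds such a string in the standard `mongodb://host:port,host:port/database` form; the function name `mongos_uri` and the host names in the usage comment are placeholders, not part of any real deployment.

```python
def mongos_uri(hosts, database=""):
    """Build a mongodb:// connection string from (host, port) pairs.

    `hosts` should list the mongos routers, not the individual shards.
    """
    if not hosts:
        raise ValueError("at least one mongos host is required")
    host_list = ",".join(f"{host}:{port}" for host, port in hosts)
    return f"mongodb://{host_list}/{database}"


# Usage with PyMongo (needs the pymongo package and reachable routers;
# host names are placeholders):
#   from pymongo import MongoClient
#   client = MongoClient(mongos_uri([("router1", 27017), ("router2", 27017)], "shop"))
```

Listing more than one mongos gives the driver an alternative router to use if one of them is unreachable.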

The docker-compose.yaml, along with the same setup without Docker, is available in the GitHub repository.

GitHub link: MongoDB sharding

 


 
