Hello 👋 and welcome to a new edition of the Cloud & Backend newsletter.
In this post, I explain the sharding architecture of MongoDB.
I already talked about sharding in an earlier post. So you can also refer to that for the basics of sharding.
MongoDB is one of the most widely used databases in the Cloud landscape.
A behind-the-scenes look at how sharding works in MongoDB is a great addition to your knowledge about sharding on an implementation level.
🤖 How MongoDB Sharding Works?
To use sharding in MongoDB, the first thing you need is a cluster.
A cluster is a group of interconnected servers (also known as nodes) that work together to provide a unified resource.
By stacking servers in a cluster, you can scale your system horizontally.
A really good thing, by the way. Much better than scaling vertically.
I did talk about scalability a long while back. So you can check it out if you are interested.
In MongoDB, this cluster is also known as the sharded cluster. And it has 3 important parts:
Shard
Mongos Router
Config Servers
Here’s an illustration that shows how the 3 parts connect and interact with each other.
Let’s look at each component in greater detail:
👉 Shard
In the illustration, the green boxes are shards.
A shard is a subset of the data. Basically, your entire data is divided between a group of shards.
In MongoDB, each shard is deployed as a replica set, meaning each shard is also a cluster of individual MongoDB instances.
This is a wonderful thing. Why?
Because you get replication and automated failover and all the good stuff.
See the below illustration that shows what I mean.
The upstream system (the red box) over here is the Mongos router. We will get to that in a bit.
But the big picture story is that when you operate a MongoDB cluster in sharded mode, you should not make direct requests to a particular shard. You go through a middleman and that’s the router.
Each shard only has some part of the data. So any queries sent directly to the shard will return only a subset of the data. This is usually not what we want in our application.
You should directly connect to a shard only when you need to perform some local administrative and maintenance operations.
👉 Mongos Router
The Mongos router plays a key role within a MongoDB cluster.
Instead of directly querying the shard from your application server, you make a query to the mongos
instance.
The mongos
instance routes the incoming queries to the appropriate shard within the sharded MongoDB cluster.
This means that from the perspective of an application, mongos
is the only interface to a sharded cluster. As I said earlier, think of it as a middleman that coordinates access to an important person.
The below illustration shows the role of the mongos
router.
There are two important activities performed by the mongos
instance.
Query routing and load balancing – Being the single entry point for the client application, the
mongos
router does the job of routing the queries to the appropriate shards and distributes the workload to ensure high performance.Metadata caching – The
mongos
also tracks what data is present on which shard so that it can route queries to the correct shard. However, it does not store any persistent state and instead relies on caching the metadata from the config servers. This makes it super-efficient in terms of resource utilization.
👉 Config Servers
The last component in the MongoDB sharding architecture puzzle is the Config Server.
The job of a Config Server is to store the metadata for your MongoDB sharded cluster.
Think of this metadata as the index for your cluster. It answers key questions such as:
How the data is organized?
What all components are present in the cluster?
The metadata includes the list of chunks on each and every shard and the ranges that define the chunks.
As we discussed earlier, the mongos
instance relies on the config server for all its information. It caches the information from the config server and uses it to route the read and write requests to appropriate shards.
If the config server is the central authority in a cluster, the mongos
instance is like the enforcer.
⚽ Over to you
Do you use MongoDB in your application?
If yes, do you employ techniques such as replication and sharding?
Write your replies or thoughts in the comments section.
And that’s all for this post.
The MongoDB sharding architecture makes it easy to visualize what you need to have in order to implement sharding.
If you found today’s post useful, give it a like and consider sharing it with friends and colleagues.
Wishing you a great week ahead! ☀️
See you later.