Sharding

There are two methods for addressing system growth: vertical and horizontal scaling.

  • Vertical Scaling involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space.
  • Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required.

MongoDB supports horizontal scaling through sharding.

Sharded Cluster

A MongoDB sharded cluster consists of the following components:

  • shard: Each shard contains a subset of the sharded data. Each shard can be deployed as a replica set.
  • mongos: The mongos acts as a query router, providing an interface between client applications and the sharded cluster.
  • config servers: Config servers store metadata and configuration settings for the cluster. As of MongoDB 3.4, config servers must be deployed as a replica set (CSRS).

Interaction of Components Within a Sharded Cluster

MongoDB shards data at the collection level, distributing the collection data across the shards in the cluster.

Shard Keys

To distribute the documents in a collection, MongoDB partitions the collection using the shard key. The shard key consists of an immutable field or fields that exist in every document in the target collection.

You choose the shard key when sharding a collection. The choice of shard key cannot be changed after sharding. A sharded collection can have only one shard key.

The choice of shard key affects the performance, efficiency, and scalability of a sharded cluster.

Chunks

MongoDB partitions sharded data into chunks. Each chunk has an inclusive lower and exclusive upper range based on the shard key.

MongoDB migrates chunks across the shards in the sharded cluster using the sharded cluster balancer. The balancer attempts to achieve an even balance of chunks across all shards in the cluster.

Advantages of Sharding

Reads / Writes

Both read and write workloads can be scaled horizontally across the cluster by adding more shards.

Storage Capacity

As the data set grows, additional shards increase the storage capacity of the cluster.

High Availability

While the subset of data on the unavailable shards cannot be accessed during the downtime, reads or writes directed at the available shards can still succeed.

In production environments, individual shards should be deployed as replica sets, providing increased redundancy and availability.

Considerations Before Sharding

Careful consideration in choosing the shard key is necessary for ensuring cluster performance and efficiency. You cannot change the shard key after sharding, nor can you unshard a sharded collection.

Sharding has certain operational requirements and restrictions. See Operational Restrictions in Sharded Clusters for more information.

If queries do not include the shard key or the prefix of a compound shard key, mongos performs a broadcast operation, querying all shards in the sharded cluster. These scatter/gather queries can be long running operations.

Sharded and Non-Sharded Collections

Sharded and Non-Sharded Collections

Connecting to a Sharded Cluster

You must connect to a mongos router to interact with any collection in the sharded cluster. This includes sharded and unsharded collections. Clients should never connect to a single shard in order to perform read or write operations.

Connecting to a Sharded Cluster

Sharding Strategy

MongoDB supports two sharding strategies for distributing data across sharded clusters.

Hashed Sharding

Hashed Sharding involves computing a hash of the shard key field’s value. Each chunk is then assigned a range based on the hashed shard key values.

Hashed Sharding

Hashed distribution means that ranged-based queries on the shard key are less likely to target a single shard, resulting in more cluster wide broadcast operations.

Ranged Sharding

Ranged sharding involves dividing data into ranges based on the shard key values. Each chunk is then assigned a range based on the shard key values.

Ranged Sharding

The efficiency of ranged sharding depends on the shard key chosen. Poorly considered shard keys can result in uneven distribution of data, which can negate some benefits of sharding or can cause performance bottlenecks.

Zones in Sharded Clusters

In sharded clusters, you can create zones of sharded data based on the shard key. You can associate each zone with one or more shards in the cluster. A shard can associate with any number of zones. In a balanced cluster, MongoDB migrates chunks covered by a zone only to those shards associated with the zone.

Zones in Sharded Clusters

Most commonly, zones serve to improve the locality of data for sharded clusters that span multiple data centers.

Collations in Sharding

Use the shardCollection command with the collation : { locale : "simple" } option to shard a collection which has a default collation.

results matching ""

    No results matching ""