Sharding is a technique for horizontally partitioning a data store into smaller, more manageable fragments called shards, which are distributed across multiple servers or nodes.
How is sharding different from Spark partitioning its RDDs across different nodes? New learner here, just curious. Does the concept of sharding come into play at all when we handle data in Spark?
They are two different concepts, @sajid007.
Sharding stores your data across multiple nodes in a cluster; which piece of data goes to which node is determined by the shard key.
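To make that concrete, here is a minimal, database-agnostic sketch of how a shard key might route a record to a node (the node names and the hashing scheme are just assumptions for illustration):

```python
# Minimal sketch of shard-key routing: the shard key is hashed to decide
# which node permanently stores a given record.
# The node names and the use of md5 are illustrative assumptions.
import hashlib

SHARDS = ["node-a", "node-b", "node-c"]  # hypothetical cluster nodes

def shard_for(shard_key: str) -> str:
    """Map a shard key (e.g. a user_id) to the node that stores that record."""
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

record = {"user_id": "user-42", "name": "Alice"}
print(shard_for(record["user_id"]))  # the same key always routes to the same node
```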
At processing time, Spark uses a MapReduce-style model: it splits the data across multiple nodes so it can be processed in parallel, then reduces the results of those parallel tasks into the final result. This is compute-time partitioning only (temporary), whereas sharding is permanent: it defines how the data is actually stored.
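As a rough PySpark sketch (assuming a local Spark installation; the numbers and lambdas are just placeholders), you can see that the partitions exist only for the lifetime of the job:

```python
# Rough sketch of compute-time partitioning in Spark. Assumes a local Spark
# installation; the data and the doubling/summing logic are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").appName("partition-demo").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(1_000_000), numSlices=8)  # split into 8 partitions
print(rdd.getNumPartitions())                        # -> 8, only for this job

# The map phase runs in parallel across partitions; reduce combines the results.
total = rdd.map(lambda x: x * 2).reduce(lambda a, b: a + b)
print(total)

spark.stop()  # the partitions vanish with the job; nothing was stored this way
```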
Spark generally reads its data from HDFS or S3, which internally behave much like a sharded data store.
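For example (the path is a hypothetical placeholder), the blocks of an HDFS file surface as the initial partitions of the RDD Spark builds from it:

```python
# Sketch: when Spark reads from HDFS or S3, the input is already split into
# blocks/objects, and those splits become the initial RDD partitions.
# The path below is a hypothetical placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-read-demo").getOrCreate()
lines = spark.sparkContext.textFile("hdfs:///data/events/*.log")
print(lines.getNumPartitions())  # roughly one partition per HDFS block of input
spark.stop()
```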
Hope that explains it. Please let me know if you have any further questions.
Sharding is for optimized data storage and retrieval, whereas Spark partitioning is for parallel processing.