Scalability is the ability of a system to handle increased load by adding resources (servers, CPUs, RAM, etc.). It's one of the most critical aspects of system design when user counts and data volumes are growing.
There are two main approaches to scalability:
Vertical scaling refers to increasing the capacity of a single server (e.g., adding more CPU, RAM, or disk space). Think of it like upgrading a computer to make it faster and handle more load.
Single Server (Low capacity)
│
▼
+---------------+
| Server 1 | ← Upgrade CPU/RAM/Storage
+---------------+
│
▼
Improved Single Server (High capacity)
Suppose you have a small e-commerce website running on one server. To handle more traffic, you upgrade that server with more CPU, RAM, or storage.
The server now performs better but still has its limitations.
| Pros | Cons |
|---|---|
| Simple to implement and manage. | Hardware has physical limits (CPU, RAM, etc.). |
| No changes in software architecture. | Single point of failure if the server crashes. |
| Good for low-traffic systems. | Very high-end hardware is expensive. |
Horizontal scaling involves adding more servers to the system. Instead of upgrading a single server, you distribute the load across multiple servers.
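This approach generally handles large-scale traffic better, because capacity grows by adding machines instead of being capped by the hardware of a single machine.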
+---------------+ +---------------+
Clients →| Load Balancer |------->| Server 1 |
+---------------+ +---------------+
│ │
▼ ▼
+---------------+ +---------------+
| Server 2 | | Server 3 |
+---------------+ +---------------+
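Imagine an online streaming platform like YouTube. To serve millions of users, such a platform runs many servers behind load balancers, so each server handles only a share of the requests.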
| Pros | Cons |
|---|---|
| Redundancy: the failure of one server does not take the whole system down. | More complex to implement and manage. |
| Capacity grows by adding servers, with no single-machine hardware ceiling. | Requires load balancing and data replication. |
| Flexible and cost-effective. | Ensuring consistency across servers is harder. |
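
As a rough illustration of how horizontal capacity is sized, the sketch below estimates how many servers a given peak load needs. The function name `servers_needed`, the per-server throughput figure, and the idea of keeping one spare server for redundancy are all assumptions made for this example, not measurements from a real system.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float, spare_servers: int = 1) -> int:
    """Estimate the server count for a peak load (hypothetical sizing helper).

    peak_rps:        expected peak requests per second for the whole system
    per_server_rps:  requests per second one server can sustain (assumed known)
    spare_servers:   extra servers kept so one failure does not cause overload
    """
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    # Round up: a fraction of a server still requires a whole machine.
    return math.ceil(peak_rps / per_server_rps) + spare_servers


if __name__ == "__main__":
    # Illustrative numbers only: 10,000 req/s peak, 1,500 req/s per server.
    print(servers_needed(10_000, 1_500))
```

With vertical scaling the same growth would require a single machine that sustains the full peak load, which is where the hardware ceiling comes in.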
| Feature | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Scaling approach | Add resources to a single server | Add more servers. |
| Limitations | Hardware limits | Nearly unlimited. |
| Cost | Expensive hardware upgrades | Cost-effective with commodity servers. |
| Complexity | Low | Higher (load balancing, etc.). |
| Single point of failure | Yes | No (redundant servers). |
Caching involves storing frequently accessed data in a temporary, fast-access location (like memory). This reduces the load on databases and speeds up response times.
Caches can be:
+------------------+
| Application |
+------------------+
│
+----------------▼---------------+
| Cache |
| (e.g., Redis, Memcached) |
+----------------▲---------------+
│
+------------------+
| Database |
+------------------+
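A user frequently views their profile on a social media app. To avoid repeatedly querying the database, the profile is stored in the cache after the first read and served from the cache on later requests.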
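
A common way to do this is the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result for next time. The sketch below is a minimal in-process version under stated assumptions. `TTLCache` stands in for a real cache such as Redis or Memcached, and `ProfileDatabase` and `get_profile` are hypothetical names invented for this example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a cache server; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped so stale data is not served forever.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


class ProfileDatabase:
    """Hypothetical database that counts how often it is queried."""

    def __init__(self) -> None:
        self.reads = 0

    def load_profile(self, user_id: int) -> dict:
        self.reads += 1
        return {"id": user_id, "name": f"user-{user_id}"}


def get_profile(cache: TTLCache, db: ProfileDatabase, user_id: int) -> dict:
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        # Cache miss: read from the database, then populate the cache.
        profile = db.load_profile(user_id)
        cache.set(key, profile)
    return profile
```

Calling `get_profile` twice for the same user within the TTL reaches the database only once. The TTL is a design choice: a shorter value keeps data fresher, a longer one offloads the database more.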
When the cache is full, old data must be evicted to make room for new entries.
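
Least recently used (LRU) is one widely used eviction policy: when space runs out, the entry that has gone longest without being read or written is removed. The sketch below is a minimal illustration built on Python's `collections.OrderedDict`. The class name `LRUCache` and its `get`/`put` methods are assumptions for this example, not the API of any particular cache server.

```python
from collections import OrderedDict
from typing import Any, Hashable


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Insertion order doubles as recency order: oldest first, newest last.
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._items:
            return default
        # A read counts as a use, so move the key to the "most recent" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # popitem(last=False) removes the oldest, least recently used entry.
            self._items.popitem(last=False)

    def __len__(self) -> int:
        return len(self._items)
```

For example, with a capacity of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because `a` was used more recently.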
Load balancing distributes incoming requests across multiple servers so that no single server becomes a bottleneck.
Clients
│
▼
+------------------+
| Load Balancer |
+------------------+
│ │ │
▼ ▼ ▼
+--------+ +--------+ +--------+
|Server 1| |Server 2| |Server 3|
+--------+ +--------+ +--------+
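Imagine a search engine like Google Search. Millions of requests are handled simultaneously by spreading them across many servers, so no single machine has to serve them all.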
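
Round robin is one of the simplest ways a load balancer can pick a server: it hands requests to each server in turn. The sketch below is a minimal illustration that also skips servers marked as unhealthy. The class `RoundRobinBalancer`, its method names, and the server labels are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any that are marked down."""

    def __init__(self, servers: Sequence[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers: List[str] = list(servers)
        self._healthy = set(self._servers)
        self._index = 0

    def mark_down(self, server: str) -> None:
        self._healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self) -> str:
        # Try each server at most once per call, starting from the current position.
        for _ in range(len(self._servers)):
            server = self._servers[self._index]
            self._index = (self._index + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
    for request_id in range(4):
        print(request_id, "->", balancer.next_server())
```

Round robin assumes requests cost roughly the same and servers are similar in capacity. Other strategies, such as routing to the server with the fewest active connections, exist for cases where that does not hold.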
Database replication involves copying data from a master database to one or more replica databases. This improves read performance, because read queries can be served by the replicas instead of the master.
+------------------+
| Master DB |
+--------▲---------+
│
+----------▼-----------+
| Replica DBs |
+-----------+----------+
│ │
▼ ▼
+-------------+ +-------------+
| Read Query | | Read Query |
+-------------+ +-------------+
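Master-Slave Replication: the master accepts all writes, and the replicas (slaves) copy its data and serve read queries.
Master-Master Replication: two or more masters each accept writes and replicate their changes to one another.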
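
A minimal sketch of the master-replica arrangement, assuming every write goes to the master and reads are spread over the replicas in rotation. The class `ReplicatedStore` is hypothetical and uses plain dictionaries in place of real databases. It also copies each write to the replicas immediately, which is a simplification: real systems often replicate asynchronously, so replicas can briefly lag behind the master.

```python
from typing import Any, Dict, List, Optional


class ReplicatedStore:
    """Toy key-value store with one master and several read replicas."""

    def __init__(self, replica_count: int) -> None:
        self.master: Dict[str, Any] = {}
        self.replicas: List[Dict[str, Any]] = [{} for _ in range(replica_count)]
        self._next_replica = 0

    def write(self, key: str, value: Any) -> None:
        # All writes go to the master first.
        self.master[key] = value
        # Simplification: replicate synchronously so replicas never lag.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key: str) -> Optional[Any]:
        if not self.replicas:
            # With no replicas, the master has to serve reads as well.
            return self.master.get(key)
        # Rotate through replicas so read load is shared between them.
        replica = self.replicas[self._next_replica]
        self._next_replica = (self._next_replica + 1) % len(self.replicas)
        return replica.get(key)
```

Because only the master takes writes, this layout scales reads but not writes. Adding replicas increases read capacity while write throughput stays bounded by the master.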
In a global app like Instagram:
Partitioning (sharding) splits large databases into smaller, manageable parts (shards). Each shard contains a subset of the data.
+--------------------+
| Load Balancer |
+---------▲----------+
│
+---------+---------+---------+---------+
| Shard 1 | Shard 2 | Shard 3 | Shard 4 |
+---------+---------+---------+---------+
In a large user database:
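
One way to decide which shard holds a given user is to hash the user ID and take the result modulo the number of shards. The sketch below illustrates that idea with four in-memory shards. The names `shard_for` and `ShardedUserStore` are assumptions for this example, and hash-based routing is only one option (range-based partitioning by ID or by region is another).

```python
import hashlib
from typing import Dict, List, Optional


def shard_for(user_id: str, shard_count: int) -> int:
    """Map a user ID to a shard index in the range [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # A stable hash is used so the mapping is the same on every run and machine
    # (Python's built-in hash() is randomized per process for strings).
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedUserStore:
    """Toy user store that spreads records across several shards."""

    def __init__(self, shard_count: int = 4) -> None:
        self.shards: List[Dict[str, dict]] = [{} for _ in range(shard_count)]

    def put(self, user_id: str, record: dict) -> None:
        self.shards[shard_for(user_id, len(self.shards))][user_id] = record

    def get(self, user_id: str) -> Optional[dict]:
        # Only the one shard that owns this user ID is consulted.
        return self.shards[shard_for(user_id, len(self.shards))].get(user_id)
```

A drawback of plain modulo hashing is that changing the number of shards remaps most keys, so data has to be moved. Consistent hashing is a common technique for reducing that movement.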
| Concept | Purpose |
|---|---|
| Vertical Scaling | Add resources to a single server. |
| Horizontal Scaling | Add more servers to handle the load. |
| Caching | Speed up responses using temporary storage. |
| Load Balancing | Distribute traffic across servers. |
| Database Replication | Copy data to improve read performance. |
| Database Partitioning | Split data into smaller, manageable parts. |