System Design

System Design Fundamentals

The core principles behind building scalable, reliable, and maintainable distributed systems -- from requirements gathering through resilience patterns.

01 / Requirements & Estimation

Start With the Right Questions

Every system design begins by splitting requirements into two buckets: what the system does (functional) and how well it does it (non-functional). Missing a non-functional requirement early can force a full re-architecture later.

Functional vs Non-Functional

Functional requirements describe user-visible behavior: "users can upload photos," "the feed shows posts in reverse-chronological order." Non-functional requirements (NFRs) describe quality attributes: latency, throughput, availability, durability, consistency, and security.

Key Insight
NFRs drive architectural decisions more than features do. A chat app needing 50ms delivery latency leads to a fundamentally different design than one tolerating 5 seconds.

Back-of-Envelope Estimation

Before diving into components, estimate the scale. Three numbers matter most:

| Metric | Formula | Example (100M DAU) |
|---|---|---|
| QPS (queries/sec) | DAU x actions/day / 86,400 | 100M x 5 / 86,400 ~ 5,800 QPS |
| Storage | objects/day x size x retention | 10M photos x 500KB x 365 ~ 1.8 PB/yr |
| Bandwidth | QPS x avg response size | 5,800 x 200KB ~ 1.1 GB/s |
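The three estimates above can be sketched as a few lines of Python. All inputs are the illustrative numbers from the table, not real product figures:

```python
# Back-of-envelope estimation for the 100M-DAU example above.
SECONDS_PER_DAY = 86_400

def avg_qps(dau: int, actions_per_day: int) -> float:
    """Average queries per second from daily active users."""
    return dau * actions_per_day / SECONDS_PER_DAY

def yearly_storage_tb(objects_per_day: int, object_size_kb: int) -> float:
    """Storage accumulated over one year, in terabytes."""
    bytes_per_year = objects_per_day * object_size_kb * 1_000 * 365
    return bytes_per_year / 1e12

qps = avg_qps(100_000_000, 5)                  # ~5,787 QPS
storage_tb = yearly_storage_tb(10_000_000, 500)  # ~1,825 TB, i.e. ~1.8 PB
bandwidth_gb_s = qps * 200_000 / 1e9           # ~1.16 GB/s

print(f"{qps:,.0f} QPS, {storage_tb:,.0f} TB/yr, {bandwidth_gb_s:.2f} GB/s")
```

Remember these are averages; peak load is typically 2-5x higher, so capacity planning should multiply accordingly.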

SLA Targets

Service Level Agreements quantify NFRs as promises. The "nines" of availability are the most common measure:

| Availability | Downtime / Year | Typical Use |
|---|---|---|
| 99.9% (three nines) | ~8.76 hours | Internal tools |
| 99.99% (four nines) | ~52.6 minutes | E-commerce, SaaS |
| 99.999% (five nines) | ~5.26 minutes | Payment systems, DNS |
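The downtime budgets in the table follow directly from the availability fraction; a one-liner makes the arithmetic explicit:

```python
# Downtime budget implied by an availability target.
HOURS_PER_YEAR = 365 * 24  # 8,760

def downtime_per_year_hours(availability: float) -> float:
    """Allowed downtime per year for a given availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR

print(downtime_per_year_hours(0.999))          # ~8.76 hours  (three nines)
print(downtime_per_year_hours(0.9999) * 60)    # ~52.6 minutes (four nines)
print(downtime_per_year_hours(0.99999) * 60)   # ~5.26 minutes (five nines)
```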

02 / Scalability

Growing Without Breaking

Scalability is the system's ability to handle growing load by adding resources. The two fundamental axes are vertical (bigger machines) and horizontal (more machines).

| Aspect | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Approach | Bigger CPU, RAM, SSD | Add more instances |
| Limit | Hardware ceiling | Theoretically unlimited |
| Downtime | Often requires restart | Zero-downtime possible |
| Complexity | Simple, no code changes | Requires stateless design |
| Cost curve | Exponential | Linear |

Stateless vs Stateful Services

Horizontal scaling demands stateless services: any instance can handle any request because session data lives externally (Redis, DB, JWT). Stateful services (e.g., WebSocket servers holding connections) need sticky sessions or connection-aware routing, making scaling harder.

Rule of Thumb
Push state out of compute and into dedicated stores. Stateless services are cattle; stateful services are pets.

Database Scaling

The database is usually the first bottleneck. Three strategies, often combined:

Read Replicas

Writes go to primary; reads fan out to replicas. Great for read-heavy workloads (10:1 read/write). Tradeoff: replication lag means eventual consistency.

Sharding

Partition data across nodes by a shard key (user_id, geo). Eliminates single-node bottleneck. Tradeoff: cross-shard queries are expensive; resharding is painful.
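A minimal sketch of shard-key routing, assuming a fixed shard count and hypothetical shard names. Real systems often use consistent hashing instead, so that resharding moves only a fraction of keys:

```python
# Route a shard key (user_id) to one of a fixed set of shards.
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # illustrative names

def shard_for(user_id: str) -> str:
    """Map a shard key to a shard via a stable hash.

    Note: Python's built-in hash() is randomized per process, so a
    cryptographic digest is used here for a stable mapping.
    """
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same key always routes to the same shard, so all of a user's
# rows live together and single-user queries stay on one node.
assert shard_for("user-42") == shard_for("user-42")
```

The modulo approach shown here illustrates the routing idea but has the resharding weakness the text mentions: changing `len(SHARDS)` remaps almost every key.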

CQRS

Command Query Responsibility Segregation: separate write model (normalized) from read model (denormalized views). Allows each side to scale independently.

03 / High Availability

Staying Up When Things Go Down

High availability means the system continues serving requests even when individual components fail. The core principle: no single point of failure.

Redundancy at Every Layer

Redundancy stack: DNS (multi-provider) → Load Balancer (pair) → App Servers (N+1) → DB (primary + replicas)

Active-Active vs Active-Passive

| Pattern | How It Works | Pros | Cons |
|---|---|---|---|
| Active-Active | All nodes serve traffic simultaneously | Full resource utilization, faster failover | Data conflict resolution needed |
| Active-Passive | Standby takes over on primary failure | Simpler consistency model | Wasted standby resources, slower failover |

Graceful Degradation

When a dependency fails, degrade gracefully instead of crashing entirely. Examples: serve cached data when the DB is down, disable recommendations but keep search working, show a static page when the rendering service is overloaded.

Chaos Engineering
Don't just hope your failover works -- prove it. Netflix's Chaos Monkey randomly kills production instances. Run game days: intentionally inject failures (network partition, disk full, high latency) and verify the system recovers within SLA.

04 / API Design

Contracts Between Services

A well-designed API is the backbone of any distributed system. Bad API decisions are extremely expensive to reverse because clients depend on them.

REST Principles

REST organizes around resources (nouns, not verbs). Use HTTP methods semantically: GET for reads, POST for creation, PUT/PATCH for updates, DELETE for removal. Responses should use proper status codes (201 Created, 404 Not Found, 429 Too Many Requests).

Pagination

| Style | Mechanism | Best For | Weakness |
|---|---|---|---|
| Offset | ?page=3&limit=20 | Simple UIs, known total count | Slow on large offsets (DB scans rows); inconsistent if data changes between pages |
| Cursor | ?after=abc123&limit=20 | Infinite scroll, real-time feeds | No random page access; cursor must be opaque to clients |

Versioning, Idempotency & Rate Limiting

Versioning: Use URL path (/v2/users) or headers (Accept: application/vnd.api+json;version=2). Path versioning is simpler; header versioning is more RESTful.

Idempotency keys: Clients send a unique key (e.g., UUID) with mutating requests. The server deduplicates: if it sees the same key again, it returns the original response. Critical for payment APIs where a retry must not double-charge.

Rate limiting: Protect services from abuse and cascading overload. Common algorithms include token bucket and sliding window (covered in Section 06).

Idempotency Pattern
Client sends Idempotency-Key: uuid-abc with a POST. Server checks a store (Redis) for that key. If found, return cached response. If not, process request, store result keyed by the idempotency key with a TTL (e.g., 24h), return response.
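The pattern in the callout can be sketched with an in-memory store. The handler name and response shape are illustrative; a production system would use Redis with a TTL (and an atomic set-if-absent) instead of a plain dict:

```python
# Minimal idempotency-key deduplication for a mutating endpoint.
_responses: dict[str, dict] = {}  # idempotency key -> cached response

def handle_payment(idempotency_key: str, amount_cents: int) -> dict:
    """Process a charge at most once per idempotency key."""
    if idempotency_key in _responses:
        return _responses[idempotency_key]   # duplicate: replay stored result
    # First time we see this key: do the real work exactly once.
    response = {"status": "charged", "amount": amount_cents}
    _responses[idempotency_key] = response
    return response

first = handle_payment("uuid-abc", 500)
retry = handle_payment("uuid-abc", 500)  # client timed out and retried
assert first is retry                    # same response, no double charge
```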

05 / Microservices vs Monolith

Where to Draw the Lines

The monolith-vs-microservices debate isn't about which is "better" -- it's about which tradeoffs match your team size, deployment cadence, and domain complexity.

| Dimension | Monolith | Microservices |
|---|---|---|
| Deployment | Single deployable unit | Independent per-service deploys |
| Data | Shared database | Database-per-service |
| Consistency | ACID transactions easy | Distributed transactions (Saga pattern) |
| Team scaling | Merge conflicts grow with team | Teams own services independently |
| Operational cost | Low (one thing to run) | High (service mesh, observability, CI/CD per service) |
| Best starting point | New products, small teams | Proven domain, multiple teams |

Service Boundaries (Domain-Driven Design)

Use DDD's Bounded Contexts to find natural service boundaries. Each bounded context owns its domain model, data, and vocabulary. "User" in the billing context (payment methods, invoices) is different from "User" in the social context (profile, followers). Don't split by technical layer (API service, DB service) -- split by business capability.

Communication Patterns

Synchronous (HTTP/gRPC)

Request-response. Simple, but creates tight coupling and cascading failures if a downstream service is slow.

Asynchronous (Message Queue)

Producer publishes events; consumer processes them later. Decouples services in time. Adds eventual consistency and message ordering complexity.

Event-Driven (Pub/Sub)

Services react to domain events (OrderPlaced, PaymentFailed). Enables loose coupling but requires careful schema evolution and idempotent consumers.

Data Consistency Warning
With database-per-service, you lose cross-service JOINs and ACID transactions. Use the Saga pattern (choreography or orchestration) to coordinate multi-service operations, and accept that reads may be eventually consistent.

06 / Common Components & Reliability Patterns

The Building Blocks

Most distributed systems are assembled from a common set of infrastructure components. Knowing what each one does -- and when to reach for it -- is half the design work.

Infrastructure Components

Load Balancer

Distributes traffic across healthy instances. L4 (TCP) or L7 (HTTP) with algorithms like round-robin, least-connections, or consistent hashing.

API Gateway

Single entry point for clients. Handles auth, rate limiting, request routing, protocol translation, and response aggregation.

CDN

Caches static assets at edge locations close to users. Reduces latency and offloads origin servers. Invalidation is the hard part.

Message Queue

Buffers work between producers and consumers (Kafka, RabbitMQ, SQS). Enables async processing, load leveling, and retry semantics.

Distributed Cache

In-memory key-value store (Redis, Memcached). Sub-millisecond reads for hot data. Must handle cache invalidation, stampede, and penetration.

Object Storage

Blob storage for images, videos, backups (S3). Virtually unlimited capacity, 11 nines of durability, cheap at rest.

Search Engine

Full-text and analytics engine (Elasticsearch, OpenSearch). Inverted index for fast text search; requires index sync from primary datastore.

Task Queue

Schedules and executes background jobs (Celery, Sidekiq, Bull). Handles retries, dead-letter queues, and priority scheduling.

Reliability Patterns

Distributed systems fail in partial, unpredictable ways. These patterns contain failures and prevent them from cascading.

Request flow with resilience patterns: Client → Rate Limiter → Circuit Breaker → Bulkhead → Service
| Pattern | What It Does | Analogy |
|---|---|---|
| Circuit Breaker | Stops calling a failing service after N errors. Moves through Closed → Open → Half-Open states. Prevents wasting resources on a dead dependency. | Electrical circuit breaker trips to prevent fire |
| Bulkhead | Isolates resources (thread pools, connection pools) per dependency so one slow service can't exhaust all capacity. | Ship compartments prevent total flooding |
| Retry + Backoff + Jitter | Retries failed requests with exponential backoff (1s, 2s, 4s...) plus random jitter to avoid thundering herd. | Politely knocking again, waiting longer each time |
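The circuit breaker's state machine can be sketched in a few dozen lines. The failure threshold and recovery timeout here are illustrative defaults, not recommendations:

```python
# Minimal circuit breaker: Closed -> Open -> Half-Open -> Closed.
import time

class CircuitBreaker:
    def __init__(self, threshold: int = 3, recovery_s: float = 30.0):
        self.threshold = threshold    # consecutive failures before opening
        self.recovery_s = recovery_s  # how long to stay open before probing
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "open":
            if time.monotonic() - self.opened_at < self.recovery_s:
                # Fail fast: don't waste resources on a known-bad dependency.
                raise RuntimeError("circuit open: failing fast")
            self.state = "half-open"  # timeout elapsed: allow a probe call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed probe, or too many consecutive failures, opens it.
            if self.state == "half-open" or self.failures >= self.threshold:
                self.state = "open"
                self.opened_at = time.monotonic()
            raise
        self.failures = 0             # any success resets the breaker
        self.state = "closed"
        return result
```

A real implementation would also need thread safety and per-dependency instances (one breaker per downstream service, in the spirit of the bulkhead pattern).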

Rate Limiting Algorithms

| Algorithm | How It Works | Pros / Cons |
|---|---|---|
| Token Bucket | Tokens added at fixed rate; each request consumes a token. Requests rejected when bucket is empty. | Allows bursts up to bucket size. Simple. Widely used (e.g., API Gateway). |
| Sliding Window Log | Stores timestamp of each request. Counts requests in the last N seconds. | Precise, but memory-heavy for high-volume endpoints. |
| Sliding Window Counter | Hybrid: uses fixed window counts + weighted overlap for the current sliding window. | Memory-efficient approximation. Good balance of accuracy and performance. |
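The token bucket is simple enough to sketch directly. Capacity and refill rate below are illustrative; a real limiter would keep one bucket per client key (user ID, API key, or IP):

```python
# Minimal token bucket: burst up to `capacity`, sustained `refill_per_s`.
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_per_s: float):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = float(capacity)   # start full: permits an initial burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Credit tokens accrued since the last check, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                    # caller should respond 429

bucket = TokenBucket(capacity=5, refill_per_s=1.0)
burst = [bucket.allow() for _ in range(6)]  # first 5 pass, 6th is rejected
```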

Retry Formula
delay = min(base * 2^attempt + random(0, base), max_delay) -- exponential backoff with jitter. The jitter is critical: without it, all clients retry at the same instant, recreating the overload.
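The formula translates directly to code. The `base` and `max_delay` defaults here are assumptions chosen for illustration:

```python
# Exponential backoff with jitter, per the formula above.
import random

def backoff_delay(attempt: int, base: float = 1.0,
                  max_delay: float = 30.0) -> float:
    """delay = min(base * 2^attempt + random(0, base), max_delay)"""
    return min(base * 2 ** attempt + random.uniform(0, base), max_delay)

# Attempt 0 waits 1-2s, attempt 3 waits 8-9s; by attempt 10 the
# exponential term (1024s) is capped at max_delay.
delays = [backoff_delay(n) for n in range(5)]
```

A common variant is "full jitter", `random(0, min(base * 2^attempt, max_delay))`, which spreads retries even more aggressively at the cost of sometimes retrying very quickly.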

Test Yourself

Question 01
A system has 100M daily active users, each making 10 requests per day. What is the approximate QPS?
100M x 10 / 86,400 seconds = ~11,574 QPS. Always divide total daily requests by 86,400 (seconds in a day) for average QPS. Peak QPS is typically 2-5x the average.
Question 02
What is the primary prerequisite for horizontal scaling of application servers?
Stateless services allow any instance to handle any request because no session data is stored locally. State is pushed to external stores (Redis, DB, JWT), enabling the load balancer to route requests to any available instance.
Question 03
In an active-active HA setup, what is the main challenge compared to active-passive?
Active-active has all nodes serving traffic and potentially writing data simultaneously, which creates the possibility of conflicting writes. Conflict resolution strategies (last-write-wins, CRDTs, application-level merging) add complexity.
Question 04
Why is cursor-based pagination preferred over offset-based for real-time feeds?
Offset pagination breaks when items are inserted or deleted between page fetches -- you may see duplicates or skip items. Cursor pagination uses a stable reference point (e.g., the last seen ID), so new inserts don't shift the window.
Question 05
What is the purpose of an idempotency key in API design?
Idempotency keys let the server detect duplicate requests (e.g., a client retrying after a timeout). The server stores the result keyed by the idempotency key and returns the cached response for duplicates, preventing double charges, duplicate records, etc.
Question 06
According to DDD, how should microservice boundaries be drawn?
DDD's bounded contexts group related domain concepts that share a consistent model and vocabulary. Each bounded context maps naturally to a microservice that owns its data and business rules, aligned with a business capability rather than a technical layer.
Question 07
A circuit breaker in the "Open" state will:
When a circuit breaker is Open, it short-circuits all calls and fails fast, preventing the caller from wasting resources on a dependency that is known to be failing. After a timeout, it transitions to Half-Open and lets a few probe requests through to test recovery.
Question 08
Why is jitter added to exponential backoff in retry logic?
Without jitter, all clients that failed at the same time will retry at the same time (e.g., exactly 1s, 2s, 4s later), recreating the same overload spike. Random jitter spreads retries across time, giving the recovering service a chance to process them gradually.
Question 09
What does the bulkhead pattern isolate?
The bulkhead pattern assigns separate resource pools (thread pools, connection pools, semaphores) to each downstream dependency. If one service becomes slow and exhausts its pool, other services remain unaffected because they have their own isolated pools.
Question 10
In the token bucket rate limiting algorithm, what happens when the bucket is empty?
When no tokens are available, the request is rejected with a 429 Too Many Requests response. Tokens are replenished at a fixed rate. The bucket size determines the maximum burst allowed -- a full bucket lets that many requests through immediately before rate limiting kicks in.