Start With the Right Questions
Every system design begins by splitting requirements into two buckets: what the system does (functional) and how well it does it (non-functional). Missing a non-functional requirement early can force a full re-architecture later.
Functional vs Non-Functional
Functional requirements describe user-visible behavior: "users can upload photos," "the feed shows posts in reverse-chronological order." Non-functional requirements (NFRs) describe quality attributes: latency, throughput, availability, durability, consistency, and security.
Back-of-Envelope Estimation
Before diving into components, estimate the scale. Three numbers matter most:
| Metric | Formula | Example (100M DAU) |
|---|---|---|
| QPS (queries/sec) | DAU x actions/day / 86400 | 100M x 5 / 86400 ~ 5,800 QPS |
| Storage | objects/day x size x retention | 10M photos x 500KB x 365 ~ 1.8 PB/yr |
| Bandwidth | QPS x avg response size | 5800 x 200KB ~ 1.1 GB/s |
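The arithmetic in the table is worth automating in interviews and design docs alike. A minimal sketch, using the example assumptions above (100M DAU, 5 actions/day, 10M photos of 500KB, no replication or compression):

```python
SECONDS_PER_DAY = 86_400

def qps(dau: int, actions_per_day: float) -> float:
    """Average queries per second."""
    return dau * actions_per_day / SECONDS_PER_DAY

def storage_per_year(objects_per_day: int, size_bytes: int) -> float:
    """Bytes stored per year (ignores replication and compression)."""
    return objects_per_day * size_bytes * 365

avg_qps = qps(100_000_000, 5)                    # ~5,787 QPS
yearly = storage_per_year(10_000_000, 500_000)   # ~1.8 PB
print(f"{avg_qps:,.0f} QPS, {yearly / 1e15:.2f} PB/yr")
```

Remember these are averages; peak traffic is often 2-5x the average, so size for peak.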
SLA Targets
Service Level Agreements quantify NFRs as promises. The "nines" of availability are the most common measure:
| Availability | Downtime / Year | Typical Use |
|---|---|---|
| 99.9% (three nines) | ~8.76 hours | Internal tools |
| 99.99% (four nines) | ~52.6 minutes | E-commerce, SaaS |
| 99.999% (five nines) | ~5.26 minutes | Payment systems, DNS |
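The downtime column follows directly from the availability percentage; a quick sanity check of the table:

```python
def downtime_per_year(availability: float) -> float:
    """Seconds of allowed downtime per (365-day) year."""
    return (1 - availability) * 365 * 24 * 3600

print(downtime_per_year(0.999) / 3600)   # ~8.76 hours
print(downtime_per_year(0.9999) / 60)    # ~52.6 minutes
print(downtime_per_year(0.99999) / 60)   # ~5.26 minutes
```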
Growing Without Breaking
Scalability is the system's ability to handle growing load by adding resources. The two fundamental axes are vertical (bigger machines) and horizontal (more machines).
| Aspect | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Approach | Bigger CPU, RAM, SSD | Add more instances |
| Limit | Hardware ceiling | Theoretically unlimited |
| Downtime | Often requires restart | Zero-downtime possible |
| Complexity | Simple, no code changes | Requires stateless design |
| Cost curve | Exponential | Linear |
Stateless vs Stateful Services
Horizontal scaling demands stateless services: any instance can handle any request because session data lives externally (Redis, DB, JWT). Stateful services (e.g., WebSocket servers holding connections) need sticky sessions or connection-aware routing, making scaling harder.
Database Scaling
The database is usually the first bottleneck. Three strategies, often combined:
Read replication: writes go to the primary; reads fan out to replicas. Great for read-heavy workloads (10:1 read/write). Tradeoff: replication lag means eventual consistency.
Sharding: partition data across nodes by a shard key (user_id, geo). Eliminates the single-node bottleneck. Tradeoff: cross-shard queries are expensive; resharding is painful.
CQRS (Command Query Responsibility Segregation): separate the write model (normalized) from the read model (denormalized views), allowing each side to scale independently.
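Routing by shard key can be sketched in a few lines. This is a hash-modulo scheme, the simplest option; hashing first spreads sequential IDs evenly, and note that changing the shard count remaps most keys, which is exactly the resharding pain mentioned above (consistent hashing mitigates this):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id: str) -> int:
    """Deterministically map a user_id to one of NUM_SHARDS shards."""
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# All reads and writes for a given user land on the same shard.
assert shard_for("user-42") == shard_for("user-42")
```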
Staying Up When Things Go Down
High availability means the system continues serving requests even when individual components fail. The core principle: no single point of failure.
Redundancy at Every Layer
Run at least two of everything: redundant load balancers, multiple application instances behind them, database replicas, and deployments across availability zones (or regions), so that no single host, rack, or data center failure takes the system down.
Active-Active vs Active-Passive
| Pattern | How It Works | Pros | Cons |
|---|---|---|---|
| Active-Active | All nodes serve traffic simultaneously | Full resource utilization, faster failover | Data conflict resolution needed |
| Active-Passive | Standby takes over on primary failure | Simpler consistency model | Wasted standby resources, slower failover |
Graceful Degradation
When a dependency fails, degrade gracefully instead of crashing entirely. Examples: serve cached data when the DB is down, disable recommendations but keep search working, show a static page when the rendering service is overloaded.
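The serve-cached-data-when-the-DB-is-down example can be sketched as a fallback wrapper. This is a hedged illustration: fetch_from_db is an injected stand-in for the real data access layer, and the dict stands in for a real cache:

```python
cache = {"feed:1": ["cached-post"]}  # stand-in for Redis/Memcached

def get_feed(user_id: int, fetch_from_db) -> list:
    try:
        posts = fetch_from_db(user_id)
        cache[f"feed:{user_id}"] = posts   # refresh cache on success
        return posts
    except Exception:
        # Dependency is down: serve stale data rather than an error page.
        return cache.get(f"feed:{user_id}", [])

def broken_db(user_id):
    raise ConnectionError("db unavailable")

print(get_feed(1, broken_db))   # ['cached-post']
```

Stale data with a banner ("showing cached results") usually beats a 500 error.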
Contracts Between Services
A well-designed API is the backbone of any distributed system. Bad API decisions are extremely expensive to reverse because clients depend on them.
REST Principles
REST organizes around resources (nouns, not verbs). Use HTTP methods semantically: GET for reads, POST for creation, PUT/PATCH for updates, DELETE for removal. Responses should use proper status codes (201 Created, 404 Not Found, 429 Too Many Requests).
Pagination
| Style | Mechanism | Best For | Weakness |
|---|---|---|---|
| Offset | ?page=3&limit=20 | Simple UIs, known total count | Slow on large offsets (DB scans rows); inconsistent if data changes between pages |
| Cursor | ?after=abc123&limit=20 | Infinite scroll, real-time feeds | No random page access; cursor must be opaque to clients |
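Cursor pagination from the table can be sketched over an in-memory, id-sorted list. The cursor here is just the last id seen, base64-encoded so clients treat it as opaque; real systems often encode a (sort key, id) pair:

```python
import base64

ROWS = [{"id": i, "text": f"post {i}"} for i in range(1, 101)]

def encode_cursor(last_id: int) -> str:
    return base64.urlsafe_b64encode(str(last_id).encode()).decode()

def decode_cursor(cursor: str) -> int:
    return int(base64.urlsafe_b64decode(cursor).decode())

def page(after=None, limit=20):
    """Return (items, next_cursor) for ?after=<cursor>&limit=<n>."""
    start_id = decode_cursor(after) if after else 0
    items = [r for r in ROWS if r["id"] > start_id][:limit]
    next_cursor = encode_cursor(items[-1]["id"]) if items else None
    return items, next_cursor

first, cur = page(limit=3)            # ids 1..3
second, _ = page(after=cur, limit=3)  # ids 4..6
```

Because the query is "id > cursor" rather than "skip N rows", the database can use an index seek instead of scanning, which is why cursors stay fast at any depth.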
Versioning, Idempotency & Rate Limiting
Versioning: Use URL path (/v2/users) or headers (Accept: application/vnd.api+json;version=2). Path versioning is simpler; header versioning is more RESTful.
Idempotency keys: Clients send a unique key (e.g., UUID) with mutating requests. The server deduplicates: if it sees the same key again, it returns the original response. Critical for payment APIs where a retry must not double-charge.
Rate limiting: Protect services from abuse and cascading overload. Common algorithms include token bucket and sliding window (covered in Section 06).
Example flow: the client sends Idempotency-Key: uuid-abc with a POST. The server checks a store (e.g., Redis) for that key. If found, it returns the cached response. If not, it processes the request, stores the result keyed by the idempotency key with a TTL (e.g., 24h), and returns the response.
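That flow can be sketched with a dict standing in for Redis (expiry tracked manually). A production version would also need to handle two concurrent requests with the same key, e.g. via an atomic set-if-absent plus a lock:

```python
import time

TTL_SECONDS = 24 * 3600
_store = {}  # idempotency key -> (expires_at, cached response)

def handle_payment(idempotency_key: str, charge) -> str:
    now = time.time()
    entry = _store.get(idempotency_key)
    if entry and entry[0] > now:
        return entry[1]                  # duplicate: replay cached response
    response = charge()                  # process exactly once
    _store[idempotency_key] = (now + TTL_SECONDS, response)
    return response

calls = []
def charge():
    calls.append(1)
    return "charged"

handle_payment("uuid-abc", charge)
handle_payment("uuid-abc", charge)   # retry: no second charge
print(len(calls))   # 1
```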
Where to Draw the Lines
The monolith-vs-microservices debate isn't about which is "better" -- it's about which tradeoffs match your team size, deployment cadence, and domain complexity.
| Dimension | Monolith | Microservices |
|---|---|---|
| Deployment | Single deployable unit | Independent per-service deploys |
| Data | Shared database | Database-per-service |
| Consistency | ACID transactions easy | Distributed transactions (Saga pattern) |
| Team scaling | Merge conflicts grow with team | Teams own services independently |
| Operational cost | Low (one thing to run) | High (service mesh, observability, CI/CD per service) |
| Best starting point | New products, small teams | Proven domain, multiple teams |
Service Boundaries (Domain-Driven Design)
Use DDD's Bounded Contexts to find natural service boundaries. Each bounded context owns its domain model, data, and vocabulary. "User" in the billing context (payment methods, invoices) is different from "User" in the social context (profile, followers). Don't split by technical layer (API service, DB service) -- split by business capability.
Communication Patterns
Synchronous (request-response): simple, but creates tight coupling and cascading failures if a downstream service is slow.
Asynchronous (message queue): the producer publishes messages; the consumer processes them later. Decouples services in time, but adds eventual consistency and message-ordering complexity.
Event-driven: services react to domain events (OrderPlaced, PaymentFailed). Enables loose coupling but requires careful schema evolution and idempotent consumers.
The Building Blocks
Most distributed systems are assembled from a common set of infrastructure components. Knowing what each one does -- and when to reach for it -- is half the design work.
Infrastructure Components
Load balancer: distributes traffic across healthy instances. Operates at L4 (TCP) or L7 (HTTP) with algorithms like round-robin, least-connections, or consistent hashing.
API gateway: the single entry point for clients. Handles auth, rate limiting, request routing, protocol translation, and response aggregation.
CDN: caches static assets at edge locations close to users. Reduces latency and offloads origin servers. Invalidation is the hard part.
Message queue: buffers work between producers and consumers (Kafka, RabbitMQ, SQS). Enables async processing, load leveling, and retry semantics.
Cache: in-memory key-value store (Redis, Memcached). Sub-millisecond reads for hot data. Must handle cache invalidation, stampede, and penetration.
Object storage: blob storage for images, videos, backups (S3). Virtually unlimited capacity, 11 nines of durability, cheap at rest.
Search index: full-text and analytics engine (Elasticsearch, OpenSearch). Inverted index for fast text search; requires index sync from the primary datastore.
Task queue / workers: schedules and executes background jobs (Celery, Sidekiq, Bull). Handles retries, dead-letter queues, and priority scheduling.
Reliability Patterns
Distributed systems fail in partial, unpredictable ways. These patterns contain failures and prevent them from cascading.
| Pattern | What It Does | Analogy |
|---|---|---|
| Circuit Breaker | Stops calling a failing service after N errors. Moves through Closed → Open → Half-Open states. Prevents wasting resources on a dead dependency. | Electrical circuit breaker trips to prevent fire |
| Bulkhead | Isolates resources (thread pools, connection pools) per dependency so one slow service can't exhaust all capacity. | Ship compartments prevent total flooding |
| Retry + Backoff + Jitter | Retry failed requests with exponential backoff (1s, 2s, 4s...) plus random jitter to avoid thundering herd. | Politely knocking again, waiting longer each time |
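The Closed → Open → Half-Open cycle in the circuit-breaker row can be sketched in a small class. The thresholds and the use of wall-clock time are illustrative choices, not a prescription:

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None => Closed

    def call(self, fn):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: Half-Open, let one trial request through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()  # trip to Open
            raise
        self.failures = 0
        self.opened_at = None                 # success closes the circuit
        return result
```

While Open, callers fail in microseconds instead of waiting out a timeout against a dead dependency, which is the whole point of the pattern.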
Rate Limiting Algorithms
| Algorithm | How It Works | Pros / Cons |
|---|---|---|
| Token Bucket | Tokens added at fixed rate; each request consumes a token. Requests rejected when bucket is empty. | Allows bursts up to bucket size. Simple. Widely used (e.g., API Gateway). |
| Sliding Window Log | Stores timestamp of each request. Counts requests in the last N seconds. | Precise, but memory-heavy for high-volume endpoints. |
| Sliding Window Counter | Hybrid: uses fixed window counts + weighted overlap for the current sliding window. | Memory-efficient approximation. Good balance of accuracy and performance. |
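The token-bucket row maps to a compact implementation: refill at a fixed rate up to capacity, spend one token per request. The rate and capacity below are arbitrary example values:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # max burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # bucket empty: reject (HTTP 429)

bucket = TokenBucket(rate=5, capacity=10)
burst = [bucket.allow() for _ in range(12)]
print(burst.count(True))   # 10 allowed in the initial burst
```

The bucket size is the burst allowance; the refill rate is the sustained limit.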
delay = min(base * 2^attempt + random(0, base), max_delay) -- exponential backoff with jitter. The jitter is critical: without it, all clients retry at the same instant, recreating the overload.
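The formula translates directly into a retry helper. Assumed conventions: base and max_delay are in seconds, and attempt starts at 0:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0,
                  max_delay: float = 60.0) -> float:
    """delay = min(base * 2^attempt + random(0, base), max_delay)"""
    return min(base * 2 ** attempt + random.uniform(0, base), max_delay)

# Delays grow roughly 1s, 2s, 4s, ... with up to `base` of jitter,
# capped at max_delay.
delays = [backoff_delay(a) for a in range(6)]
```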