Start With the Right Questions
Every system design begins by splitting requirements into two buckets: what the system does (functional) and how well it does it (non-functional). Missing a non-functional requirement early can force a full re-architecture later.
Functional vs Non-Functional
Functional requirements describe user-visible behavior: "users can upload photos," "the feed shows posts in reverse-chronological order." Non-functional requirements (NFRs) describe quality attributes: latency, throughput, availability, durability, consistency, and security.
Back-of-Envelope Estimation
Before diving into components, estimate the scale. Three numbers matter most:
| Metric | Formula | Example (100M DAU) |
|---|---|---|
| QPS (queries/sec) | DAU x actions/day / 86400 | 100M x 5 / 86400 ~ 5,800 QPS |
| Storage | objects/day x size x retention | 10M photos x 500KB x 365 ~ 1.8 PB/yr |
| Bandwidth | QPS x avg response size | 5800 x 200KB ~ 1.1 GB/s |
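The arithmetic in the table is worth automating in interviews and design docs alike. A minimal sketch, using the example assumptions above (100M DAU, 5 actions/day, 10M photos of 500KB, no replication or compression):

```python
SECONDS_PER_DAY = 86_400

def qps(dau: int, actions_per_day: float) -> float:
    """Average queries per second."""
    return dau * actions_per_day / SECONDS_PER_DAY

def storage_per_year(objects_per_day: int, size_bytes: int) -> float:
    """Bytes stored per year (ignores replication and compression)."""
    return objects_per_day * size_bytes * 365

avg_qps = qps(100_000_000, 5)                    # ~5,787 QPS
yearly = storage_per_year(10_000_000, 500_000)   # ~1.8 PB
print(f"{avg_qps:,.0f} QPS, {yearly / 1e15:.2f} PB/yr")
```

Remember these are averages; peak traffic is often 2-5x the average, so size for peak.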
SLA Targets
Service Level Agreements quantify NFRs as promises. The "nines" of availability are the most common measure:
| Availability | Downtime / Year | Typical Use |
|---|---|---|
| 99.9% (three nines) | ~8.76 hours | Internal tools |
| 99.99% (four nines) | ~52.6 minutes | E-commerce, SaaS |
| 99.999% (five nines) | ~5.26 minutes | Payment systems, DNS |
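The downtime column follows directly from the availability percentage; a quick sanity check of the table:

```python
def downtime_per_year(availability: float) -> float:
    """Seconds of allowed downtime per (365-day) year."""
    return (1 - availability) * 365 * 24 * 3600

print(downtime_per_year(0.999) / 3600)   # ~8.76 hours
print(downtime_per_year(0.9999) / 60)    # ~52.6 minutes
print(downtime_per_year(0.99999) / 60)   # ~5.26 minutes
```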
Growing Without Breaking
Scalability is the system's ability to handle growing load by adding resources. The two fundamental axes are vertical (bigger machines) and horizontal (more machines).
| Aspect | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Approach | Bigger CPU, RAM, SSD | Add more instances |
| Limit | Hardware ceiling | Theoretically unlimited |
| Downtime | Often requires restart | Zero-downtime possible |
| Complexity | Simple, no code changes | Requires stateless design |
| Cost curve | Exponential | Linear |
Stateless vs Stateful Services
Horizontal scaling demands stateless services: any instance can handle any request because session data lives externally (Redis, DB, JWT). Stateful services (e.g., WebSocket servers holding connections) need sticky sessions or connection-aware routing, making scaling harder.
Database Scaling
The database is usually the first bottleneck. Three strategies, often combined:
Read replication: writes go to the primary; reads fan out to replicas. Great for read-heavy workloads (10:1 read/write). Tradeoff: replication lag means eventual consistency.
Sharding: partition data across nodes by a shard key (user_id, geo). Eliminates the single-node bottleneck. Tradeoff: cross-shard queries are expensive; resharding is painful.
CQRS (Command Query Responsibility Segregation): separate the write model (normalized) from the read model (denormalized views), allowing each side to scale independently.
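Routing by shard key can be sketched in a few lines. This is a hash-modulo scheme, the simplest option; hashing first spreads sequential IDs evenly, and note that changing the shard count remaps most keys, which is exactly the resharding pain mentioned above (consistent hashing mitigates this):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id: str) -> int:
    """Deterministically map a user_id to one of NUM_SHARDS shards."""
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# All reads and writes for a given user land on the same shard.
assert shard_for("user-42") == shard_for("user-42")
```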
Staying Up When Things Go Down
High availability means the system continues serving requests even when individual components fail. The core principle: no single point of failure.
Redundancy at Every Layer
Run at least two of everything: redundant load balancers, multiple application instances behind them, database replicas, and deployments across availability zones (or regions), so that no single host, rack, or data center failure takes the system down.
Active-Active vs Active-Passive
| Pattern | How It Works | Pros | Cons |
|---|---|---|---|
| Active-Active | All nodes serve traffic simultaneously | Full resource utilization, faster failover | Data conflict resolution needed |
| Active-Passive | Standby takes over on primary failure | Simpler consistency model | Wasted standby resources, slower failover |
Graceful Degradation
When a dependency fails, degrade gracefully instead of crashing entirely. Examples: serve cached data when the DB is down, disable recommendations but keep search working, show a static page when the rendering service is overloaded.
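The serve-cached-data-when-the-DB-is-down example can be sketched as a fallback wrapper. This is a hedged illustration: fetch_from_db is an injected stand-in for the real data access layer, and the dict stands in for a real cache:

```python
cache = {"feed:1": ["cached-post"]}  # stand-in for Redis/Memcached

def get_feed(user_id: int, fetch_from_db) -> list:
    try:
        posts = fetch_from_db(user_id)
        cache[f"feed:{user_id}"] = posts   # refresh cache on success
        return posts
    except Exception:
        # Dependency is down: serve stale data rather than an error page.
        return cache.get(f"feed:{user_id}", [])

def broken_db(user_id):
    raise ConnectionError("db unavailable")

print(get_feed(1, broken_db))   # ['cached-post']
```

Stale data with a banner ("showing cached results") usually beats a 500 error.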
Contracts Between Services
A well-designed API is the backbone of any distributed system. Bad API decisions are extremely expensive to reverse because clients depend on them.
REST Principles
REST organizes around resources (nouns, not verbs). Use HTTP methods semantically: GET for reads, POST for creation, PUT/PATCH for updates, DELETE for removal. Responses should use proper status codes (201 Created, 404 Not Found, 429 Too Many Requests).
Pagination
| Style | Mechanism | Best For | Weakness |
|---|---|---|---|
| Offset | ?page=3&limit=20 | Simple UIs, known total count | Slow on large offsets (DB scans rows); inconsistent if data changes between pages |
| Cursor | ?after=abc123&limit=20 | Infinite scroll, real-time feeds | No random page access; cursor must be opaque to clients |
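Cursor pagination from the table can be sketched over an in-memory, id-sorted list. The cursor here is just the last id seen, base64-encoded so clients treat it as opaque; real systems often encode a (sort key, id) pair:

```python
import base64

ROWS = [{"id": i, "text": f"post {i}"} for i in range(1, 101)]

def encode_cursor(last_id: int) -> str:
    return base64.urlsafe_b64encode(str(last_id).encode()).decode()

def decode_cursor(cursor: str) -> int:
    return int(base64.urlsafe_b64decode(cursor).decode())

def page(after=None, limit=20):
    """Return (items, next_cursor) for ?after=<cursor>&limit=<n>."""
    start_id = decode_cursor(after) if after else 0
    items = [r for r in ROWS if r["id"] > start_id][:limit]
    next_cursor = encode_cursor(items[-1]["id"]) if items else None
    return items, next_cursor

first, cur = page(limit=3)            # ids 1..3
second, _ = page(after=cur, limit=3)  # ids 4..6
```

Because the query is "id > cursor" rather than "skip N rows", the database can use an index seek instead of scanning, which is why cursors stay fast at any depth.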
Versioning, Idempotency & Rate Limiting
Versioning: Use URL path (/v2/users) or headers (Accept: application/vnd.api+json;version=2). Path versioning is simpler; header versioning is more RESTful.
Idempotency keys: Clients send a unique key (e.g., UUID) with mutating requests. The server deduplicates: if it sees the same key again, it returns the original response. Critical for payment APIs where a retry must not double-charge.
Rate limiting: Protect services from abuse and cascading overload. Common algorithms include token bucket and sliding window (covered in Section 06).
Example flow: the client sends Idempotency-Key: uuid-abc with a POST. The server checks a store (e.g., Redis) for that key. If found, it returns the cached response. If not, it processes the request, stores the result keyed by the idempotency key with a TTL (e.g., 24h), and returns the response.
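That flow can be sketched with a dict standing in for Redis (expiry tracked manually). A production version would also need to handle two concurrent requests with the same key, e.g. via an atomic set-if-absent plus a lock:

```python
import time

TTL_SECONDS = 24 * 3600
_store = {}  # idempotency key -> (expires_at, cached response)

def handle_payment(idempotency_key: str, charge) -> str:
    now = time.time()
    entry = _store.get(idempotency_key)
    if entry and entry[0] > now:
        return entry[1]                  # duplicate: replay cached response
    response = charge()                  # process exactly once
    _store[idempotency_key] = (now + TTL_SECONDS, response)
    return response

calls = []
def charge():
    calls.append(1)
    return "charged"

handle_payment("uuid-abc", charge)
handle_payment("uuid-abc", charge)   # retry: no second charge
print(len(calls))   # 1
```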
Where to Draw the Lines
The monolith-vs-microservices debate isn't about which is "better" -- it's about which tradeoffs match your team size, deployment cadence, and domain complexity.
| Dimension | Monolith | Microservices |
|---|---|---|
| Deployment | Single deployable unit | Independent per-service deploys |
| Data | Shared database | Database-per-service |
| Consistency | ACID transactions easy | Distributed transactions (Saga pattern) |
| Team scaling | Merge conflicts grow with team | Teams own services independently |
| Operational cost | Low (one thing to run) | High (service mesh, observability, CI/CD per service) |
| Best starting point | New products, small teams | Proven domain, multiple teams |
Service Boundaries (Domain-Driven Design)
Use DDD's Bounded Contexts to find natural service boundaries. Each bounded context owns its domain model, data, and vocabulary. "User" in the billing context (payment methods, invoices) is different from "User" in the social context (profile, followers). Don't split by technical layer (API service, DB service) -- split by business capability.
Communication Patterns
Synchronous (request-response): simple, but creates tight coupling and cascading failures if a downstream service is slow.
Asynchronous (message queue): the producer publishes messages; the consumer processes them later. Decouples services in time, but adds eventual consistency and message-ordering complexity.
Event-driven: services react to domain events (OrderPlaced, PaymentFailed). Enables loose coupling but requires careful schema evolution and idempotent consumers.
The Building Blocks
Most distributed systems are assembled from a common set of infrastructure components. Knowing what each one does -- and when to reach for it -- is half the design work.
Infrastructure Components
Load balancer: distributes traffic across healthy instances. Operates at L4 (TCP) or L7 (HTTP) with algorithms like round-robin, least-connections, or consistent hashing.
API gateway: the single entry point for clients. Handles auth, rate limiting, request routing, protocol translation, and response aggregation.
CDN: caches static assets at edge locations close to users. Reduces latency and offloads origin servers. Invalidation is the hard part.
Message queue: buffers work between producers and consumers (Kafka, RabbitMQ, SQS). Enables async processing, load leveling, and retry semantics.
Cache: in-memory key-value store (Redis, Memcached). Sub-millisecond reads for hot data. Must handle cache invalidation, stampede, and penetration.
Object storage: blob storage for images, videos, backups (S3). Virtually unlimited capacity, 11 nines of durability, cheap at rest.
Search index: full-text and analytics engine (Elasticsearch, OpenSearch). Inverted index for fast text search; requires index sync from the primary datastore.
Task queue / workers: schedules and executes background jobs (Celery, Sidekiq, Bull). Handles retries, dead-letter queues, and priority scheduling.
Reliability Patterns
Distributed systems fail in partial, unpredictable ways. These patterns contain failures and prevent them from cascading.
| Pattern | What It Does | Analogy |
|---|---|---|
| Circuit Breaker | Stops calling a failing service after N errors. Moves through Closed → Open → Half-Open states. Prevents wasting resources on a dead dependency. | Electrical circuit breaker trips to prevent fire |
| Bulkhead | Isolates resources (thread pools, connection pools) per dependency so one slow service can't exhaust all capacity. | Ship compartments prevent total flooding |
| Retry + Backoff + Jitter | Retry failed requests with exponential backoff (1s, 2s, 4s...) plus random jitter to avoid thundering herd. | Politely knocking again, waiting longer each time |
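The Closed → Open → Half-Open cycle in the circuit-breaker row can be sketched in a small class. The thresholds and the use of wall-clock time are illustrative choices, not a prescription:

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None => Closed

    def call(self, fn):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: Half-Open, let one trial request through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()  # trip to Open
            raise
        self.failures = 0
        self.opened_at = None                 # success closes the circuit
        return result
```

While Open, callers fail in microseconds instead of waiting out a timeout against a dead dependency, which is the whole point of the pattern.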
Rate Limiting Algorithms
| Algorithm | How It Works | Pros / Cons |
|---|---|---|
| Token Bucket | Tokens added at fixed rate; each request consumes a token. Requests rejected when bucket is empty. | Allows bursts up to bucket size. Simple. Widely used (e.g., API Gateway). |
| Sliding Window Log | Stores timestamp of each request. Counts requests in the last N seconds. | Precise, but memory-heavy for high-volume endpoints. |
| Sliding Window Counter | Hybrid: uses fixed window counts + weighted overlap for the current sliding window. | Memory-efficient approximation. Good balance of accuracy and performance. |
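The token-bucket row maps to a compact implementation: refill at a fixed rate up to capacity, spend one token per request. The rate and capacity below are arbitrary example values:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # max burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # bucket empty: reject (HTTP 429)

bucket = TokenBucket(rate=5, capacity=10)
burst = [bucket.allow() for _ in range(12)]
print(burst.count(True))   # 10 allowed in the initial burst
```

The bucket size is the burst allowance; the refill rate is the sustained limit.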
delay = min(base * 2^attempt + random(0, base), max_delay) -- exponential backoff with jitter. The jitter is critical: without it, all clients retry at the same instant, recreating the overload.
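The formula translates directly into a retry helper. Assumed conventions: base and max_delay are in seconds, and attempt starts at 0:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0,
                  max_delay: float = 60.0) -> float:
    """delay = min(base * 2^attempt + random(0, base), max_delay)"""
    return min(base * 2 ** attempt + random.uniform(0, base), max_delay)

# Delays grow roughly 1s, 2s, 4s, ... with up to `base` of jitter,
# capped at max_delay.
delays = [backoff_delay(a) for a in range(6)]
```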