The Caching Hierarchy
Every system has multiple caching layers stacked on top of each other. A request traverses these layers from the fastest and smallest to the slowest and largest. Understanding where each layer sits helps you decide which one to optimize.
CPU Cache
L1 (~1ns), L2 (~4ns), L3 (~10ns). Hardware-managed. You influence it through data locality — keep hot data contiguous in memory so cache lines stay useful. False sharing across cores is a common pitfall in multi-threaded code.
OS Page Cache
The kernel caches recently read disk pages in free RAM. This is why the second cat of a large file is instant. Tools like vmtouch let you inspect what's cached. Database engines like PostgreSQL rely heavily on this layer.
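A quick way to watch the page cache at work (a minimal sketch; the file path is a placeholder, and the exact speedup depends on your disk and free RAM):

import time

def timed_read(path):
    start = time.perf_counter()
    with open(path, "rb") as f:
        f.read()
    return time.perf_counter() - start

# First read hits disk; the second is served from the kernel's page cache.
cold = timed_read("/var/tmp/big_file.bin")  # placeholder path to a large file
warm = timed_read("/var/tmp/big_file.bin")
print(f"cold: {cold:.3f}s, warm: {warm:.3f}s")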
Application-level Cache
In-process caches like Guava, Caffeine (JVM), or a Python dictionary. Zero network overhead, but limited to a single process and lost on restart. Best for reference data that changes rarely.
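In Python, the standard library's functools.lru_cache turns any pure function into an in-process cache in one line (a minimal sketch; country_name stands in for an expensive lookup):

from functools import lru_cache
import time

@lru_cache(maxsize=1024)
def country_name(code: str) -> str:
    time.sleep(0.1)  # stand-in for a slow DB or API call
    return {"US": "United States", "DE": "Germany"}.get(code, "Unknown")

country_name("US")  # slow: performs the lookup
country_name("US")  # fast: served from the in-process cache, no network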
Distributed Cache
Redis or Memcached — shared across all app instances. Adds a network hop (~0.5ms) but provides a unified view. This is the workhorse cache in most web architectures.
CDN Edge Cache
Cloudflare, CloudFront, Fastly — caches HTTP responses at PoPs close to users. Eliminates origin round-trips for static and semi-static content; for a distant user, latency can drop from ~200ms to ~20ms.
Browser Cache
Controlled via HTTP headers (Cache-Control, ETag). The fastest cache — no network at all. But hard to invalidate once set.
DB Buffer Pool
InnoDB's buffer pool, PostgreSQL's shared buffers. The database caches frequently accessed pages in its own managed memory, reducing disk I/O. Tuning buffer pool size is one of the highest-impact database optimizations.
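For example (the directive names are real; the values are illustrative starting points, not universal advice):

# postgresql.conf: ~25% of system RAM is a common starting point,
# leaving the rest to the OS page cache PostgreSQL also relies on
shared_buffers = 8GB

# my.cnf: InnoDB is often given ~70-80% of RAM on a dedicated DB host
innodb_buffer_pool_size = 24G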
Read and Write Patterns
How your application interacts with the cache determines consistency, latency, and complexity tradeoffs. Five core patterns cover the majority of use cases.
| Pattern | Read Path | Write Path | Consistency | Best For |
|---|---|---|---|---|
| Cache-Aside | App checks cache, on miss reads DB, fills cache | App writes DB, invalidates cache | Eventual | General purpose, most common |
| Read-Through | Cache itself loads from DB on miss | Same as cache-aside | Eventual | Simpler app code, cache-as-library |
| Write-Through | From cache | App writes cache, cache synchronously writes DB | Strong | When consistency matters more than write latency |
| Write-Behind | From cache | App writes cache, cache async writes DB | Eventual (risk of loss) | High write throughput, acceptable loss risk |
| Refresh-Ahead | Cache proactively refreshes before TTL expires | N/A | Eventual | Predictable access patterns, low-latency reads |
Cache-Aside (Lazy Loading)
The most widely used pattern. The application is responsible for all cache interactions. On a cache miss, the app fetches from the database and populates the cache. On writes, the app updates the database first, then invalidates (not updates) the cache entry.
def get_user(user_id):
    user = cache.get(f"user:{user_id}")
    if user is None:  # cache miss: fall back to the database
        user = db.query("SELECT * FROM users WHERE id = %s", user_id)
        cache.set(f"user:{user_id}", user, ttl=300)  # populate for next time
    return user

def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)  # column list elided
    cache.delete(f"user:{user_id}")  # invalidate, don't update
Write-Behind (Write-Back)
The cache buffers writes and flushes them to the database asynchronously in batches. This dramatically improves write throughput but introduces a durability risk — if the cache node crashes before flushing, those writes are lost. Used in systems like CPU caches, OS page cache dirty writeback, and some message queue patterns.
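A minimal in-process sketch of the idea (assumes a db handle with executemany and a MySQL-style REPLACE statement; a real implementation adds bounded queues, retries, and crash recovery):

import queue
import threading

class WriteBehindCache:
    def __init__(self, db, flush_interval=1.0):
        self.db = db
        self.data = {}              # the cache itself
        self.dirty = queue.Queue()  # writes buffered for async flush
        self.flush_interval = flush_interval
        threading.Thread(target=self._flusher, daemon=True).start()

    def put(self, key, value):
        self.data[key] = value        # acknowledged immediately
        self.dirty.put((key, value))  # flushed to the DB later, in a batch

    def _flusher(self):
        while True:
            try:
                batch = [self.dirty.get(timeout=self.flush_interval)]
            except queue.Empty:
                continue
            while not self.dirty.empty():
                batch.append(self.dirty.get_nowait())
            # One batched round-trip instead of one per write; anything
            # still queued when the process dies is lost.
            self.db.executemany("REPLACE INTO kv (k, v) VALUES (%s, %s)", batch)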
Refresh-Ahead
The cache predicts which keys will be needed soon and refreshes them before they expire. This avoids the latency spike of a cache miss. Works well when access patterns are predictable (e.g., a product page refreshed every 30s). Poorly predicted refreshes waste resources.
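A sketch using a background timer (fetch_product and cache are assumed names; refreshing every 30s against a 60s TTL means hot keys never expire):

import threading

def refresh_ahead(cache, key, loader, refresh_every=30, ttl=60):
    def refresh():
        cache.set(key, loader(), ttl=ttl)  # repopulate before the TTL expires
        threading.Timer(refresh_every, refresh).start()
    refresh()

# refresh_ahead(cache, "product:42", lambda: fetch_product(42))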
When the Cache is Full
Every cache has a size limit. When it fills up, the eviction policy decides which entries to drop. The right policy depends on your access pattern.
Least Recently Used. Evicts the entry that hasn't been accessed for the longest time. Good general-purpose policy. Used by Redis (allkeys-lru). A minimal sketch follows this list.
Least Frequently Used. Evicts the entry with the fewest accesses. Better for workloads with stable hot keys, but slow to adapt to access pattern changes.
First In, First Out. Evicts the oldest entry regardless of access pattern. Simple but naive — ignores popularity entirely. Rarely used in production caches.
Time-to-Live (TTL). Entries expire after a fixed time-to-live. Not strictly an eviction policy, but it controls cache freshness. Often combined with LRU/LFU for memory-pressure eviction.
W-TinyLFU. Used by Caffeine (Java). Combines a small admission window (LRU) with a main space (segmented LRU), using a frequency sketch to decide admission. Near-optimal hit rates with low overhead.
Random. Evicts a random entry. Surprisingly effective for uniform access patterns. Redis supports allkeys-random. Zero overhead for tracking access metadata.
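The LRU sketch, using an OrderedDict to track recency (illustrative only; Redis actually uses an approximated LRU to avoid this per-access bookkeeping):

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used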
Redis Eviction Modes
Redis offers 8 eviction policies configured via maxmemory-policy: noeviction (return errors), allkeys-lru, volatile-lru (only keys with TTL), allkeys-lfu, volatile-lfu, allkeys-random, volatile-random, and volatile-ttl (evict keys with shortest remaining TTL). The volatile-* variants only consider keys that have an expiration set.
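A typical cache deployment sets both directives in redis.conf, or at runtime via CONFIG SET (the 2gb limit is illustrative):

maxmemory 2gb
maxmemory-policy allkeys-lru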
The Hard Problem
"There are only two hard things in computer science: cache invalidation and naming things." Invalidation is hard because you must ensure the cache reflects the source of truth without sacrificing the performance gains that caching provides.
TTL-based Invalidation
The simplest approach. Every cache entry gets a time-to-live. After expiry, the next read triggers a fresh fetch. Staleness is bounded by the TTL value. Short TTLs reduce staleness but increase origin load. A 60s TTL means data can be up to 60 seconds stale.
Event-driven Invalidation
When data changes, the writer publishes an event (via Kafka, Redis Pub/Sub, or a DB trigger) and cache subscribers invalidate or update the relevant keys. Provides near-real-time consistency but adds infrastructure complexity.
# Publisher (on write)
def update_product(product_id, data):
    db.update(product_id, data)
    event_bus.publish("product.updated", {"id": product_id})

# Subscriber (cache invalidator)
@event_bus.on("product.updated")
def handle_product_update(event):
    cache.delete(f"product:{event['id']}")
Version-based Invalidation
Cache keys include a version number or hash. When data changes, the version increments, and old keys are naturally never read again. The stale entries eventually get evicted by memory pressure. Common in CDN cache-busting: app.v3.js or app.abc123.js.
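A sketch of versioned keys (assumes a cache client with get/incr, as redis-py provides; the key scheme is illustrative):

def versioned_key(entity, entity_id):
    version = cache.get(f"{entity}:version") or 1
    return f"{entity}:v{version}:{entity_id}"

def invalidate_all(entity):
    # Old keys are simply never read again; memory pressure
    # evicts the orphans eventually.
    cache.incr(f"{entity}:version")

user = cache.get(versioned_key("user", 42))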
| Strategy | Staleness | Complexity | Origin Load |
|---|---|---|---|
| TTL-based | Bounded by TTL | Low | Moderate (on expiry) |
| Event-driven | Near-zero | High | Low (only on miss) |
| Version-based | Zero (new key) | Medium | Low (orphaned keys waste space) |
When Caching Goes Wrong
Caching introduces its own class of failure modes. Understanding these patterns — and their mitigations — is essential for building resilient systems.
Cache Stampede (Thundering Herd)
A popular key expires, and hundreds of concurrent requests all miss the cache simultaneously, hammering the database with identical queries. The mitigation toolkit:
Locking: only one request fetches from the DB; the others wait for the cache to be repopulated. Use a distributed lock (e.g., Redis SET NX EX). A sketch follows this list.
Request coalescing: deduplicate in-flight requests for the same key. The first request does the work; the others receive the same result. Libraries like singleflight (Go) implement this.
TTL jitter: add random variance to TTL values so keys don't all expire at the same instant. E.g., TTL = 300 + random(0, 60).
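The lock-based approach with redis-py (a sketch; assumes a redis.Redis client created with decode_responses=True, and a production version would retry with backoff and release the lock atomically via a Lua script):

import time
import uuid

def get_with_lock(r, key, loader, ttl=300, lock_ttl=10):
    value = r.get(key)
    if value is not None:
        return value
    token = str(uuid.uuid4())
    if r.set(f"lock:{key}", token, nx=True, ex=lock_ttl):  # only one winner
        try:
            value = loader()  # single DB fetch for the whole herd
            r.set(key, value, ex=ttl)
        finally:
            if r.get(f"lock:{key}") == token:  # best-effort safe release
                r.delete(f"lock:{key}")
        return value
    time.sleep(0.05)  # lost the race: wait for the winner to repopulate
    return get_with_lock(r, key, loader, ttl, lock_ttl)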
Cache Penetration
Requests for data that doesn't exist bypass the cache every time (cache miss, DB miss, nothing to cache). Attackers can exploit this to overload your database. Mitigations:
Cache negative results: store a null/empty sentinel value with a short TTL for keys that don't exist in the DB (a sketch follows this list).
Bloom filter: Check a Bloom filter before querying. If the filter says "not present," skip the DB entirely. False positives are acceptable (they just cause a normal DB miss).
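The negative-caching sketch, in the style of the earlier cache-aside example (the "__none__" sentinel and TTL values are illustrative):

NEGATIVE = "__none__"  # sentinel meaning "confirmed absent in the DB"

def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached == NEGATIVE:
        return None  # known-missing: no DB hit
    if cached is not None:
        return cached
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    if user is None:
        cache.set(key, NEGATIVE, ttl=30)  # short TTL bounds the damage
        return None
    cache.set(key, user, ttl=300)
    return user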
Cache Avalanche
A large number of keys expire at the same time (e.g., all populated during the same cache warmup), causing a sudden spike in DB load. Mitigation: stagger TTLs with random jitter, and implement circuit breakers to degrade gracefully.
Hot Key Problem
A single extremely popular key receives disproportionate traffic, overloading the cache node that owns it. Solutions: replicate the hot key across multiple nodes with slight key name variations, use local in-process caching as a first layer, or split the value across sub-keys.
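A sketch of hot-key replication (the replica count and key scheme are illustrative; reads spread across cache nodes because each suffixed key hashes to a different one):

import random

REPLICAS = 8  # one hot key becomes 8 keys on (likely) different nodes

def hot_get(key):
    return cache.get(f"{key}#{random.randrange(REPLICAS)}")

def hot_set(key, value, ttl=60):
    for i in range(REPLICAS):  # every replica gets the write
        cache.set(f"{key}#{i}", value, ttl=ttl)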
Edge Caching and Distributed Caching
CDN Edge Caching
CDNs cache content at Points of Presence (PoPs) worldwide. The key mechanisms:
Cache-Control headers govern CDN behavior. Cache-Control: public, max-age=86400 tells the CDN to cache for 24 hours. s-maxage overrides max-age for shared caches (CDNs) specifically. stale-while-revalidate serves stale content while fetching fresh data in the background.
# Common Cache-Control patterns
Cache-Control: public, max-age=31536000, immutable # versioned assets (app.3f2a.js)
Cache-Control: public, s-maxage=60, stale-while-revalidate=300 # API responses
Cache-Control: private, no-cache # user-specific data
Cache-Control: no-store # sensitive data, never cache
Cache busting: Append a hash or version to filenames (style.a1b2c3.css) so updated assets get a new URL and bypass stale CDN caches. Build tools like Webpack/Vite do this automatically.
Origin shield: An intermediate CDN layer between edge PoPs and your origin. Multiple edge nodes fetch from the shield instead of hitting the origin directly, reducing origin load during cache misses.
Redis in Production
Redis is the dominant distributed cache. Key considerations:
| Feature | Details |
|---|---|
| Data Structures | Strings, Hashes, Lists, Sets, Sorted Sets, Streams, HyperLogLog. Use the right structure — Hashes for objects, Sorted Sets for leaderboards, Streams for event logs. |
| Persistence | RDB (periodic snapshots) and AOF (append-only log). RDB is faster for restarts; AOF is more durable. Use both in production for safety. |
| Sentinel | Automatic failover for master-replica setups. Monitors master health, promotes a replica on failure, and reconfigures clients. Minimum 3 Sentinel instances for quorum. |
| Cluster | Shards data across multiple nodes using 16,384 hash slots. Provides horizontal scaling beyond a single node's memory. Adds complexity — some multi-key operations require all keys on the same shard (use hash tags). |
Common Redis Patterns
# Distributed lock (for cache stampede prevention)
SET lock:product:42 owner_id NX EX 10  # acquire lock, 10s timeout
# ... fetch from DB, populate cache ...
DEL lock:product:42  # release; note a plain DEL can delete another owner's
                     # lock if the 10s expiry fired first. Safe release checks
                     # owner_id before deleting (usually via a Lua script).
# Rate limiting with sliding window (NOW = current Unix timestamp)
ZADD rate:user:123 NOW NOW                  # add timestamp as score and member
ZREMRANGEBYSCORE rate:user:123 0 (NOW-60)   # remove entries older than 60s; "(" makes the bound exclusive
ZCARD rate:user:123                         # count requests in window
# Leaderboard
ZADD leaderboard 1500 "player:alice"
ZADD leaderboard 2100 "player:bob"
ZREVRANGE leaderboard 0 9 WITHSCORES # top 10
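The same sliding-window limiter in Python with redis-py (a sketch; the unique member avoids collisions when two requests share a timestamp, and the EXPIRE keeps idle keys from accumulating):

import time
import uuid
import redis

r = redis.Redis(decode_responses=True)

def allow_request(user_id, limit=100, window=60):
    key = f"rate:user:{user_id}"
    now = time.time()
    member = f"{now}:{uuid.uuid4().hex[:8]}"  # unique even within one second
    pipe = r.pipeline()
    pipe.zadd(key, {member: now})                # record this request
    pipe.zremrangebyscore(key, 0, now - window)  # drop anything outside the window
    pipe.zcard(key)                              # count what's left
    pipe.expire(key, window)                     # garbage-collect idle keys
    _, _, count, _ = pipe.execute()
    return count <= limit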
Test Yourself
Q: What does stale-while-revalidate in a Cache-Control header do?
A: It allows the cache to immediately serve a stale response while asynchronously fetching a fresh one from the origin. The user gets instant response times, and the cache stays fresh for the next request; it's a way to get both low latency and freshness.

Q: How do Redis's RDB and AOF persistence modes differ in durability?
A: AOF appends every write to a log (and can fsync each one with appendfsync always), providing near-zero data loss. RDB takes periodic snapshots, so you can lose all writes since the last snapshot. In practice, production setups use both: AOF for durability and RDB for fast restarts.