Infrastructure

Caching Strategies

From CPU registers to CDN edges — understanding where, how, and when to cache data. Patterns for reading and writing, eviction policies, failure modes, and production-grade tools like Redis.

01 / Caching Layers

The Caching Hierarchy

Every system has multiple caching layers stacked on top of each other. A request is answered by the first layer that can serve it, starting with the caches closest to the requester; each layer further from the requester is larger but slower. Understanding where each layer sits helps you decide which one to optimize.

Request path through caching layers (closest to the requester first):
Browser Cache → CDN Edge → App-level Cache → Distributed Cache (Redis) → DB Buffer Pool → OS Page Cache → CPU Cache (L1/L2/L3)

CPU Cache

L1 (~1ns), L2 (~4ns), L3 (~10ns). Hardware-managed. You influence it through data locality — keep hot data contiguous in memory so cache lines stay useful. False sharing across cores is a common pitfall in multi-threaded code.

OS Page Cache

The kernel caches recently read disk pages in free RAM. This is why the second cat of a large file is instant. Tools like vmtouch let you inspect what's cached. Database engines like PostgreSQL rely heavily on this layer.

Application-level Cache

In-process caches like Guava, Caffeine (JVM), or a Python dictionary. Zero network overhead, but limited to a single process and lost on restart. Best for reference data that changes rarely.
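Here is a minimal sketch of such an in-process cache with per-entry TTLs, using only the standard library (the TTLCache name and API are illustrative):

import time

class TTLCache:
    """In-process cache with per-entry expiry. Single-process, lost on restart."""
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # lazily drop expired entries on read
            return None
        return value

    def set(self, key, value, ttl=300):
        self._store[key] = (value, time.monotonic() + ttl)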

Distributed Cache

Redis or Memcached — shared across all app instances. Adds a network hop (~0.5ms) but provides a unified view. This is the workhorse cache in most web architectures.
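A quick sketch of basic usage with the redis-py client (host and key names illustrative):

import redis

# One shared cache for every app instance, at the cost of a network hop
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("user:42:name", "alice", ex=300)  # ex = TTL in seconds
print(r.get("user:42:name"))            # "alice", or None after expiry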

CDN Edge Cache

Cloudflare, CloudFront, Fastly — caches HTTP responses at PoPs close to users. Eliminates origin round-trips for static and semi-static content. For distant users, latency can drop from ~200ms to ~20ms.

Browser Cache

Controlled via HTTP headers (Cache-Control, ETag). The fastest cache — no network at all. But hard to invalidate once set.

DB Buffer Pool

InnoDB's buffer pool, PostgreSQL's shared buffers. The database caches frequently accessed pages in its own managed memory, reducing disk I/O. Tuning buffer pool size is one of the highest-impact database optimizations.
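As a rough illustration, the knobs look like this (sizes are workload-dependent starting points, not recommendations):

# postgresql.conf -- shared_buffers is often started at ~25% of system RAM
shared_buffers = 8GB

# my.cnf -- on a dedicated MySQL host, often ~50-75% of RAM
innodb_buffer_pool_size = 16G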

Key Insight
The fastest cache hit is the one closest to the requester. Design so that most requests never reach the origin — the CDN handles static assets, Redis handles dynamic lookups, and the DB buffer pool handles the rest.

02 / Caching Patterns

Read and Write Patterns

How your application interacts with the cache determines consistency, latency, and complexity tradeoffs. Five core patterns cover the majority of use cases.

| Pattern | Read Path | Write Path | Consistency | Best For |
|---|---|---|---|---|
| Cache-Aside | App checks cache; on miss, reads DB and fills cache | App writes DB, invalidates cache | Eventual | General purpose, most common |
| Read-Through | Cache itself loads from DB on miss | Same as cache-aside | Eventual | Simpler app code, cache-as-library |
| Write-Through | From cache | App writes cache; cache synchronously writes DB | Strong | When consistency matters more than write latency |
| Write-Behind | From cache | App writes cache; cache async writes DB | Eventual (risk of loss) | High write throughput, acceptable loss risk |
| Refresh-Ahead | Cache proactively refreshes before TTL expires | N/A | Eventual | Predictable access patterns, low-latency reads |

Cache-Aside (Lazy Loading)

The most widely used pattern. The application is responsible for all cache interactions. On a cache miss, the app fetches from the database and populates the cache. On writes, the app updates the database first, then invalidates (not updates) the cache entry.

def get_user(user_id):
    user = cache.get(f"user:{user_id}")
    if user is None:
        user = db.query("SELECT * FROM users WHERE id = %s", user_id)
        cache.set(f"user:{user_id}", user, ttl=300)
    return user

def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    cache.delete(f"user:{user_id}")  # invalidate, don't update

Why invalidate, not update?
Updating the cache on write creates a race condition: two concurrent writes can leave the cache holding stale data from the slower writer. Deleting the key forces the next read to fetch fresh data from the source of truth.

Write-Behind (Write-Back)

The cache buffers writes and flushes them to the database asynchronously in batches. This dramatically improves write throughput but introduces a durability risk — if the cache node crashes before flushing, those writes are lost. Used in systems like CPU caches, OS page cache dirty writeback, and some message queue patterns.
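A minimal sketch of the pattern (the db.bulk_write batch API is hypothetical):

import threading

class WriteBehindCache:
    """Buffers writes and flushes them to the DB in batches (sketch).

    Durability caveat: anything still in `dirty` is lost if the process
    crashes before flush() runs.
    """
    def __init__(self, db):
        self.db = db
        self.cache = {}   # serves reads immediately
        self.dirty = {}   # pending writes awaiting flush
        self.lock = threading.Lock()

    def set(self, key, value):
        with self.lock:
            self.cache[key] = value
            self.dirty[key] = value  # acknowledged before the DB sees it

    def flush(self):
        # Called periodically by a background thread or timer.
        with self.lock:
            batch, self.dirty = self.dirty, {}
        if batch:
            self.db.bulk_write(batch)  # hypothetical batch-write API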

Refresh-Ahead

The cache predicts which keys will be needed soon and refreshes them before they expire. This avoids the latency spike of a cache miss. Works well when access patterns are predictable (e.g., a product page refreshed every 30s). Poorly predicted refreshes waste resources.
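A sketch of the read path, reusing the generic cache and loader helpers from the cache-aside example (the 0.8 refresh threshold is an arbitrary choice):

import time

REFRESH_AT = 0.8  # refresh once 80% of the TTL has elapsed

def get_with_refresh_ahead(key, loader, ttl=300):
    # Entries are stored as (value, loaded_at) tuples.
    entry = cache.get(key)
    now = time.time()
    if entry is None:
        value = loader()
        cache.set(key, (value, now), ttl=ttl)
        return value
    value, loaded_at = entry
    if now - loaded_at > ttl * REFRESH_AT:
        # Still valid, but close to expiry: reload proactively. Shown inline
        # here; real systems hand this off to a background worker.
        value = loader()
        cache.set(key, (value, now), ttl=ttl)
    return value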

03 / Eviction Policies

When the Cache is Full

Every cache has a size limit. When it fills up, the eviction policy decides which entries to drop. The right policy depends on your access pattern.

LRU

Least Recently Used. Evicts the entry that hasn't been accessed for the longest time. Good general-purpose policy. Used by Redis (allkeys-lru).
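For intuition, LRU fits in a few lines with an OrderedDict (illustrative sketch, not production code):

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used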

LFU

Least Frequently Used. Evicts the entry with the fewest accesses. Better for workloads with stable hot keys, but slow to adapt to access pattern changes.

FIFO

First In, First Out. Evicts the oldest entry regardless of access pattern. Simple but naive — ignores popularity entirely. Rarely used in production caches.

TTL-based

Entries expire after a fixed time-to-live. Not strictly an eviction policy, but it bounds staleness. Often combined with LRU/LFU, which handle eviction under memory pressure.

W-TinyLFU

Used by Caffeine (Java). Combines a small admission window (LRU) with a main space (segmented LRU), using a frequency sketch to decide admission. Near-optimal hit rates with low overhead.

Random

Evicts a random entry. Surprisingly effective for uniform access patterns. Redis supports allkeys-random. Zero overhead for tracking access metadata.

Practical Guidance
Start with LRU — it works well for most web workloads. If you notice scan-resistance issues (a batch read evicts your hot data), consider W-TinyLFU or segmented LRU. Use LFU only when your hot set is truly stable.

Redis Eviction Modes

Redis offers 8 eviction policies configured via maxmemory-policy: noeviction (return errors), allkeys-lru, volatile-lru (only keys with TTL), allkeys-lfu, volatile-lfu, allkeys-random, volatile-random, and volatile-ttl (evict keys with shortest remaining TTL). The volatile-* variants only consider keys that have an expiration set.
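For example, a cache-only instance is commonly configured like this (the 2gb budget is illustrative):

# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru   # approximate LRU over all keys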

04 / Cache Invalidation

The Hard Problem

"There are only two hard things in computer science: cache invalidation and naming things." Invalidation is hard because you must ensure the cache reflects the source of truth without sacrificing the performance gains that caching provides.

TTL-based Invalidation

The simplest approach. Every cache entry gets a time-to-live. After expiry, the next read triggers a fresh fetch. Staleness is bounded by the TTL value. Short TTLs reduce staleness but increase origin load. A 60s TTL means data can be up to 60 seconds stale.

Event-driven Invalidation

When data changes, the writer publishes an event (via Kafka, Redis Pub/Sub, or a DB trigger) and cache subscribers invalidate or update the relevant keys. Provides near-real-time consistency but adds infrastructure complexity.

# Publisher (on write)
def update_product(product_id, data):
    db.update(product_id, data)
    event_bus.publish("product.updated", {"id": product_id})

# Subscriber (cache invalidator)
@event_bus.on("product.updated")
def handle_product_update(event):
    cache.delete(f"product:{event['id']}")

Version-based Invalidation

Cache keys include a version number or hash. When data changes, the version increments, and old keys are naturally never read again. The stale entries eventually get evicted by memory pressure. Common in CDN cache-busting: app.v3.js or app.abc123.js.
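A sketch of the pattern, assuming the cache exposes an atomic counter like Redis INCR (key names illustrative):

def get_product(product_id):
    version = cache.get(f"product:{product_id}:version") or 0
    key = f"product:{product_id}:v{version}"
    product = cache.get(key)
    if product is None:
        product = db.query("SELECT * FROM products WHERE id = %s", product_id)
        cache.set(key, product, ttl=3600)
    return product

def update_product(product_id, data):
    db.update(product_id, data)
    # Bump the version; readers switch to the new key, old entries age out.
    cache.incr(f"product:{product_id}:version")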

| Strategy | Staleness | Complexity | Origin Load |
|---|---|---|---|
| TTL-based | Bounded by TTL | Low | Moderate (on expiry) |
| Event-driven | Near-zero | High | Low (only on miss) |
| Version-based | Zero (new key) | Medium | Low (orphaned keys waste space) |

Key Insight
Most production systems combine strategies: TTL as a safety net (even event-driven caches use TTL as a fallback), event-driven for critical data, and version-based for static assets.

05 / Failure Patterns

When Caching Goes Wrong

Caching introduces its own class of failure modes. Understanding these patterns — and their mitigations — is essential for building resilient systems.

Cache Stampede (Thundering Herd)

A popular key expires, and hundreds of concurrent requests all miss the cache simultaneously, hammering the database with identical queries. The mitigation toolkit:

Locking

Only one request fetches from DB; others wait for the cache to be repopulated. Use a distributed lock (e.g., Redis SET NX EX).

Request Coalescing

Deduplicate in-flight requests for the same key. The first request does the work; others receive the same result. Libraries like singleflight (Go) implement this.

Jitter on TTL

Add random variance to TTL values so keys don't all expire at the same instant. E.g., TTL = 300 + random(0, 60).
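Combining the lock and the jitter, here is a sketch of a stampede-protected read (assumes a redis-py client r for the lock, plus the generic cache and loader helpers used earlier):

import random
import time

def get_protected(key, loader, base_ttl=300):
    value = cache.get(key)
    if value is not None:
        return value
    # Only one caller rebuilds the entry; the rest wait briefly and retry.
    if r.set(f"lock:{key}", "1", nx=True, ex=10):
        try:
            value = loader()
            cache.set(key, value, ttl=base_ttl + random.randint(0, 60))
            return value
        finally:
            r.delete(f"lock:{key}")
    time.sleep(0.05)
    return get_protected(key, loader, base_ttl)  # retry; bound this in practice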

Cache Penetration

Requests for data that doesn't exist bypass the cache every time (cache miss, DB miss, nothing to cache). Attackers can exploit this to overload your database. Mitigations:

Cache negative results: Store a null/empty sentinel value with a short TTL for keys that don't exist in the DB.

Bloom filter: Check a Bloom filter before querying. If the filter says "not present," skip the DB entirely. False positives are acceptable (they just cause a normal DB miss).
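A sketch of the negative-caching mitigation described above (the NOT_FOUND sentinel is illustrative):

NOT_FOUND = "__not_found__"  # sentinel for keys absent from the DB

def get_user(user_id):
    cached = cache.get(f"user:{user_id}")
    if cached == NOT_FOUND:
        return None                      # known-missing: skip the DB entirely
    if cached is not None:
        return cached
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    if user is None:
        cache.set(f"user:{user_id}", NOT_FOUND, ttl=30)  # short TTL
        return None
    cache.set(f"user:{user_id}", user, ttl=300)
    return user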

Cache Avalanche

A large number of keys expire at the same time (e.g., all populated during the same cache warmup), causing a sudden spike in DB load. Mitigation: stagger TTLs with random jitter, and implement circuit breakers to degrade gracefully.

Hot Key Problem

A single extremely popular key receives disproportionate traffic, overloading the cache node that owns it. Solutions: replicate the hot key across multiple nodes with slight key name variations, use local in-process caching as a first layer, or split the value across sub-keys.
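A sketch of the key-replication approach (the replica count and key scheme are illustrative):

import random

REPLICAS = 8  # copies of the hot key, spread across cache shards

def set_hot(key, value, ttl=300):
    for i in range(REPLICAS):
        cache.set(f"{key}:r{i}", value, ttl=ttl)

def get_hot(key):
    # Each reader picks a replica at random, so no single node takes all the load.
    return cache.get(f"{key}:r{random.randrange(REPLICAS)}")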

Production Checklist
Before launching any caching layer: (1) add TTL jitter, (2) cache negative lookups, (3) set up monitoring for hit rate, miss rate, and eviction rate, (4) have a circuit breaker for cache failures so you can fall back to the DB without cascading.

06 / CDN and Redis in Production

Edge Caching and Distributed Caching

CDN Edge Caching

CDNs cache content at Points of Presence (PoPs) worldwide. The key mechanisms:

Cache-Control headers govern CDN behavior. Cache-Control: public, max-age=86400 tells the CDN to cache for 24 hours. s-maxage overrides max-age for shared caches (CDNs) specifically. stale-while-revalidate serves stale content while fetching fresh data in the background.

# Common Cache-Control patterns
Cache-Control: public, max-age=31536000, immutable     # versioned assets (app.3f2a.js)
Cache-Control: public, s-maxage=60, stale-while-revalidate=300  # API responses
Cache-Control: private, no-cache                        # user-specific data
Cache-Control: no-store                                 # sensitive data, never cache

Cache busting: Append a hash or version to filenames (style.a1b2c3.css) so updated assets get a new URL and bypass stale CDN caches. Build tools like Webpack/Vite do this automatically.

Origin shield: An intermediate CDN layer between edge PoPs and your origin. Multiple edge nodes fetch from the shield instead of hitting the origin directly, reducing origin load during cache misses.

Redis in Production

Redis is the dominant distributed cache. Key considerations:

| Feature | Details |
|---|---|
| Data Structures | Strings, Hashes, Lists, Sets, Sorted Sets, Streams, HyperLogLog. Use the right structure — Hashes for objects, Sorted Sets for leaderboards, Streams for event logs. |
| Persistence | RDB (periodic snapshots) and AOF (append-only log). RDB is faster for restarts; AOF is more durable. Use both in production for safety. |
| Sentinel | Automatic failover for master-replica setups. Monitors master health, promotes a replica on failure, and reconfigures clients. Minimum 3 Sentinel instances for quorum. |
| Cluster | Shards data across multiple nodes using 16,384 hash slots. Provides horizontal scaling beyond a single node's memory. Adds complexity — some multi-key operations require all keys on the same shard (use hash tags). |

Common Redis Patterns

# Distributed lock (for cache stampede prevention)
SET lock:product:42 owner_id NX EX 10  # acquire lock, 10s timeout
# ... fetch from DB, populate cache ...
DEL lock:product:42                     # release lock; production code should
                                        # verify owner_id first (e.g. via Lua)

# Rate limiting with sliding window (NOW = current unix timestamp)
ZADD rate:user:123 NOW NOW             # add timestamp as both score and member
ZREMRANGEBYSCORE rate:user:123 0 (NOW-60)  # remove entries older than 60s
ZCARD rate:user:123                     # count requests in window

# Leaderboard
ZADD leaderboard 1500 "player:alice"
ZADD leaderboard 2100 "player:bob"
ZREVRANGE leaderboard 0 9 WITHSCORES   # top 10

Redis vs Memcached
Use Redis when you need data structures beyond key-value, persistence, pub/sub, or Lua scripting. Use Memcached for pure key-value caching where simplicity and multi-threaded performance matter. Redis is the default choice for most teams today.

Test Yourself

Question 01
In a cache-aside pattern, what should the application do after updating the database?
Invalidating (deleting) the cache key avoids a race condition where two concurrent writes could leave the cache with stale data from the slower writer. The next read will populate the cache with fresh data from the database.
Question 02
Which eviction policy does Caffeine (Java) use to achieve near-optimal hit rates?
W-TinyLFU combines a small admission window (LRU) with a main space (segmented LRU) and uses a compact frequency sketch (Count-Min Sketch) to decide whether new entries should be admitted. This achieves near-optimal hit rates with low memory overhead.
Question 03
What is the primary risk of the write-behind (write-back) caching pattern?
Write-behind buffers writes in the cache and flushes them to the database asynchronously. If the cache node crashes before the flush completes, those buffered writes are lost permanently. This is the fundamental tradeoff for the higher write throughput it provides.
Question 04
How does a Bloom filter help prevent cache penetration?
A Bloom filter is a space-efficient probabilistic data structure that can tell you with certainty if an element is NOT in a set. When a request comes for a non-existent key, the Bloom filter catches it before it hits the database. False positives are possible (causing a normal DB miss) but false negatives are not.
Question 05
What does stale-while-revalidate in a Cache-Control header do?
stale-while-revalidate allows the cache to immediately serve a stale response while asynchronously fetching a fresh one from the origin. The user gets instant response times, and the cache stays fresh for the next request. It's a way to get both low latency and freshness.
Question 06
What is the primary mitigation for cache avalanche?
Cache avalanche occurs when many keys expire at the same time, causing a sudden load spike on the database. By adding random jitter to TTL values (e.g., base TTL + random offset), expirations are spread out over time, preventing the thundering herd effect.
Question 07
In Redis Cluster, how is data distributed across nodes?
Redis Cluster uses a fixed set of 16,384 hash slots. Each key is mapped to a slot via CRC16(key) mod 16384, and each node is responsible for a subset of slots. This is different from consistent hashing — it uses a fixed slot space that is explicitly assigned to nodes.
Question 08
What is an origin shield in a CDN architecture?
An origin shield is a designated CDN PoP that acts as a middle layer. Instead of every edge node independently fetching from the origin on a cache miss, they fetch from the shield. This collapses many edge misses into a single origin request, significantly reducing origin load.
Question 09
Which Redis persistence mode provides better durability?
AOF logs every write operation and can be configured to fsync on every command (appendfsync always), providing near-zero data loss. RDB takes periodic snapshots, so you can lose all writes since the last snapshot. In practice, production setups use both: AOF for durability and RDB for fast restarts.
Question 10
What distinguishes read-through caching from cache-aside?
In cache-aside, the application is responsible for checking the cache, fetching from the DB on a miss, and populating the cache. In read-through, this logic is encapsulated within the cache itself — the app simply asks the cache for data, and the cache transparently loads from the DB on a miss. This simplifies application code.