INFRASTRUCTURE

Message Queues & Event Systems

How distributed systems communicate asynchronously: from fundamental queue semantics and delivery guarantees through Kafka, RabbitMQ, cloud-managed services, and event-driven architecture patterns like event sourcing and CQRS.

01 / Fundamentals

Queue Semantics & Delivery Guarantees

A message queue decouples producers from consumers. The producer writes a message to a queue or topic, and one or more consumers read it asynchronously. This decoupling lets each side scale independently and absorb traffic spikes without data loss.

Basic message flow: Producer → Queue/Topic → Consumer

Point-to-Point vs Pub-Sub

In a point-to-point (P2P) model, each message is delivered to exactly one consumer. A work queue with competing consumers is the classic example: multiple workers pull from the same queue, but each message is processed once. In publish-subscribe (pub-sub), each message is broadcast to all subscribers. Every subscriber gets its own copy, enabling fan-out scenarios like sending the same order event to billing, inventory, and notifications simultaneously.

Property  | Point-to-Point                 | Pub-Sub
----------|--------------------------------|--------------------------------------------
Delivery  | One consumer per message       | All subscribers get a copy
Use case  | Task distribution, work queues | Event broadcasting, fan-out
Scaling   | Add more competing consumers   | Each subscriber scales independently
Example   | SQS, RabbitMQ work queue       | SNS, Kafka topics, RabbitMQ fanout exchange

Delivery Guarantees

At-most-once

Fire and forget. Message may be lost but is never delivered twice. Fastest, used when occasional loss is acceptable (metrics, logs).

At-least-once

Message is retried until acknowledged. May produce duplicates. Consumer must be idempotent. Most common guarantee.

Exactly-once

Each message is processed exactly once. Requires coordination between broker and consumer (transactions, deduplication). Expensive. Kafka supports this via idempotent producers + transactional consumers.

Key Insight
"Exactly-once" in distributed systems usually means "effectively-once" -- the broker may deliver more than once, but the consumer deduplicates using idempotency keys or transactional commits, making it appear as exactly-once to the application.
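
The deduplication half of that contract can be sketched in a few lines of Python. This is a minimal illustration, not a real consumer: the message shape, `process_payment`, and the in-memory `seen_ids` set are assumptions, and in production the dedup store would be durable and updated in the same transaction as the side effect.

```python
# Sketch: effectively-once processing on top of at-least-once delivery.
# seen_ids would be a durable store (DB table, Redis set) in production.

seen_ids = set()
ledger = []

def process_payment(message):
    """Apply a payment event at most once, even if the broker redelivers it."""
    if message["id"] in seen_ids:      # duplicate delivery: skip the side effect
        return False
    seen_ids.add(message["id"])
    ledger.append(message["amount"])   # the actual side effect
    return True

# The broker redelivers "m1" -- the consumer deduplicates by message id.
process_payment({"id": "m1", "amount": 50})
process_payment({"id": "m1", "amount": 50})  # duplicate, ignored
assert ledger == [50]
```

The key design point: the idempotency check and the side effect must be atomic with respect to each other, otherwise a crash between them reintroduces duplicates.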

02 / Apache Kafka

The Distributed Commit Log

Kafka models messaging as an immutable, append-only commit log. Messages are not deleted after consumption -- they are retained based on time or size policies. This fundamental design difference from traditional queues enables replay, multiple consumer groups reading at their own pace, and stream processing.

Kafka architecture: Producer → Topic (Partition 0, Partition 1, Partition 2) → Consumer Group

Topics, Partitions & Offsets

A topic is a named feed of messages. Each topic is split into partitions -- ordered, immutable sequences of records. Each record within a partition gets a sequential offset. Producers choose a partition via a key hash (or round-robin). Partitions are the unit of parallelism: more partitions = more consumers working in parallel.
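
Key-based partition selection can be sketched as follows. Kafka's default partitioner actually uses murmur2; this illustrative sketch uses CRC32, but the property that matters is the same: equal keys always map to the same partition, which preserves per-key ordering.

```python
# Sketch of key-based partition selection (illustrative hash; Kafka uses murmur2).
import zlib

NUM_PARTITIONS = 3

def partition_for(key: bytes) -> int:
    """Same key -> same partition, so all events for one key stay ordered."""
    return zlib.crc32(key) % NUM_PARTITIONS

# All events for order-123 land in one partition and keep their order.
p1 = partition_for(b"order-123")
p2 = partition_for(b"order-123")
assert p1 == p2
assert 0 <= p1 < NUM_PARTITIONS
```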

Consumer Groups

A consumer group is a set of consumers that cooperatively read from a topic. Each partition is assigned to exactly one consumer in the group (but one consumer may read multiple partitions). Different consumer groups read independently, each maintaining their own offsets. This gives you pub-sub across groups and P2P within a group.
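
A rough sketch of how partitions are divided within and across groups. The round-robin assignment below is a simplification of Kafka's pluggable assignors, and the group and consumer names are illustrative.

```python
# Sketch of partition assignment in consumer groups.

def assign(partitions, consumers):
    """Round-robin partitions over consumers; each partition gets exactly one owner."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

groups = {
    "billing":   assign([0, 1, 2], ["b1", "b2"]),
    "analytics": assign([0, 1, 2], ["a1", "a2", "a3", "a4"]),
}
# Within a group: one owner per partition (P2P). With 4 consumers and
# 3 partitions, one analytics consumer sits idle.
# Across groups: billing and analytics each read all partitions (pub-sub).
assert groups["analytics"]["a4"] == []
```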

In-Sync Replicas (ISR) & Acks

Each partition has a leader and zero or more follower replicas. The ISR set contains replicas that are caught up with the leader. The acks setting controls durability:

acks | Behavior                           | Durability                                       | Latency
-----|------------------------------------|--------------------------------------------------|--------
0    | Don't wait for any acknowledgment  | Lowest -- messages can be lost                   | Fastest
1    | Wait for leader to write           | Medium -- lost if leader dies before replication | Low
all  | Wait for all ISR replicas to write | Highest -- survives broker failures              | Highest

Retention & Compaction

Time/size-based retention deletes old segments after a configurable period (e.g., 7 days) or when the log exceeds a size limit. Log compaction keeps only the latest value for each key, effectively turning the topic into a changelog table. Compacted topics are ideal for maintaining state (e.g., user profiles, config) where you always want the latest value but don't need the full history.
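
Compaction boils down to "latest value per key wins", which a dictionary models directly. The record values below are illustrative.

```python
# Sketch of log compaction: keep only the newest record per key.

log = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example"),
    ("user-1", "alice@new.example"),  # supersedes the first user-1 record
]

def compact(records):
    """Latest value per key wins; inserting into a dict in log order does exactly this."""
    latest = {}
    for key, value in records:
        latest[key] = value
    return latest

state = compact(log)
assert state == {"user-1": "alice@new.example", "user-2": "bob@example"}
```

This is why a compacted topic behaves like a changelog table: replaying it from the start reconstructs the current value for every key.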

When to use Kafka
High-throughput event streaming, log aggregation, change data capture (CDC), stream processing pipelines, and any scenario where you need message replay or multiple independent consumers reading the same data.

03 / RabbitMQ

AMQP & Exchange-Based Routing

RabbitMQ implements the AMQP (Advanced Message Queuing Protocol) standard. Unlike Kafka's log-based model, RabbitMQ is a traditional message broker: messages are routed through exchanges, delivered to queues, and deleted once acknowledged. It excels at complex routing, priority queues, and request-reply patterns.

RabbitMQ message flow: Producer → Exchange → Binding → Queue → Consumer

Exchange Types

Direct

Routes messages to queues whose binding key exactly matches the message's routing key. Classic P2P pattern.

Fanout

Broadcasts every message to all bound queues, ignoring routing keys. Pure pub-sub.

Topic

Routes based on wildcard pattern matching on the routing key: * matches exactly one word and # matches zero or more words, so order.* matches order.created and # matches everything.

Headers

Routes based on message header attributes instead of routing keys. Rarely used but flexible for complex matching.
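
The topic exchange's wildcard rules (* matches exactly one dot-separated word, # matches zero or more words) can be sketched as a small recursive matcher. This models the AMQP matching semantics, not any broker's actual implementation.

```python
# Sketch of AMQP topic-exchange pattern matching.

def matches(pattern: str, routing_key: str) -> bool:
    """True if the binding pattern matches the routing key."""
    return _match(pattern.split("."), routing_key.split("."))

def _match(pat, words):
    if not pat:
        return not words                      # pattern exhausted: key must be too
    head, rest = pat[0], pat[1:]
    if head == "#":                           # '#' matches zero or more words
        return any(_match(rest, words[i:]) for i in range(len(words) + 1))
    if not words:
        return False
    if head == "*" or head == words[0]:       # '*' matches exactly one word
        return _match(rest, words[1:])
    return False

assert matches("order.*", "order.created")
assert not matches("order.*", "order.created.eu")   # '*' is one word only
assert matches("order.#", "order")                  # '#' can match zero words
```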

Queues & Bindings

A binding connects an exchange to a queue with an optional routing key or pattern. Messages flow: producer → exchange → matching bindings → queues. Multiple queues can bind to the same exchange, and a queue can bind to multiple exchanges.

Acknowledgment Modes

RabbitMQ supports manual ack (consumer explicitly acknowledges after processing -- ensures at-least-once delivery), auto ack (message is considered delivered as soon as it's sent to the consumer -- at-most-once), and nack/reject with requeue to retry failed messages. Manual ack with prefetch count controls back-pressure.
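
A toy model of manual-ack semantics -- this is broker-side bookkeeping sketched from first principles, not the pika/AMQP client API. Message bodies and class names are illustrative.

```python
# Sketch: a delivered message stays "unacked" until the consumer acks it;
# a nack with requeue puts it back on the ready list for redelivery.
from collections import deque

class AckQueue:
    def __init__(self):
        self.ready = deque()     # messages waiting for delivery
        self.unacked = {}        # delivery tag -> message awaiting ack
        self._tag = 0

    def publish(self, body):
        self.ready.append(body)

    def deliver(self):
        """Hand one message to a consumer; it is now awaiting ack."""
        self._tag += 1
        self.unacked[self._tag] = self.ready.popleft()
        return self._tag, self.unacked[self._tag]

    def ack(self, tag):
        del self.unacked[tag]                       # done: broker forgets it

    def nack_requeue(self, tag):
        self.ready.append(self.unacked.pop(tag))    # failed: retry later

q = AckQueue()
q.publish("charge-order-1")
tag, body = q.deliver()
q.nack_requeue(tag)        # processing failed -> message goes back
tag, body = q.deliver()    # redelivered: this is at-least-once in action
q.ack(tag)
assert not q.ready and not q.unacked
```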

Pitfall
Setting auto-ack with no prefetch limit can overwhelm consumers. Always use manual ack in production and set a reasonable prefetch_count (e.g., 10-50) to control how many unacked messages a consumer holds.

04 / Cloud-Managed Services

SQS, SNS & Google Pub/Sub

Managed message services eliminate the operational burden of running brokers. They handle scaling, replication, and availability automatically, trading some flexibility for reduced ops work.

Amazon SQS

A fully managed pull-based queue. Messages are polled by consumers and become invisible for a visibility timeout while being processed. If the consumer doesn't delete the message in time, it becomes visible again for retry.
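
The visibility-timeout mechanics can be modeled with a fake clock. This is a simulation of the semantics only, not the boto3 API; all names are illustrative.

```python
# Sketch of SQS visibility timeout: a received message is hidden until
# the timeout elapses; if not deleted by then, it becomes visible again.

class VisibilityQueue:
    def __init__(self, visibility_timeout=30):
        self.timeout = visibility_timeout
        self.messages = {}   # msg_id -> (body, visible_at)
        self.now = 0         # fake clock, in seconds

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0)

    def receive(self):
        for msg_id, (body, visible_at) in self.messages.items():
            if visible_at <= self.now:
                # Hide the message for the visibility timeout window.
                self.messages[msg_id] = (body, self.now + self.timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        self.messages.pop(msg_id)   # consumer finished: remove for good

q = VisibilityQueue(visibility_timeout=30)
q.send("m1", "resize-image")
assert q.receive() == ("m1", "resize-image")
assert q.receive() is None            # invisible while being processed
q.now = 31                            # consumer never deleted it; timeout passes
assert q.receive() == ("m1", "resize-image")   # redelivered for retry
```

A real setup pairs this with a max receive count and a dead-letter queue so poison messages don't loop forever.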

Property      | Standard Queue                        | FIFO Queue
--------------|---------------------------------------|-----------------------------------
Ordering      | Best-effort (mostly ordered)          | Strict FIFO within message group
Delivery      | At-least-once (occasional duplicates) | Exactly-once processing
Throughput    | Nearly unlimited                      | 300 msgs/sec (3,000 with batching)
Deduplication | None                                  | 5-minute dedup window

Amazon SNS

A push-based pub-sub service. Publishers send messages to a topic, and SNS pushes them to all subscribers (SQS queues, Lambda functions, HTTP endpoints, email, SMS). The classic pattern is SNS + SQS fan-out: SNS topic pushes to multiple SQS queues, each consumed independently.

SNS + SQS fan-out pattern: Publisher → SNS Topic → SQS Queue A / SQS Queue B

Google Cloud Pub/Sub

A global, fully managed pub-sub system. Messages are published to topics and delivered to subscriptions. Each subscription is an independent consumer -- similar to Kafka consumer groups. It supports push (HTTP webhook) and pull delivery, at-least-once delivery by default (redelivery is governed by ack deadlines), per-key ordering via ordering keys, and optional exactly-once delivery.

Choosing a managed service
Use SQS for simple work queues, SNS+SQS for fan-out, and Google Pub/Sub or Amazon MSK (managed Kafka) when you need replay, ordering, or global distribution. All managed services trade fine-grained control for operational simplicity.

05 / Event-Driven Patterns

Event Notification & State Transfer

Event-driven architecture uses events as the primary communication mechanism between services. Two core patterns define how much data travels with the event.

Event Notification

The event carries minimal data -- just enough to say "something happened." For example: {"event": "order.placed", "orderId": "abc123"}. The consumer must call back to the source service to fetch details. This keeps events small and the source authoritative, but creates runtime coupling (the callback must succeed).

Event-Carried State Transfer

The event carries the full state needed by consumers: {"event": "order.placed", "orderId": "abc123", "items": [...], "total": 59.99, "customer": {...}}. Consumers can process independently without callbacks. This eliminates runtime coupling but increases event size and raises the question of data staleness.

Aspect            | Event Notification                    | Event-Carried State Transfer
------------------|---------------------------------------|-------------------------------------------
Event size        | Small (IDs only)                      | Large (full payload)
Coupling          | Runtime coupling (callback needed)    | No runtime coupling
Consistency       | Always fresh (reads from source)      | Eventually consistent (snapshot in event)
Consumer autonomy | Low -- depends on source availability | High -- fully self-contained

Practical rule
Prefer event-carried state transfer when consumer autonomy matters (microservices, different team boundaries). Use event notification when events are frequent, payloads would be large, and consumers always need the latest state anyway.

06 / Event Sourcing &amp; CQRS

Storing Events as the Source of Truth

Event sourcing flips the traditional model: instead of storing current state and losing history, you store every state change as an immutable event in an append-only event store. Current state is derived by replaying events.

Event sourcing flow: Command → Validate &amp; Emit Event → Event Store (append) → Replay Events → Projection (Read Model)

Projections & Snapshots

Projections are read-optimized views built by replaying events. A "current account balance" projection replays all deposit/withdrawal events. As event counts grow, replaying from the start becomes slow, so snapshots periodically capture the aggregate state. Rebuilding then starts from the latest snapshot plus subsequent events.
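
A minimal event-sourced aggregate with snapshot-based replay might look like this. The account/balance domain and event shapes are illustrative; a real event store would persist events and snapshots durably.

```python
# Sketch of event sourcing: current state is derived by replaying events,
# optionally starting from a snapshot to skip the oldest events.

events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]

def apply(balance, event):
    """Fold one event into the projection's state."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    return balance - event["amount"]

def replay(events, snapshot_balance=0, from_index=0):
    """Rebuild state from a snapshot plus the events recorded after it."""
    balance = snapshot_balance
    for event in events[from_index:]:
        balance = apply(balance, event)
    return balance

full = replay(events)                                      # from scratch
snap = replay(events, snapshot_balance=70, from_index=2)   # snapshot taken after 2 events
assert full == snap == 75
```

Both paths must produce the same state -- that invariant is what makes snapshots a pure optimization rather than a second source of truth.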

CQRS (Command Query Responsibility Segregation)

CQRS separates the write model (handles commands, enforces business rules, emits events) from the read model (projections optimized for queries). The write side uses the event store; the read side uses denormalized views in whatever database suits the query pattern. This separation lets each side scale and optimize independently.

CQRS architecture: Command → Write Model → Event Store → Read Model (Projection) → Query

Saga Pattern

In distributed systems, a saga coordinates a multi-step transaction across services using a sequence of local transactions and compensating events. If step 3 fails, the saga emits compensating events to undo steps 1 and 2. Two flavors exist: choreography (each service listens and reacts -- simpler but harder to trace) and orchestration (a central coordinator directs the flow -- easier to reason about but adds a single point of logic).
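
An orchestrated saga with compensations can be sketched as follows. Step names and the failure are illustrative; a real orchestrator persists saga state between steps and calls remote services instead of local functions.

```python
# Sketch of an orchestrated saga: run local steps in order; when one
# fails, run the compensations of the completed steps in reverse.

log = []

def reserve_inventory(): log.append("reserved")
def release_inventory(): log.append("released")   # compensates reserve
def charge_card():       log.append("charged")
def refund_card():       log.append("refunded")   # compensates charge
def ship_order():        raise RuntimeError("no courier available")

saga = [
    (reserve_inventory, release_inventory),
    (charge_card, refund_card),
    (ship_order, None),   # last step: nothing to compensate yet
]

def run_saga(steps):
    completed = []
    for action, compensate in steps:
        try:
            action()
        except RuntimeError:
            for comp in reversed(completed):   # undo in reverse order
                comp()
            return False
        completed.append(compensate)
    return True

assert run_saga(saga) is False
assert log == ["reserved", "charged", "refunded", "released"]
```

Note the ordering: compensations run newest-first, mirroring how a database rolls back nested operations.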

Complexity warning
Event sourcing and CQRS add significant complexity. They shine when you need a complete audit trail, temporal queries ("what was the state at time T?"), or complex domain logic. For simple CRUD applications, they are overkill.

Test Yourself

Question 01
In a point-to-point messaging model, how many consumers process each message?
In P2P (competing consumers), each message is delivered to and processed by exactly one consumer. This contrasts with pub-sub where all subscribers receive a copy.
Question 02
What does Kafka's acks=all setting guarantee?
acks=all means the producer waits until all replicas in the ISR (in-sync replica set) have written the message. It does not mean every broker -- only the replicas assigned to that partition.
Question 03
In Kafka, what is the unit of parallelism within a consumer group?
Each partition is assigned to exactly one consumer within a group. You cannot have more active consumers than partitions in a group -- extra consumers sit idle. More partitions = more parallelism.
Question 04
Which RabbitMQ exchange type broadcasts messages to all bound queues regardless of routing key?
A fanout exchange ignores routing keys entirely and delivers a copy of every message to every bound queue -- pure broadcast/pub-sub behavior.
Question 05
What happens to an SQS message if a consumer fails to delete it within the visibility timeout?
When the visibility timeout expires without the consumer deleting the message, SQS makes it visible again in the queue. After exceeding the max receive count, it moves to the dead-letter queue (if configured).
Question 06
What is the key difference between "event notification" and "event-carried state transfer"?
Event notification says "something happened" with minimal data (forcing a callback to the source), while event-carried state transfer includes the full payload so consumers can act independently without calling back.
Question 07
In event sourcing, what is a "projection"?
A projection is a derived read model built by processing/replaying events from the event store. It is optimized for specific query patterns (e.g., "current balance", "order history") and can be rebuilt at any time from the events.
Question 08
What does Kafka's log compaction do?
Log compaction keeps the most recent record for each message key and discards older records with the same key. This turns the topic into a snapshot of latest values -- ideal for changelogs and state replication.
Question 09
In the saga pattern, what happens when one step in a distributed transaction fails?
Sagas don't have atomic rollback across services. Instead, each completed step has a compensating action. When a step fails, the saga triggers compensating events to semantically undo the effects of previous steps.
Question 10
Why does CQRS separate the write model from the read model?
CQRS separates reads and writes so the write model can enforce complex business rules and domain logic while the read model uses denormalized views optimized for fast queries. Each side scales independently based on its own load patterns.