INFRASTRUCTURE

Message Queues & Event Systems

How distributed systems communicate asynchronously: from fundamental queue semantics and delivery guarantees through Kafka, RabbitMQ, cloud-managed services, and event-driven architecture patterns like event sourcing and CQRS.

01 / Fundamentals

Queue Semantics & Delivery Guarantees

A message queue decouples producers from consumers. The producer writes a message to a queue or topic, and one or more consumers read it asynchronously. This decoupling lets each side scale independently and absorb traffic spikes without data loss.

Basic message flow: Producer → Queue/Topic → Consumer

Point-to-Point vs Pub-Sub

In a point-to-point (P2P) model, each message is delivered to exactly one consumer. A work queue with competing consumers is the classic example: multiple workers pull from the same queue, but each message is processed once. In publish-subscribe (pub-sub), each message is broadcast to all subscribers. Every subscriber gets its own copy, enabling fan-out scenarios like sending the same order event to billing, inventory, and notifications simultaneously.

Property  | Point-to-Point                 | Pub-Sub
----------|--------------------------------|--------------------------------------------
Delivery  | One consumer per message       | All subscribers get a copy
Use case  | Task distribution, work queues | Event broadcasting, fan-out
Scaling   | Add more competing consumers   | Each subscriber scales independently
Example   | SQS, RabbitMQ work queue       | SNS, Kafka topics, RabbitMQ fanout exchange

Delivery Guarantees

At-most-once

Fire and forget. Message may be lost but is never delivered twice. Fastest, used when occasional loss is acceptable (metrics, logs).

At-least-once

Message is retried until acknowledged. May produce duplicates. Consumer must be idempotent. Most common guarantee.

Exactly-once

Each message is processed exactly once. Requires coordination between broker and consumer (transactions, deduplication). Expensive. Kafka supports this via idempotent producers + transactional consumers.

Key Insight
"Exactly-once" in distributed systems usually means "effectively-once" -- the broker may deliver more than once, but the consumer deduplicates using idempotency keys or transactional commits, making it appear as exactly-once to the application.
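
The deduplication half of that contract can be sketched in a few lines of Python. This is a minimal illustration, not a real consumer: the message shape, `process_payment`, and the in-memory `seen_ids` set are assumptions, and in production the dedup store would be durable and updated in the same transaction as the side effect.

```python
# Sketch: effectively-once processing on top of at-least-once delivery.
# seen_ids would be a durable store (DB table, Redis set) in production.

seen_ids = set()
ledger = []

def process_payment(message):
    """Apply a payment event at most once, even if the broker redelivers it."""
    if message["id"] in seen_ids:      # duplicate delivery: skip the side effect
        return False
    seen_ids.add(message["id"])
    ledger.append(message["amount"])   # the actual side effect
    return True

# The broker redelivers "m1" -- the consumer deduplicates by message id.
process_payment({"id": "m1", "amount": 50})
process_payment({"id": "m1", "amount": 50})  # duplicate, ignored
assert ledger == [50]
```

The key design point: the idempotency check and the side effect must be atomic with respect to each other, otherwise a crash between them reintroduces duplicates.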

02 / Apache Kafka

The Distributed Commit Log

Kafka models messaging as an immutable, append-only commit log. Messages are not deleted after consumption -- they are retained based on time or size policies. This fundamental design difference from traditional queues enables replay, multiple consumer groups reading at their own pace, and stream processing.

Kafka architecture: Producer → Topic (Partition 0, Partition 1, Partition 2) → Consumer Group

Topics, Partitions & Offsets

A topic is a named feed of messages. Each topic is split into partitions -- ordered, immutable sequences of records. Each record within a partition gets a sequential offset. Producers choose a partition via a key hash (or round-robin). Partitions are the unit of parallelism: more partitions = more consumers working in parallel.
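
Key-based partition selection can be sketched as follows. Kafka's default partitioner actually uses murmur2; this illustrative sketch uses CRC32, but the property that matters is the same: equal keys always map to the same partition, which preserves per-key ordering.

```python
# Sketch of key-based partition selection (illustrative hash; Kafka uses murmur2).
import zlib

NUM_PARTITIONS = 3

def partition_for(key: bytes) -> int:
    """Same key -> same partition, so all events for one key stay ordered."""
    return zlib.crc32(key) % NUM_PARTITIONS

# All events for order-123 land in one partition and keep their order.
p1 = partition_for(b"order-123")
p2 = partition_for(b"order-123")
assert p1 == p2
assert 0 <= p1 < NUM_PARTITIONS
```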

Consumer Groups

A consumer group is a set of consumers that cooperatively read from a topic. Each partition is assigned to exactly one consumer in the group (but one consumer may read multiple partitions). Different consumer groups read independently, each maintaining their own offsets. This gives you pub-sub across groups and P2P within a group.
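
A rough sketch of how partitions are divided within and across groups. The round-robin assignment below is a simplification of Kafka's pluggable assignors, and the group and consumer names are illustrative.

```python
# Sketch of partition assignment in consumer groups.

def assign(partitions, consumers):
    """Round-robin partitions over consumers; each partition gets exactly one owner."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

groups = {
    "billing":   assign([0, 1, 2], ["b1", "b2"]),
    "analytics": assign([0, 1, 2], ["a1", "a2", "a3", "a4"]),
}
# Within a group: one owner per partition (P2P). With 4 consumers and
# 3 partitions, one analytics consumer sits idle.
# Across groups: billing and analytics each read all partitions (pub-sub).
assert groups["analytics"]["a4"] == []
```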

In-Sync Replicas (ISR) & Acks

Each partition has a leader and zero or more follower replicas. The ISR set contains replicas that are caught up with the leader. The acks setting controls durability:

acks | Behavior                           | Durability                                       | Latency
-----|------------------------------------|--------------------------------------------------|--------
0    | Don't wait for any acknowledgment  | Lowest -- messages can be lost                   | Fastest
1    | Wait for leader to write           | Medium -- lost if leader dies before replication | Low
all  | Wait for all ISR replicas to write | Highest -- survives broker failures              | Highest

Retention & Compaction

Time/size-based retention deletes old segments after a configurable period (e.g., 7 days) or when the log exceeds a size limit. Log compaction keeps only the latest value for each key, effectively turning the topic into a changelog table. Compacted topics are ideal for maintaining state (e.g., user profiles, config) where you always want the latest value but don't need the full history.
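
Compaction boils down to "latest value per key wins", which a dictionary models directly. The record values below are illustrative.

```python
# Sketch of log compaction: keep only the newest record per key.

log = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example"),
    ("user-1", "alice@new.example"),  # supersedes the first user-1 record
]

def compact(records):
    """Latest value per key wins; inserting into a dict in log order does exactly this."""
    latest = {}
    for key, value in records:
        latest[key] = value
    return latest

state = compact(log)
assert state == {"user-1": "alice@new.example", "user-2": "bob@example"}
```

This is why a compacted topic behaves like a changelog table: replaying it from the start reconstructs the current value for every key.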

When to use Kafka
High-throughput event streaming, log aggregation, change data capture (CDC), stream processing pipelines, and any scenario where you need message replay or multiple independent consumers reading the same data.

03 / RabbitMQ

AMQP & Exchange-Based Routing

RabbitMQ implements the AMQP (Advanced Message Queuing Protocol) standard. Unlike Kafka's log-based model, RabbitMQ is a traditional message broker: messages are routed through exchanges, delivered to queues, and deleted once acknowledged. It excels at complex routing, priority queues, and request-reply patterns.

RabbitMQ message flow: Producer → Exchange → Binding → Queue → Consumer

Exchange Types

Direct

Routes messages to queues whose binding key exactly matches the message's routing key. Classic P2P pattern.

Fanout

Broadcasts every message to all bound queues, ignoring routing keys. Pure pub-sub.

Topic

Routes based on wildcard pattern matching on the routing key: * matches exactly one word and # matches zero or more words, so order.* matches order.created and # matches everything.

Headers

Routes based on message header attributes instead of routing keys. Rarely used but flexible for complex matching.
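
The topic exchange's wildcard rules (* matches exactly one dot-separated word, # matches zero or more words) can be sketched as a small recursive matcher. This models the AMQP matching semantics, not any broker's actual implementation.

```python
# Sketch of AMQP topic-exchange pattern matching.

def matches(pattern: str, routing_key: str) -> bool:
    """True if the binding pattern matches the routing key."""
    return _match(pattern.split("."), routing_key.split("."))

def _match(pat, words):
    if not pat:
        return not words                      # pattern exhausted: key must be too
    head, rest = pat[0], pat[1:]
    if head == "#":                           # '#' matches zero or more words
        return any(_match(rest, words[i:]) for i in range(len(words) + 1))
    if not words:
        return False
    if head == "*" or head == words[0]:       # '*' matches exactly one word
        return _match(rest, words[1:])
    return False

assert matches("order.*", "order.created")
assert not matches("order.*", "order.created.eu")   # '*' is one word only
assert matches("order.#", "order")                  # '#' can match zero words
```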

Queues & Bindings

A binding connects an exchange to a queue with an optional routing key or pattern. Messages flow: producer → exchange → matching bindings → queues. Multiple queues can bind to the same exchange, and a queue can bind to multiple exchanges.

Acknowledgment Modes

RabbitMQ supports manual ack (consumer explicitly acknowledges after processing -- ensures at-least-once delivery), auto ack (message is considered delivered as soon as it's sent to the consumer -- at-most-once), and nack/reject with requeue to retry failed messages. Manual ack with prefetch count controls back-pressure.
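
A toy model of manual-ack semantics -- this is broker-side bookkeeping sketched from first principles, not the pika/AMQP client API. Message bodies and class names are illustrative.

```python
# Sketch: a delivered message stays "unacked" until the consumer acks it;
# a nack with requeue puts it back on the ready list for redelivery.
from collections import deque

class AckQueue:
    def __init__(self):
        self.ready = deque()     # messages waiting for delivery
        self.unacked = {}        # delivery tag -> message awaiting ack
        self._tag = 0

    def publish(self, body):
        self.ready.append(body)

    def deliver(self):
        """Hand one message to a consumer; it is now awaiting ack."""
        self._tag += 1
        self.unacked[self._tag] = self.ready.popleft()
        return self._tag, self.unacked[self._tag]

    def ack(self, tag):
        del self.unacked[tag]                       # done: broker forgets it

    def nack_requeue(self, tag):
        self.ready.append(self.unacked.pop(tag))    # failed: retry later

q = AckQueue()
q.publish("charge-order-1")
tag, body = q.deliver()
q.nack_requeue(tag)        # processing failed -> message goes back
tag, body = q.deliver()    # redelivered: this is at-least-once in action
q.ack(tag)
assert not q.ready and not q.unacked
```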

Pitfall
Setting auto-ack with no prefetch limit can overwhelm consumers. Always use manual ack in production and set a reasonable prefetch_count (e.g., 10-50) to control how many unacked messages a consumer holds.

04 / Cloud-Managed Services

SQS, SNS & Google Pub/Sub

Managed message services eliminate the operational burden of running brokers. They handle scaling, replication, and availability automatically, trading some flexibility for reduced ops work.

Amazon SQS

A fully managed pull-based queue. Messages are polled by consumers and become invisible for a visibility timeout while being processed. If the consumer doesn't delete the message in time, it becomes visible again for retry.
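
The visibility-timeout mechanics can be modeled with a fake clock. This is a simulation of the semantics only, not the boto3 API; all names are illustrative.

```python
# Sketch of SQS visibility timeout: a received message is hidden until
# the timeout elapses; if not deleted by then, it becomes visible again.

class VisibilityQueue:
    def __init__(self, visibility_timeout=30):
        self.timeout = visibility_timeout
        self.messages = {}   # msg_id -> (body, visible_at)
        self.now = 0         # fake clock, in seconds

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0)

    def receive(self):
        for msg_id, (body, visible_at) in self.messages.items():
            if visible_at <= self.now:
                # Hide the message for the visibility timeout window.
                self.messages[msg_id] = (body, self.now + self.timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        self.messages.pop(msg_id)   # consumer finished: remove for good

q = VisibilityQueue(visibility_timeout=30)
q.send("m1", "resize-image")
assert q.receive() == ("m1", "resize-image")
assert q.receive() is None            # invisible while being processed
q.now = 31                            # consumer never deleted it; timeout passes
assert q.receive() == ("m1", "resize-image")   # redelivered for retry
```

A real setup pairs this with a max receive count and a dead-letter queue so poison messages don't loop forever.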

Property      | Standard Queue                        | FIFO Queue
--------------|---------------------------------------|-----------------------------------
Ordering      | Best-effort (mostly ordered)          | Strict FIFO within message group
Delivery      | At-least-once (occasional duplicates) | Exactly-once processing
Throughput    | Nearly unlimited                      | 300 msgs/sec (3,000 with batching)
Deduplication | None                                  | 5-minute dedup window

Amazon SNS

A push-based pub-sub service. Publishers send messages to a topic, and SNS pushes them to all subscribers (SQS queues, Lambda functions, HTTP endpoints, email, SMS). The classic pattern is SNS + SQS fan-out: SNS topic pushes to multiple SQS queues, each consumed independently.

SNS + SQS fan-out pattern: Publisher → SNS Topic → SQS Queue A / SQS Queue B

Google Cloud Pub/Sub

A global, fully managed pub-sub system. Messages are published to topics and delivered to subscriptions. Each subscription is an independent consumer -- similar to Kafka consumer groups. It supports push (HTTP webhook) and pull delivery, at-least-once delivery by default (redelivery is governed by ack deadlines), per-key ordering via ordering keys, and optional exactly-once delivery.

Choosing a managed service
Use SQS for simple work queues, SNS+SQS for fan-out, and Google Pub/Sub or Amazon MSK (managed Kafka) when you need replay, ordering, or global distribution. All managed services trade fine-grained control for operational simplicity.

05 / Event-Driven Patterns

Event Notification & State Transfer

Event-driven architecture uses events as the primary communication mechanism between services. Two core patterns define how much data travels with the event.

Event Notification

The event carries minimal data -- just enough to say "something happened." For example: {"event": "order.placed", "orderId": "abc123"}. The consumer must call back to the source service to fetch details. This keeps events small and the source authoritative, but creates runtime coupling (the callback must succeed).

Event-Carried State Transfer

The event carries the full state needed by consumers: {"event": "order.placed", "orderId": "abc123", "items": [...], "total": 59.99, "customer": {...}}. Consumers can process independently without callbacks. This eliminates runtime coupling but increases event size and raises the question of data staleness.

Aspect            | Event Notification                    | Event-Carried State Transfer
------------------|---------------------------------------|-------------------------------------------
Event size        | Small (IDs only)                      | Large (full payload)
Coupling          | Runtime coupling (callback needed)    | No runtime coupling
Consistency       | Always fresh (reads from source)      | Eventually consistent (snapshot in event)
Consumer autonomy | Low -- depends on source availability | High -- fully self-contained

Practical rule
Prefer event-carried state transfer when consumer autonomy matters (microservices, different team boundaries). Use event notification when events are frequent, payloads would be large, and consumers always need the latest state anyway.

06 / Event Sourcing &amp; CQRS

Storing Events as the Source of Truth

Event sourcing flips the traditional model: instead of storing current state and losing history, you store every state change as an immutable event in an append-only event store. Current state is derived by replaying events.

Event sourcing flow: Command → Validate &amp; Emit Event → Event Store (append) → Replay Events → Projection (Read Model)

Projections & Snapshots

Projections are read-optimized views built by replaying events. A "current account balance" projection replays all deposit/withdrawal events. As event counts grow, replaying from the start becomes slow, so snapshots periodically capture the aggregate state. Rebuilding then starts from the latest snapshot plus subsequent events.
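
A minimal event-sourced aggregate with snapshot-based replay might look like this. The account/balance domain and event shapes are illustrative; a real event store would persist events and snapshots durably.

```python
# Sketch of event sourcing: current state is derived by replaying events,
# optionally starting from a snapshot to skip the oldest events.

events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]

def apply(balance, event):
    """Fold one event into the projection's state."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    return balance - event["amount"]

def replay(events, snapshot_balance=0, from_index=0):
    """Rebuild state from a snapshot plus the events recorded after it."""
    balance = snapshot_balance
    for event in events[from_index:]:
        balance = apply(balance, event)
    return balance

full = replay(events)                                      # from scratch
snap = replay(events, snapshot_balance=70, from_index=2)   # snapshot taken after 2 events
assert full == snap == 75
```

Both paths must produce the same state -- that invariant is what makes snapshots a pure optimization rather than a second source of truth.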

CQRS (Command Query Responsibility Segregation)

CQRS separates the write model (handles commands, enforces business rules, emits events) from the read model (projections optimized for queries). The write side uses the event store; the read side uses denormalized views in whatever database suits the query pattern. This separation lets each side scale and optimize independently.

CQRS architecture: Command → Write Model → Event Store → Read Model (Projection) → Query

Saga Pattern

In distributed systems, a saga coordinates a multi-step transaction across services using a sequence of local transactions and compensating events. If step 3 fails, the saga emits compensating events to undo steps 1 and 2. Two flavors exist: choreography (each service listens and reacts -- simpler but harder to trace) and orchestration (a central coordinator directs the flow -- easier to reason about but adds a single point of logic).
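
An orchestrated saga with compensations can be sketched as follows. Step names and the failure are illustrative; a real orchestrator persists saga state between steps and calls remote services instead of local functions.

```python
# Sketch of an orchestrated saga: run local steps in order; when one
# fails, run the compensations of the completed steps in reverse.

log = []

def reserve_inventory(): log.append("reserved")
def release_inventory(): log.append("released")   # compensates reserve
def charge_card():       log.append("charged")
def refund_card():       log.append("refunded")   # compensates charge
def ship_order():        raise RuntimeError("no courier available")

saga = [
    (reserve_inventory, release_inventory),
    (charge_card, refund_card),
    (ship_order, None),   # last step: nothing to compensate yet
]

def run_saga(steps):
    completed = []
    for action, compensate in steps:
        try:
            action()
        except RuntimeError:
            for comp in reversed(completed):   # undo in reverse order
                comp()
            return False
        completed.append(compensate)
    return True

assert run_saga(saga) is False
assert log == ["reserved", "charged", "refunded", "released"]
```

Note the ordering: compensations run newest-first, mirroring how a database rolls back nested operations.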

Complexity warning
Event sourcing and CQRS add significant complexity. They shine when you need a complete audit trail, temporal queries ("what was the state at time T?"), or complex domain logic. For simple CRUD applications, they are overkill.

Test Yourself

Question 01
In a point-to-point messaging model, how many consumers process each message?
In P2P (competing consumers), each message is delivered to and processed by exactly one consumer. This contrasts with pub-sub where all subscribers receive a copy.
Question 02
What does Kafka's acks=all setting guarantee?
acks=all means the producer waits until all replicas in the ISR (in-sync replica set) have written the message. It does not mean every broker -- only the replicas assigned to that partition.
Question 03
In Kafka, what is the unit of parallelism within a consumer group?
Each partition is assigned to exactly one consumer within a group. You cannot have more active consumers than partitions in a group -- extra consumers sit idle. More partitions = more parallelism.
Question 04
Which RabbitMQ exchange type broadcasts messages to all bound queues regardless of routing key?
A fanout exchange ignores routing keys entirely and delivers a copy of every message to every bound queue -- pure broadcast/pub-sub behavior.
Question 05
What happens to an SQS message if a consumer fails to delete it within the visibility timeout?
When the visibility timeout expires without the consumer deleting the message, SQS makes it visible again in the queue. After exceeding the max receive count, it moves to the dead-letter queue (if configured).
Question 06
What is the key difference between "event notification" and "event-carried state transfer"?
Event notification says "something happened" with minimal data (forcing a callback to the source), while event-carried state transfer includes the full payload so consumers can act independently without calling back.
Question 07
In event sourcing, what is a "projection"?
A projection is a derived read model built by processing/replaying events from the event store. It is optimized for specific query patterns (e.g., "current balance", "order history") and can be rebuilt at any time from the events.
Question 08
What does Kafka's log compaction do?
Log compaction keeps the most recent record for each message key and discards older records with the same key. This turns the topic into a snapshot of latest values -- ideal for changelogs and state replication.
Question 09
In the saga pattern, what happens when one step in a distributed transaction fails?
Sagas don't have atomic rollback across services. Instead, each completed step has a compensating action. When a step fails, the saga triggers compensating events to semantically undo the effects of previous steps.
Question 10
Why does CQRS separate the write model from the read model?
CQRS separates reads and writes so the write model can enforce complex business rules and domain logic while the read model uses denormalized views optimized for fast queries. Each side scales independently based on its own load patterns.