Infrastructure

Docker & Containers

From Linux namespaces to production-ready images. How containers isolate processes, how Docker builds and ships them, and the practices that keep everything secure and reproducible.

01 / Containers from First Principles

What Is a Container?

A container is not a VM. It is a regular Linux process whose view of the system has been restricted by three kernel features working together: namespaces isolate what the process can see, cgroups limit what it can use, and a layered filesystem (like OverlayFS) gives it its own root directory without duplicating the entire OS.

Container building blocks: Namespaces + Cgroups + Layered FS = Container

Namespaces

Each namespace type isolates a different resource: pid (process tree), net (network stack), mnt (filesystem mounts), uts (hostname), ipc (inter-process communication), user (UID/GID mappings), and cgroup (cgroup root view). A container typically gets its own instance of all seven.
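Namespace membership is visible directly in /proc, no Docker required. Each entry below is a symlink whose target contains an inode number; two processes are in the same namespace exactly when those inodes match:

```shell
# Every process exposes its namespace memberships under /proc/<pid>/ns.
# The bracketed number is an inode: equal inodes = same namespace.
readlink /proc/self/ns/pid
readlink /proc/self/ns/net
readlink /proc/self/ns/uts
```

Run the same commands inside a container (e.g. via docker exec) and the inodes differ from the host's -- that difference is the isolation.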

Cgroups (Control Groups)

Cgroups cap and account for CPU, memory, I/O, and network bandwidth. Without them a container could starve the host. cgroups v2 (unified hierarchy) is the modern default and what Docker uses on recent kernels.
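A sketch of how those limits are set in practice -- myapp is a placeholder image name and a running Docker daemon is assumed; each flag maps onto a cgroup v2 controller file:

```shell
docker run \
  --memory=512m \
  --cpus=1.5 \
  --pids-limit=256 \
  myapp

# Inside the container, the limits appear under /sys/fs/cgroup:
#   --memory     -> memory.max  (hard cap; the OOM killer fires above it)
#   --cpus       -> cpu.max     (CPU bandwidth: here 1.5 cores' worth)
#   --pids-limit -> pids.max    (caps process count, e.g. fork bombs)
```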

Containers vs VMs

Property | Container | Virtual Machine
Isolation | Process-level (shared kernel) | Hardware-level (separate kernel)
Startup time | Milliseconds | Seconds to minutes
Image size | MBs (often < 100 MB) | GBs
Resource overhead | Near zero | Hypervisor + guest OS
Security boundary | Weaker (kernel shared) | Stronger (separate kernel)

OCI and Runtimes

The Open Container Initiative (OCI) defines two specs: the image spec (how layers and metadata are packaged) and the runtime spec (how a container is created from a root filesystem and config). runc is the reference low-level runtime. Higher-level runtimes like containerd and CRI-O manage image pulls, storage, and lifecycle on top of runc.
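Concretely, the runtime spec is just a JSON document: `runc spec` generates a skeleton config.json. The abridged sketch below shows the shape only -- the paths and args are illustrative, not a complete runnable bundle:

```json
{
  "ociVersion": "1.0.2",
  "process": {
    "args": ["/bin/sh"],
    "cwd": "/"
  },
  "root": { "path": "rootfs" },
  "linux": {
    "namespaces": [
      { "type": "pid" },
      { "type": "network" },
      { "type": "mount" }
    ]
  }
}
```

Running runc in a directory containing a config.json like this plus a rootfs/ is, at bottom, what docker run does for you.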

Runtime stack: docker CLI -> dockerd -> containerd -> runc -> Linux kernel

02 / Writing Dockerfiles

Dockerfile Instructions

A Dockerfile is a declarative recipe that turns a base image into your application image. Each instruction creates a new layer (or metadata entry) in the final image.

FROM

Sets the base image. Every Dockerfile starts here. Use specific tags: FROM node:20-slim.

RUN

Executes a command during build. Chain commands with && to reduce layers.

COPY / ADD

COPY copies files from the build context. ADD also handles remote URLs and automatic tar extraction -- prefer COPY for clarity.

WORKDIR

Sets the working directory for subsequent instructions. Creates the directory if it does not exist.

ENV / ARG

ENV sets runtime environment variables (persists in image). ARG is build-time only and disappears after build.

EXPOSE

Documents which ports the container listens on. Does not actually publish the port -- that requires -p at runtime.
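Putting the instructions above together -- a sketch for a hypothetical Node.js service (the server.js filename, port, and version values are assumptions, not fixed conventions):

```dockerfile
FROM node:20-slim

# ARG: build-time only; set with  docker build --build-arg APP_VERSION=1.2.3
ARG APP_VERSION=dev
# ENV: baked into the image and visible to the running container
ENV NODE_ENV=production APP_VERSION=${APP_VERSION}

# WORKDIR creates /app if needed and scopes later instructions to it
WORKDIR /app

# COPY pulls files in from the build context
COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

# EXPOSE documents the port; publishing still requires -p at runtime
EXPOSE 8000
CMD ["node", "server.js"]
```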

CMD vs ENTRYPOINT

CMD provides default arguments that can be overridden at docker run. ENTRYPOINT sets the executable that always runs. When both are present, CMD supplies default arguments to ENTRYPOINT.

# ENTRYPOINT + CMD pattern
ENTRYPOINT ["python", "app.py"]
CMD ["--port", "8000"]

# docker run myimage              => python app.py --port 8000
# docker run myimage --port 9000  => python app.py --port 9000

Shell form vs Exec form

Always prefer exec form (["executable", "arg"]) over shell form (executable arg). Shell form wraps your command in /bin/sh -c: the shell runs as PID 1 and your process as its child, and because sh does not forward signals, your app never receives SIGTERM and is killed with SIGKILL when the stop timeout expires.

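The two forms side by side, as a contrast fragment (a real Dockerfile keeps only one CMD -- the last one wins):

```dockerfile
# Shell form: Docker actually runs /bin/sh -c "node server.js".
# sh becomes PID 1 and does not forward SIGTERM to node.
CMD node server.js

# Exec form: node itself is PID 1 and receives SIGTERM directly.
CMD ["node", "server.js"]
```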
03 / Image Layers & Caching

Build Smart, Build Fast

Every RUN, COPY, and ADD instruction creates a new layer. Docker caches each layer and only rebuilds from the first instruction whose input changed. This means instruction order matters enormously.

Optimal instruction order

Put things that change rarely at the top (OS packages, language runtime) and things that change often at the bottom (your application code).

# Good: dependencies cached separately from code
FROM node:20-slim
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .
CMD ["node", "server.js"]

Why this works

Changing a source file only invalidates the final COPY . . layer. The npm ci layer stays cached because package.json and package-lock.json did not change. This saves minutes on every build.

Multi-stage builds

Multi-stage builds let you compile in one stage and copy only the artifact to a minimal final image. Build tools, dev dependencies, and source code never ship to production.

# Stage 1: build
FROM golang:1.22 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app

# Stage 2: runtime
FROM gcr.io/distroless/static
COPY --from=builder /app /app
ENTRYPOINT ["/app"]

The final image contains only the static binary -- no Go toolchain, no source. Image size drops from ~800 MB to ~5 MB.
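A useful companion trick: docker build's --target flag stops at a named intermediate stage. Image names here are placeholders; the stage name builder matches the Go example above:

```shell
# Build only the first stage, e.g. to run tests with the full Go toolchain
docker build --target builder -t myapp:build .

# A full build runs both stages and tags only the minimal final image
docker build -t myapp:1.0.0 .
```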

04 / Networking

Container Networking

Docker provides pluggable network drivers. The driver you choose determines how containers communicate with each other and the outside world.

Driver | Scope | Use Case
bridge | Single host | Default. Containers on the same user-defined bridge can reach each other by container name.
host | Single host | Container shares the host's network stack. No isolation, but no NAT overhead.
none | Single host | No networking at all. For batch jobs or security-sensitive workloads.
overlay | Multi-host | Spans multiple Docker daemons (Swarm). Uses VXLAN tunneling.

Port mapping

Containers in a bridge network are isolated from the host by default. To expose a service, map a host port to a container port:

# -p hostPort:containerPort
docker run -p 8080:3000 myapp

# Bind to specific interface
docker run -p 127.0.0.1:8080:3000 myapp

# Random host port
docker run -p 3000 myapp     # check with docker port

DNS resolution

On the default bridge network, containers cannot resolve each other by name; the only workaround is the deprecated --link flag. Always create a user-defined bridge network -- containers attached to it resolve each other by container name via Docker's embedded DNS.

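A minimal sketch of the fix -- the network name, the api service, and the myapi image are all placeholders:

```shell
# User-defined bridge: Docker's embedded DNS resolves container names
docker network create mynet

# Start a hypothetical service on the network
docker run -d --name api --network mynet myapi

# Any other container on the same network reaches it by name
docker run --rm --network mynet curlimages/curl -s http://api:3000/health
```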
05 / Storage & Registries

Volumes, Mounts & Image Distribution

Storage options

Container filesystems are ephemeral -- data is lost when the container is removed. Docker provides three ways to persist data:

Type | Managed by Docker? | Best For
Named volume | Yes (/var/lib/docker/volumes/) | Databases, persistent app data. Easier to back up and migrate than bind mounts.
Bind mount | No (any host path) | Development (live code reload). Host path must exist.
tmpfs | No (RAM only) | Secrets, scratch data. Never written to disk. Linux only.

# Named volume
docker volume create pgdata
docker run -v pgdata:/var/lib/postgresql/data postgres:16

# Bind mount (development)
docker run -v $(pwd)/src:/app/src myapp

# tmpfs (secrets)
docker run --tmpfs /run/secrets:rw,noexec,size=64m myapp

Container Registries

Registries store and distribute container images. The main options are Docker Hub (public default), Amazon ECR, Google GCR / Artifact Registry, and GitHub Container Registry (ghcr.io).
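The push workflow is the same for every registry: tag the image with the registry's hostname, authenticate, push. A sketch using GitHub Container Registry -- OWNER/myapp is a placeholder path:

```shell
# Tag with the full registry path (replace OWNER/myapp with yours)
docker build -t ghcr.io/OWNER/myapp:1.4.2 .

# Authenticate (for ghcr.io, typically with a personal access token)
docker login ghcr.io

docker push ghcr.io/OWNER/myapp:1.4.2

# Consumers pull the same immutable tag
docker pull ghcr.io/OWNER/myapp:1.4.2
```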

Tagging strategy

Avoid :latest

The :latest tag is mutable -- it silently changes when a new image is pushed. This makes builds non-reproducible and rollbacks impossible. Use immutable tags instead.

Semver

myapp:1.4.2 -- clear version, easy to reason about, standard for releases.

Git SHA

myapp:a1b2c3d -- ties image to exact commit. Great for CI/CD traceability.

Combined

myapp:1.4.2-a1b2c3d -- human-readable version plus exact commit for debugging.
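The three styles above can be generated mechanically in CI. A minimal sketch -- the make_tags helper, the myapp name, and the sample version/SHA are all illustrative:

```shell
# Print all three tag styles for a given version and short commit SHA.
make_tags() {
  version="$1"
  sha="$2"
  echo "myapp:${version}"          # semver
  echo "myapp:${sha}"              # git SHA
  echo "myapp:${version}-${sha}"   # combined
}

# In CI you might call:  make_tags "1.4.2" "$(git rev-parse --short HEAD)"
make_tags "1.4.2" "a1b2c3d"
```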

06 / Compose & Security

Multi-Container Apps & Hardening

Docker Compose

Compose defines multi-container applications in a single YAML file. Each service gets its own container, and Compose handles networking, volumes, and startup order.

# docker-compose.yml
services:
  web:
    build: .
    ports:
      - "8080:3000"
    depends_on:
      db:
        condition: service_healthy
    environment:
      DATABASE_URL: postgres://user:pass@db:5432/mydb

  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 5s
      retries: 5

volumes:
  pgdata:

depends_on is not enough

depends_on only controls startup order, not readiness. Use condition: service_healthy with a healthcheck to wait for the dependency to actually be ready.

Container Security

Containers share the host kernel, so the security boundary is thinner than a VM's. Every layer of defense matters.

Non-root user

Add USER nonroot in your Dockerfile. Running as root inside a container means root on the host if the container escapes.

Read-only FS

Run with --read-only and use tmpfs for directories that need writes. Prevents attackers from modifying binaries.

Minimal base images

Use alpine (~5 MB) or distroless (no shell at all). Fewer packages means fewer CVEs.

Image scanning

Scan images with docker scout, Trivy, or Snyk in CI. Block deployment if critical CVEs are found.

# Hardened Dockerfile example
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

FROM gcr.io/distroless/nodejs20
COPY --from=builder /app /app
WORKDIR /app
COPY . .
USER nonroot
CMD ["server.js"]

Defense in depth

Combine multiple controls: non-root user + read-only filesystem + minimal base + no capabilities (--cap-drop ALL) + image scanning in CI. No single measure is sufficient alone.
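The runtime side of that combination, sketched as one invocation -- myapp and the UID are placeholders; every flag is a standard docker run option:

```shell
docker run \
  --read-only \
  --tmpfs /tmp:rw,noexec,size=16m \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --user 10001:10001 \
  myapp
```

Image-level controls (non-root USER, minimal base, scanning in CI) plus these runtime flags give independent layers an attacker must defeat one by one.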

Test Yourself

Question 01
Which three Linux kernel features combine to create a container?
Namespaces isolate what a process can see, cgroups limit resource usage, and a layered filesystem (e.g., OverlayFS) provides the root filesystem. Together these create the container abstraction.
Question 02
What is the key difference between CMD and ENTRYPOINT in a Dockerfile?
ENTRYPOINT defines the executable that always runs. CMD provides default arguments to it, which can be overridden by passing arguments to docker run.
Question 03
Why should you COPY package.json before COPY . . in a Node.js Dockerfile?
Docker layer caching rebuilds from the first changed instruction onward. By copying package.json separately, the expensive npm install layer stays cached as long as dependencies don't change -- even when source files do.
Question 04
What is the main benefit of a multi-stage Docker build?
Multi-stage builds let you use a full build environment in one stage and copy only the compiled artifact to a minimal final image. Build tools, source code, and dev dependencies are discarded.
Question 05
Which Docker network driver allows containers on different hosts to communicate?
The overlay driver creates a distributed network across multiple Docker daemons using VXLAN tunneling; it is the basis of networking in Docker Swarm. (Kubernetes solves the same problem with its own CNI plugins rather than Docker's overlay driver.)
Question 06
What is the problem with using the :latest tag in production?
The :latest tag is mutable. When someone pushes a new image, :latest points to the new version. This means the same tag can resolve to different images over time, making builds non-reproducible and rollbacks impossible.
Question 07
Which storage option keeps data only in RAM and never writes to disk?
A tmpfs mount stores data in the host's RAM only. It is useful for sensitive data like secrets because nothing is ever written to the host filesystem.
Question 08
Why should you run containers as a non-root user?
Containers share the host kernel. If a container running as root exploits a kernel vulnerability to escape, the attacker gains root access on the host. Running as non-root limits the blast radius of a container escape.
Question 09
In Docker Compose, what does depends_on (without a health condition) actually guarantee?
depends_on without a condition only controls startup order -- it starts the dependency first but does not wait for it to be ready. Use condition: service_healthy with a healthcheck to wait for actual readiness.
Question 10
What role does runc play in the container runtime stack?
runc is the reference implementation of the OCI runtime spec. It is the lowest layer in the stack that actually calls Linux kernel APIs (namespaces, cgroups) to create and start container processes. Higher-level runtimes like containerd manage images and lifecycle on top of runc.