What Is a Container?
A container is not a VM. It is a regular Linux process whose view of the system has been restricted by three kernel features working together: namespaces isolate what the process can see, cgroups limit what it can use, and a layered filesystem (like OverlayFS) gives it its own root directory without duplicating the entire OS.
Namespaces
Each namespace type isolates a different resource: pid (process tree), net (network stack), mnt (filesystem mounts), uts (hostname), ipc (inter-process communication), user (UID/GID mappings), and cgroup (cgroup root view). A container typically gets its own instance of most of these. The user namespace is the common exception: Docker does not remap UIDs unless you enable it.
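One way to see namespaces from the inside is to read the links Linux exposes under /proc/<pid>/ns. The sketch below is a minimal illustration, assuming a Linux host with /proc mounted; the name current_namespaces is made up for this example. Two processes that report the same identifier for a type share that namespace.

```python
import os

NAMESPACE_TYPES = ("pid", "net", "mnt", "uts", "ipc", "user", "cgroup")


def current_namespaces(pid="self"):
    """Return {namespace type: identifier} for a process, read from /proc."""
    result = {}
    for ns_type in NAMESPACE_TYPES:
        link = f"/proc/{pid}/ns/{ns_type}"
        try:
            # Each entry is a symlink whose target names the namespace instance
            result[ns_type] = os.readlink(link)
        except OSError:
            # Not Linux, no such process, or not permitted: skip this type
            continue
    return result


for ns_type, ident in current_namespaces().items():
    print(f"{ns_type:7} {ident}")
```

Running this once on the host and once inside a container should print different identifiers for the types the container has its own instance of.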
Cgroups (Control Groups)
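Cgroups cap and account for CPU, memory, block I/O, and process counts. Without them a single container could starve the host. Network bandwidth is not limited by cgroups themselves; that is the job of traffic shaping. cgroups v2 (unified hierarchy) is the modern default on current distributions, and Docker uses it when the host boots with it.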
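The limits a process runs under can be read back from the cgroup filesystem. The following is a rough sketch, assuming cgroups v2 and its interface files memory.max, cpu.max, and pids.max; the function name and its two path parameters are hypothetical and exist mainly so the logic can be exercised against a fake directory tree.

```python
from pathlib import Path

LIMIT_FILES = ("memory.max", "cpu.max", "pids.max")


def cgroup_v2_limits(root="/sys/fs/cgroup", proc_file="/proc/self/cgroup"):
    """Return the v2 limit files found for the current process's cgroup."""
    try:
        lines = Path(proc_file).read_text().splitlines()
    except OSError:
        return {}  # not Linux, or /proc not available

    cgroup_path = None
    for line in lines:
        # A cgroup v2 entry has the form "0::/some/path"
        parts = line.split(":", 2)
        if len(parts) == 3 and parts[0] == "0" and parts[1] == "":
            cgroup_path = parts[2]
    if cgroup_path is None:
        return {}  # no v2 entry: the host may be using cgroups v1

    base = Path(root) / cgroup_path.lstrip("/")
    limits = {}
    for name in LIMIT_FILES:
        try:
            limits[name] = (base / name).read_text().strip()
        except OSError:
            pass  # file absent, e.g. in the root cgroup
    return limits


print(cgroup_v2_limits())
```

Inside a container started with docker run --memory 256m, memory.max would be expected to read 268435456 (256 MiB); a value of max means no limit is set.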
Containers vs VMs
| Property | Container | Virtual Machine |
|---|---|---|
| Isolation | Process-level (shared kernel) | Hardware-level (separate kernel) |
| Startup time | Milliseconds | Seconds to minutes |
| Image size | MBs (often < 100 MB) | GBs |
| Resource overhead | Near zero | Hypervisor + guest OS |
| Security boundary | Weaker (kernel shared) | Stronger (separate kernel) |
OCI and Runtimes
The Open Container Initiative (OCI) defines two specs: the image spec (how layers and metadata are packaged) and the runtime spec (how a container is created from a root filesystem and config). runc is the reference low-level runtime. Higher-level runtimes like containerd and CRI-O manage image pulls, storage, and lifecycle on top of runc.
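To make the runtime spec a little more concrete, the sketch below builds an abbreviated config.json-style document in Python. It is an illustrative fragment, not a complete or validated bundle config: the field names follow the OCI runtime spec, while the helper name minimal_runtime_config and the chosen values are assumptions for this example.

```python
import json


def minimal_runtime_config(args, rootfs="rootfs"):
    """Build an abbreviated OCI runtime config (illustrative, not complete)."""
    return {
        "ociVersion": "1.0.2",
        # What to run inside the container
        "process": {"args": list(args), "cwd": "/"},
        # Where the unpacked image layers live, relative to the bundle
        "root": {"path": rootfs, "readonly": True},
        "linux": {
            # One entry per namespace the runtime should create
            "namespaces": [
                {"type": ns_type}
                for ns_type in ("pid", "network", "mount", "uts", "ipc")
            ]
        },
    }


print(json.dumps(minimal_runtime_config(["/app"]), indent=2))
```

Note that the spec spells the namespace types network and mount, not net and mnt as the kernel does under /proc.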
Dockerfile Instructions
A Dockerfile is a declarative recipe that turns a base image into your application image. Each instruction creates a new layer (or metadata entry) in the final image.
FROM sets the base image. Every build stage starts with it (only ARG may come before the first FROM). Use specific tags: FROM node:20-slim.
RUN executes a command during build. Chain commands with && to reduce layers.
COPY copies files from the build context. ADD also handles URLs and tar extraction -- prefer COPY for clarity.
WORKDIR sets the working directory for subsequent instructions. It creates the directory if it does not exist.
ENV sets runtime environment variables (they persist in the image). ARG is build-time only and is not set in the running container, but its value can remain visible in the image history, so do not pass secrets through it.
EXPOSE documents which ports the container listens on. It does not actually publish the port -- that requires -p at runtime.
CMD vs ENTRYPOINT
CMD provides the default command or arguments, which are replaced by anything passed after the image name in docker run. ENTRYPOINT sets the executable that always runs. When both are present, CMD supplies default arguments to ENTRYPOINT.
# ENTRYPOINT + CMD pattern
ENTRYPOINT ["python", "app.py"]
CMD ["--port", "8000"]
# docker run myimage => python app.py --port 8000
# docker run myimage --port 9000 => python app.py --port 9000
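The combination rule in the ENTRYPOINT + CMD pattern above can be written down as a small Python model. This is a simplified sketch covering exec-form instructions only; resolve_command is a hypothetical helper, not part of Docker.

```python
def resolve_command(entrypoint, cmd, run_args=()):
    """Return the argv a container would start with (exec form only).

    Arguments given after the image name in `docker run` replace CMD;
    ENTRYPOINT, if set, is always prepended.
    """
    args = list(run_args) if run_args else list(cmd or [])
    return list(entrypoint or []) + args


entrypoint = ["python", "app.py"]
cmd = ["--port", "8000"]

print(resolve_command(entrypoint, cmd))                      # defaults from CMD
print(resolve_command(entrypoint, cmd, ["--port", "9000"]))  # CMD replaced
```

The first call yields python app.py --port 8000 and the second python app.py --port 9000, matching the two docker run lines above.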
Prefer exec form (["executable", "arg"]) over shell form (executable arg). Shell form wraps your command in /bin/sh -c, so the shell becomes PID 1 and your process runs as its child. The shell does not forward signals, so the SIGTERM sent by docker stop may never reach your process.
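Exec form only helps if the application reacts to the signal once it arrives. As a rough sketch of what an app.py like the one in the example above might do (the handler, event, and serve names are assumptions, not taken from any real application), a Python process can register a SIGTERM handler and leave its main loop cleanly:

```python
import signal
import threading

stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record that shutdown was requested; the main loop does the cleanup."""
    stop_requested.set()


# docker stop sends SIGTERM first and SIGKILL only after a grace period
signal.signal(signal.SIGTERM, handle_sigterm)


def serve(poll_seconds=0.5):
    """Placeholder main loop: run until SIGTERM arrives, then return."""
    while not stop_requested.wait(poll_seconds):
        pass  # handle requests here
    return "shut down cleanly"
```

Calling serve() blocks until the process receives SIGTERM, which only happens if the process actually receives the signal instead of a wrapping shell.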
Build Smart, Build Fast
Every RUN, COPY, and ADD instruction creates a new layer. Docker caches each layer and only rebuilds from the first instruction whose input changed. This means instruction order matters enormously.
Optimal instruction order
Put things that change rarely at the top (OS packages, language runtime) and things that change often at the bottom (your application code).
# Good: dependencies cached separately from code
FROM node:20-slim
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .
CMD ["node", "server.js"]
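When only application code changes, Docker rebuilds just the COPY . . layer and the instructions after it. The npm ci layer stays cached because package.json and package-lock.json did not change. On a project with many dependencies this can save minutes on every build.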
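Why order matters becomes clearer with a toy model of the cache. The sketch below is a deliberate simplification, not Docker's actual cache-key algorithm: it assumes each layer's key is a hash of its parent's key, the instruction text, and a digest of the files the instruction reads. All names and digest strings are made up for illustration.

```python
import hashlib


def layer_keys(instructions):
    """Compute a chained cache key for each (instruction, input_digest) pair."""
    keys, parent = [], ""
    for text, input_digest in instructions:
        # A key depends on the parent layer, the instruction, and its inputs
        raw = f"{parent}|{text}|{input_digest}".encode()
        key = hashlib.sha256(raw).hexdigest()
        keys.append(key)
        parent = key
    return keys


def rebuilt(before, after):
    """List the instructions in `after` whose cache key differs from `before`."""
    old, new = layer_keys(before), layer_keys(after)
    return [
        text
        for i, (text, _) in enumerate(after)
        if i >= len(old) or old[i] != new[i]
    ]


def dockerfile(order, src_digest):
    """Attach input digests: only the COPY instructions read the build context."""
    inputs = {
        "COPY package.json package-lock.json ./": "deps-v1",
        "COPY . .": src_digest,
    }
    return [(text, inputs.get(text, "")) for text in order]


GOOD = ["FROM node:20-slim", "WORKDIR /app",
        "COPY package.json package-lock.json ./", "RUN npm ci", "COPY . ."]
BAD = ["FROM node:20-slim", "WORKDIR /app", "COPY . .", "RUN npm ci"]

# Simulate a source-only change (src-v1 -> src-v2) under both orderings
print(rebuilt(dockerfile(GOOD, "src-v1"), dockerfile(GOOD, "src-v2")))  # ['COPY . .']
print(rebuilt(dockerfile(BAD, "src-v1"), dockerfile(BAD, "src-v2")))    # ['COPY . .', 'RUN npm ci']
```

Because keys are chained, one changed input invalidates every later layer, which is why the ordering that copies the source first also reruns the dependency install.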
Multi-stage builds
Multi-stage builds let you compile in one stage and copy only the artifact to a minimal final image. Build tools, dev dependencies, and source code never ship to production.
# Stage 1: build
FROM golang:1.22 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app
# Stage 2: runtime
FROM gcr.io/distroless/static
COPY --from=builder /app /app
ENTRYPOINT ["/app"]
The final image contains only the static binary -- no Go toolchain, no source. Image size drops from ~800 MB to ~5 MB.
Container Networking
Docker provides pluggable network drivers. The driver you choose determines how containers communicate with each other and the outside world.
| Driver | Scope | Use Case |
|---|---|---|
| bridge | Single host | Default. Containers on same bridge can talk via container name (with user-defined bridge). |
| host | Single host | Container shares the host's network stack. No isolation, but no NAT overhead. |
| none | Single host | No networking at all. For batch jobs or security-sensitive workloads. |
| overlay | Multi-host | Spans multiple Docker daemons (Swarm). Uses VXLAN tunneling. |
Port mapping
Containers on a bridge network are not reachable from outside the host by default. To expose a service, publish it by mapping a host port to a container port:
# -p hostPort:containerPort
docker run -p 8080:3000 myapp
# Bind to specific interface
docker run -p 127.0.0.1:8080:3000 myapp
# Random host port
docker run -p 3000 myapp # check with docker port
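The three -p forms above differ only in how many colon-separated fields they carry. As a small sketch of that structure (parse_publish is a hypothetical helper; it handles just these three IPv4 forms and ignores protocol suffixes and IPv6 addresses, which Docker also accepts):

```python
def parse_publish(spec):
    """Split a -p value into (host_ip, host_port, container_port).

    None means "left to Docker": all interfaces for the IP,
    a random free port for the host port.
    """
    parts = spec.split(":")
    if len(parts) == 1:
        return (None, None, int(parts[0]))
    if len(parts) == 2:
        return (None, int(parts[0]), int(parts[1]))
    if len(parts) == 3:
        return (parts[0], int(parts[1]), int(parts[2]))
    raise ValueError(f"unsupported publish spec: {spec!r}")


for spec in ("8080:3000", "127.0.0.1:8080:3000", "3000"):
    print(spec, "->", parse_publish(spec))
```

The container port is always the last field; everything before it describes the host side.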
Containers on the default bridge network cannot resolve each other by name (except through --link, which is deprecated). Create a user-defined bridge network instead -- containers on it can resolve each other by name automatically.
Volumes, Mounts & Image Distribution
Storage options
Container filesystems are ephemeral -- data written to the container's writable layer is lost when the container is removed. Docker provides three ways to keep data outside that layer:
| Type | Managed by Docker? | Best For |
|---|---|---|
| Named volume | Yes (/var/lib/docker/volumes/) | Databases, persistent app data. Easier to back up and migrate than bind mounts. |
| Bind mount | No (any host path) | Development (live code reload). With --mount the host path must exist; -v creates a missing path as a directory. |
| tmpfs | No (RAM only) | Secrets, scratch data. Held in memory and discarded when the container stops. Linux only. |
# Named volume
docker volume create pgdata
docker run -v pgdata:/var/lib/postgresql/data postgres:16
# Bind mount (development)
docker run -v "$(pwd)/src:/app/src" myapp
# tmpfs (secrets)
docker run --tmpfs /run/secrets:rw,noexec,size=64m myapp
Container Registries
Registries store and distribute container images. The main options are Docker Hub (public default), Amazon ECR, Google Artifact Registry (which replaced Google Container Registry), and GitHub Container Registry (ghcr.io).
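Which registry an image comes from is encoded in its name. The sketch below approximates the usual convention: the first path component counts as a registry host if it contains a dot or a colon or is localhost, otherwise Docker Hub is assumed, and single-name Docker Hub repositories live under library/. The function name is hypothetical, and the model ignores digests and validation that Docker's real parser handles.

```python
def parse_image_ref(ref):
    """Split an image reference into (registry, repository, tag)."""
    first, slash, rest = ref.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        # No registry host given: Docker Hub is the default
        registry, remainder = "docker.io", ref

    repository, colon, tag = remainder.rpartition(":")
    if not colon:
        # No tag given: Docker falls back to "latest"
        repository, tag = remainder, "latest"

    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository  # official images
    return registry, repository, tag


for ref in ("postgres:16", "ghcr.io/org/app:1.4.2", "localhost:5000/app"):
    print(ref, "->", parse_image_ref(ref))
```

Under this model postgres:16 resolves to docker.io, library/postgres, tag 16, while an untagged name silently becomes latest.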
Tagging strategy
The :latest tag is mutable -- it silently points to a different image whenever a new one is pushed. This makes builds non-reproducible and rollbacks unreliable. Use unique tags that are never reassigned instead:
myapp:1.4.2 -- clear version, easy to reason about, standard for releases.
myapp:a1b2c3d -- ties image to exact commit. Great for CI/CD traceability.
myapp:1.4.2-a1b2c3d -- human-readable version plus exact commit for debugging.
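In a CI pipeline the combined form is usually assembled from a version string and the commit being built. A minimal sketch, assuming a MAJOR.MINOR.PATCH version and a hexadecimal Git SHA; image_tag and its validation rules are illustrative choices, not a standard:

```python
import re


def image_tag(name, version, commit_sha, short=7):
    """Build name:version-shortsha from a release version and a commit SHA."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_sha):
        raise ValueError(f"expected a hex commit SHA, got {commit_sha!r}")
    # Keep only the short prefix of the SHA in the tag
    return f"{name}:{version}-{commit_sha[:short]}"


print(image_tag("myapp", "1.4.2", "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678"))
# myapp:1.4.2-a1b2c3d
```

Rejecting values such as latest at this point keeps mutable tags from slipping into a release by accident.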
Multi-Container Apps & Hardening
Docker Compose
Compose defines multi-container applications in a single YAML file. Each service gets its own container, and Compose handles networking, volumes, and startup order.
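# docker-compose.yml
services:
  web:
    build: .
    ports:
      - "8080:3000"
    depends_on:
      db:
        condition: service_healthy
    environment:
      DATABASE_URL: postgres://user:pass@db:5432/mydb
  db:
    image: postgres:16
    environment:
      # Example credentials matching DATABASE_URL; the postgres image will not start without a password
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: mydb
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 5s
      retries: 5
volumes:
  pgdata: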
On its own, depends_on only controls startup order, not readiness. Use condition: service_healthy together with a healthcheck to wait until the dependency is actually ready.
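The same idea can be applied inside the application as a second line of defense, for example when a dependency restarts later. The sketch below loosely mirrors the interval and retries settings of the healthcheck; it is an assumption-laden illustration, and wait_until_healthy and tcp_open are hypothetical helpers. An open TCP port is also a weaker signal than pg_isready.

```python
import socket
import time


def tcp_open(host, port, timeout=1.0):
    """Return True if a TCP connection succeeds; raise OSError otherwise."""
    with socket.create_connection((host, port), timeout=timeout):
        return True


def wait_until_healthy(check, retries=5, interval=5.0, sleep=time.sleep):
    """Call check() up to `retries` times, pausing `interval` seconds between tries.

    Returns True as soon as check() succeeds, False if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        try:
            if check():
                return True
        except OSError:
            pass  # e.g. connection refused while the service is still starting
        if attempt < retries:
            sleep(interval)
    return False
```

A web service might call wait_until_healthy(lambda: tcp_open("db", 5432)) before opening its connection pool, where db is the service name from the Compose file.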
Container Security
Containers share the host kernel, so the security boundary is thinner than a VM. Every layer of defense matters.
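Add a USER instruction that switches to a non-root user (the user must exist in the image; distroless images ship one named nonroot). A process running as root inside the container is also root on the host if it escapes, unless user namespaces remap it.
Run with --read-only and use tmpfs for directories that need writes. This prevents attackers from modifying binaries.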
Use alpine (~5 MB) or distroless (no shell at all). Fewer packages means fewer CVEs.
Scan images with docker scout, Trivy, or Snyk in CI. Block deployment if critical CVEs are found.
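# Hardened Dockerfile example
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
# Install production dependencies only
RUN npm ci --omit=dev
# Copy the application source into the build stage
COPY . .
FROM gcr.io/distroless/nodejs20
WORKDIR /app
# Ship only what the build stage assembled
COPY --from=builder /app /app
USER nonroot
# The distroless nodejs image already uses node as its entrypoint
CMD ["server.js"]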
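Combine these measures: a non-root user, a read-only filesystem, a minimal base image, dropped capabilities (--cap-drop ALL), and image scanning in CI. No single measure is sufficient on its own.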
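An application can also refuse to start when the non-root rule has been bypassed, for instance by someone running the image with a different user. This is only a sketch of one possible guard, assuming a Unix-like system; ensure_not_root and its optional euid parameter are hypothetical and exist so the check can be tested without changing users.

```python
import os


def ensure_not_root(euid=None):
    """Raise if the effective UID is 0; pass euid explicitly to test the logic."""
    if euid is None:
        euid = os.geteuid()
    if euid == 0:
        raise RuntimeError("refusing to run as root; set USER in the Dockerfile")
    return euid
```

Calling ensure_not_root() at the top of the entry point turns a silent misconfiguration into an immediate, visible failure. It complements the image-level settings rather than replacing them.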
Test Yourself
ENTRYPOINT sets the executable that always runs; CMD supplies default arguments that can be overridden at docker run.
Copying the dependency manifests before the source means the npm install layer stays cached as long as dependencies don't change -- even when source files do.
The :latest tag is mutable. When someone pushes a new image, :latest points to the new version. This means the same tag can resolve to different images over time, making builds non-reproducible and rollbacks unreliable.
depends_on without a condition only controls startup order -- it starts the dependency first but does not wait for it to be ready. Use condition: service_healthy with a healthcheck to wait for actual readiness.
runc is the reference implementation of the OCI runtime spec. It is the lowest layer in the stack that actually calls Linux kernel APIs (namespaces, cgroups) to create and start container processes. Higher-level runtimes like containerd manage images and lifecycle on top of runc.