gzip is a lossless compression format created in 1992 by Jean-loup Gailly and Mark Adler. Under the hood it uses the DEFLATE algorithm, which chains two complementary techniques: LZ77 eliminates repeated substrings by replacing them with (distance, length) back-references, then Huffman coding assigns shorter bit patterns to more-frequent symbols in the resulting stream.
These two stages attack different kinds of redundancy. LZ77 exploits repetition across the input — a word appearing 10× becomes 1 literal + 9 back-references. Huffman exploits frequency imbalance — if 'e' appears far more than 'q', assign 'e' only 2 bits and 'q' 10 bits. Together they typically compress English text to 30–70% of its original size.
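A quick way to see both stages pay off (a minimal Python sketch; exact byte counts are illustrative): repetitive input collapses via back-references, while random bytes leave nothing for either stage to exploit.
# ── Repetition vs. randomness (illustrative) ───────────────────
import zlib, os
repetitive = b'Hello Hello World Hello ' * 1000  # LZ77 turns repeats into back-references
random_ish = os.urandom(len(repetitive))         # no redundancy for LZ77 or Huffman
print(len(zlib.compress(repetitive)))   # a few hundred bytes: repeats collapse
print(len(zlib.compress(random_ish)))   # ~input size, often slightly larger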
The gzip file wrapper adds a 10-byte header (magic bytes 1f 8b, compression method, flags, mtime) and appends a 4-byte CRC32 checksum plus ISIZE (original size mod 2³²), enabling integrity verification and streaming decompression without seeking.
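The wrapper is easy to inspect directly; a sketch using Python's struct, following the RFC 1952 field layout:
# ── Inspect the gzip wrapper (RFC 1952 layout) ─────────────────
import gzip, struct, zlib
payload = gzip.compress(b'hello gzip')
magic, method, flags, mtime = struct.unpack('<HBBI', payload[:8])  # start of 10-byte header
assert magic == 0x8b1f and method == 8           # 1f 8b magic, method 8 = DEFLATE
crc, isize = struct.unpack('<II', payload[-8:])  # trailer: CRC32 + ISIZE
assert crc == zlib.crc32(b'hello gzip') and isize == 10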
| Format | Algorithm | Ratio | Speed | HTTP |
|---|---|---|---|---|
| gzip | DEFLATE | Good | Fast | Universal |
| brotli | LZ77+Huffman+ctx | Better | Slower | Modern |
| zstd | LZ77+FSE (tANS) | Better | Very fast | Emerging |
| lz4 | LZ4 | OK | Fastest | – |
| bzip2 | BWT+Huffman | Better | Slow | – |
| xz/lzma | LZMA2 | Best | Slowest | – |
gzip is the most ubiquitous compression format in computing. Its balance of speed, ratio, and universal runtime support makes it the default for HTTP transport, log management, data archival, and container distribution.
- The browser advertises support with Accept-Encoding: gzip, br. The server compresses the response body and replies with Content-Encoding: gzip. The browser decompresses transparently, so JS/DOM never sees the compressed bytes.
- Pre-compress static assets: build .gz versions of every static asset. nginx gzip_static on serves bundle.js.gz instead of bundle.js when the browser supports it, at zero CPU cost per request.
- Always check Accept-Encoding before compressing (a minimal negotiation sketch follows this list).
- Do not compress already-compressed formats (JPEG, PNG, MP4, ZIP, WebP): they will not shrink, and CPU is wasted.
- Set a minimum size threshold (~1 KB) to avoid compressing tiny responses where the 18-byte gzip wrapper is disproportionately large.
- Always set Vary: Accept-Encoding so CDNs cache separate gzip and plain copies.
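As a minimal illustration of that negotiation (a hand-rolled Python sketch; real servers also parse q-values such as gzip;q=0):
# ── Minimal content negotiation (sketch; ignores q-values) ─────
import gzip
def negotiate(accept_encoding: str, body: bytes):
    headers = {'Vary': 'Accept-Encoding'}  # set on both paths so caches key correctly
    if 'gzip' in accept_encoding and len(body) >= 1024:
        headers['Content-Encoding'] = 'gzip'
        return gzip.compress(body), headers
    return body, headers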
All major runtimes ship gzip in the standard library. The golden rule: always stream — pipe data through a gzip transform rather than buffering the whole payload. For HTTP middleware, a single line enables automatic compression. For upload/ingest APIs, check Content-Encoding: gzip and decode before processing.
// ── Compress / decompress a buffer ────────────────────────────
const zlib = require('zlib'), { promisify } = require('util');
const gzip = promisify(zlib.gzip);
const gunzip = promisify(zlib.gunzip);
const original = Buffer.from('Hello Hello World Hello');
const compressed = await gzip(original, { level: 6 }); // level 1-9; await needs an async context
const restored = await gunzip(compressed);
console.log(compressed.length, 'bytes compressed');
// ── Stream a file (never loads fully into memory) ─────────────
const fs = require('fs');
fs.createReadStream('access.log')
.pipe(zlib.createGzip({ level: 6 }))
.pipe(fs.createWriteStream('access.log.gz'));
// ── Express: auto-compress all responses above 1 KB ───────────
const compression = require('compression');
app.use(compression({ threshold: 1024, level: 6 }));
// ── Hono / Fastify / Next.js ───────────────────────────────────
// Fastify: fastify.register(require('@fastify/compress'), { global: true })
// Next.js: compress: true in next.config.js (default enabled)
// ── Accept a gzip-encoded POST body ───────────────────────────
app.post('/ingest', (req, res) => {
  if (req.headers['content-encoding'] === 'gzip') {
    const chunks = [];
    req.pipe(zlib.createGunzip())
      .on('error', () => res.status(400).end('bad gzip'))
      .on('data', d => chunks.push(d))
      .on('end', () => { handleBody(Buffer.concat(chunks)); res.end(); }); // handleBody = your processing logic
  }
});
// ── Decompress with size limit (gzip bomb protection) ─────────
const safe = await gunzip(untrustedBuf, { maxOutputLength: 10 * 1024 * 1024 });
# ── Compress / decompress in memory ────────────────────────────
import gzip
original = b'Hello Hello World Hello'
compressed = gzip.compress(original, compresslevel=6) # 1-9
restored = gzip.decompress(compressed)
print(len(compressed), 'bytes')
# ── Read / write .gz files ──────────────────────────────────────
with gzip.open('access.log.gz', 'wt', compresslevel=6) as f:
f.write('log line\n' * 100_000)
with gzip.open('access.log.gz', 'rt') as f: # 'rt' = text, auto-decode
for line in f:
process(line)
# ── Stream compress (no full buffer in memory) ──────────────────
import shutil
with open('dump.sql', 'rb') as src, gzip.open('dump.sql.gz', 'wb') as dst:
shutil.copyfileobj(src, dst, length=65536) # 64 KB chunks
# ── FastAPI middleware: auto-compress responses >= 1 KB ─────────
from fastapi import FastAPI
from starlette.middleware.gzip import GZipMiddleware
app = FastAPI()
app.add_middleware(GZipMiddleware, minimum_size=1000)
# ── requests: transparently decodes gzip responses ─────────────
import requests
r = requests.get('https://api.example.com/data')
print(r.text) # already decompressed; r.headers['Content-Encoding'] was 'gzip'
# ── Decode a gzip payload (e.g. from S3 / SQS event) ───────────
import base64
raw = base64.b64decode(event['body'])
data = gzip.decompress(raw).decode('utf-8')
# ── Protect against gzip bombs ──────────────────────────────────
import io
MAX = 10 * 1024 * 1024 # 10 MB
with gzip.open(io.BytesIO(untrusted_bytes)) as f:
data = f.read(MAX + 1)
if len(data) > MAX: raise ValueError('gzip bomb')
// ── Compress / decompress in memory ────────────────────────────
import (
	"bytes"
	"compress/gzip"
	"io"
)
func Compress(data []byte, level int) ([]byte, error) {
	var buf bytes.Buffer
	w, err := gzip.NewWriterLevel(&buf, level) // gzip.BestSpeed=1 .. gzip.BestCompression=9
	if err != nil { return nil, err }
	if _, err := w.Write(data); err != nil { return nil, err }
	if err := w.Close(); err != nil { return nil, err } // Close flushes the final block + trailer
	return buf.Bytes(), nil
}
func Decompress(data []byte) ([]byte, error) {
r, err := gzip.NewReader(bytes.NewReader(data))
if err != nil { return nil, err }
defer r.Close()
return io.ReadAll(io.LimitReader(r, 10<<20)) // caps output at 10 MB (silently truncates beyond)
}
// ── Stream a file ──────────────────────────────────────────────
src, _ := os.Open("access.log")
dst, _ := os.Create("access.log.gz")
w, _ := gzip.NewWriterLevel(dst, gzip.DefaultCompression)
io.Copy(w, src); w.Close(); src.Close(); dst.Close()
// ── HTTP handler: decode incoming gzip body ────────────────────
func handler(w http.ResponseWriter, r *http.Request) {
body := r.Body
if r.Header.Get("Content-Encoding") == "gzip" {
gr, err := gzip.NewReader(r.Body)
if err != nil { http.Error(w, "bad gzip", 400); return }
defer gr.Close(); body = gr
}
data, _ := io.ReadAll(io.LimitReader(body, 10<<20))
// process data...
}
// ── HTTP handler: send gzip response ──────────────────────────
w.Header().Set("Vary", "Accept-Encoding") // set on both paths so caches key correctly
if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
	w.Header().Set("Content-Encoding", "gzip")
	gz, _ := gzip.NewWriterLevel(w, gzip.DefaultCompression)
	defer gz.Close()
	json.NewEncoder(gz).Encode(payload)
} else {
	json.NewEncoder(w).Encode(payload)
}
// Or use nytimes/gziphandler / klauspost/compress for drop-in middleware
Never compress HTTPS responses that mix secret tokens (CSRF, session IDs) with attacker-controlled input (query params, POST bodies). Compression leaks secret length via ciphertext size changes. CRIME exploits TLS-layer compression; BREACH exploits HTTP-layer. Fix: disable gzip on pages with secrets, or use per-request CSRF token masking (randomise the secret before compressing).
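One way to implement the masking (a sketch of the general idea, not a specific library's API): XOR the secret with a fresh random pad on every response, so the compressed length no longer correlates with an attacker's guesses.
# ── Per-request token masking (BREACH mitigation sketch) ───────
import os
def mask_token(secret: bytes) -> bytes:
    pad = os.urandom(len(secret))  # fresh pad per response
    return pad + bytes(p ^ s for p, s in zip(pad, secret))  # receiver XORs pad back out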
DEFLATE tops out near a 1032:1 ratio, so a crafted 1 MB gzip file can expand to over 1 GB. Always set a maximum decompression limit before writing to memory. Node.js: gunzip(buf, {maxOutputLength: 10*1024*1024}). Go: io.LimitReader(r, maxBytes). Python: read incrementally and count bytes. Never decompress untrusted data without a size cap; this is a classic DoS vector in file upload and webhook ingest endpoints.
Three distinct wrappers around the same DEFLATE core. gzip = 10-byte header + DEFLATE + CRC32 + ISIZE. zlib = 2-byte header + DEFLATE + Adler-32. Raw deflate = no wrapper. HTTP's Content-Encoding: deflate is specified as the zlib format, but some servers historically sent raw DEFLATE, a well-known ambiguity. Always use Content-Encoding: gzip for HTTP; every browser and CDN handles it correctly.
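In zlib-style APIs the three wrappers are one wbits flag apart; a Python sketch (wbits: 31 = gzip, 15 = zlib, -15 = raw):
# ── Same DEFLATE core, three wrappers ──────────────────────────
import gzip, zlib
data = b'same DEFLATE core ' * 100
gz = gzip.compress(data)                        # gzip wrapper
zl = zlib.compress(data)                        # zlib wrapper
co = zlib.compressobj(wbits=-15)
raw = co.compress(data) + co.flush()            # raw DEFLATE, no wrapper
assert zlib.decompress(gz, wbits=31) == data    # 31 = expect gzip
assert zlib.decompress(zl, wbits=15) == data    # 15 = expect zlib
assert zlib.decompress(raw, wbits=-15) == data  # -15 = expect raw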
Compressing already-compressed data (JPEG, PNG, MP4, ZIP, WOFF2) adds the 18-byte gzip wrapper with zero benefit and often makes the output slightly larger. Check Content-Type before enabling middleware compression. Also set a minimum-size threshold (~1 KB) everywhere: compressing a 200-byte JSON response wastes more CPU than it saves in transfer time on modern networks.
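A simple filter along these lines (a sketch; the type list and threshold are illustrative, not exhaustive):
# ── Should-compress filter (illustrative type list) ────────────
SKIP_PREFIXES = ('image/', 'video/', 'audio/', 'application/zip', 'application/gzip', 'font/woff2')
def should_compress(content_type: str, length: int, min_size: int = 1024) -> bool:
    return length >= min_size and not content_type.startswith(SKIP_PREFIXES)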
gzip exposes a compression level from 1 (fastest, worst ratio) to 9 (slowest, best ratio). Level 6 is the default — a sweet spot that gets 90% of the compression benefit at ~30% of level-9’s CPU cost. For HTTP responses generated on demand, level 5 or 6 is almost always the right choice. For offline archival, level 9 makes sense.
Ratios shown are approximate for a typical 100 KB JSON payload. Notice that going from level 6 to level 9 gains only ~4% extra compression but can cost 3–4× more CPU. The diminishing returns are steep above level 6.
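The trade-off is easy to measure on your own payloads; a quick Python sweep (numbers will vary by machine and data):
# ── Measure the level trade-off on your own data ───────────────
import gzip, json, time
payload = json.dumps([{'id': i, 'name': f'user-{i}'} for i in range(5000)]).encode()
for level in (1, 6, 9):
    t0 = time.perf_counter()
    out = gzip.compress(payload, compresslevel=level)
    print(f'level {level}: {len(out)} bytes, {(time.perf_counter() - t0) * 1000:.1f} ms')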
Never buffer a full payload to compress it if it can be streamed. Piping a 1 GB log file through createGzip() uses ~256 KB of working memory; buffering it first requires 1 GB of RAM. gzip is a streaming format: DEFLATE works through a 32 KB sliding window, so it never needs the whole input at once. Always prefer pipe() / io.Copy() / shutil.copyfileobj() over compress(wholeBuffer) for anything larger than a few MB.
For HTTP responses in Node.js, the compression() middleware automatically switches to streaming: it wraps res.write() and res.end() so chunks are compressed as they are written, with no full-body buffer required.
| Format / Level | Output size | Compress time | Decompress time |
|---|---|---|---|
| Uncompressed | 100 KB | — | — |
| gzip -1 | ~22 KB | 0.8 ms | 0.3 ms |
| gzip -6 (default) | ~18 KB | 2.4 ms | 0.4 ms |
| gzip -9 | ~17 KB | 7.1 ms | 0.4 ms |
| brotli -4 | ~16 KB | 3.2 ms | 0.8 ms |
| brotli -11 | ~14 KB | 850 ms | 0.9 ms |
| zstd -3 | ~17 KB | 0.6 ms | 0.2 ms |
Benchmarks are indicative (single-core, M2 Pro). Key insight: decompression is always fast regardless of compression level — the browser/client pays no meaningful cost. All CPU cost is on the server/sender side at compression time.
| Content type | Why skip gzip | Action |
|---|---|---|
| JPEG / PNG / WebP | Already compressed | Skip |
| MP4 / WebM / MP3 | Already compressed | Skip |
| .zip / .gz / .br | Already compressed | Skip |
| < 1 KB response | Header overhead dominates | Skip / threshold |
| Server-Sent Events | Streaming: no Content-Length | Use carefully |
| WebSockets | Use permessage-deflate extension | Per-message |
| TLS + user secrets | BREACH attack surface | Disable / mask |
Always set Vary: Accept-Encoding so the CDN stores separate cached copies for gzip and non-gzip clients. Without it, a non-gzip client might receive a gzip-encoded response it can't decode, a common misconfiguration.
Modern clients advertise Brotli in Accept-Encoding; servers and CDNs should pick br over gzip when available. Brotli yields ~15–20% better compression than gzip on typical web assets.