From Power-On to PID 1
When a Linux machine powers on, control passes through a fixed chain of handoffs. Each stage loads the next and transfers execution to it (verifying it first when Secure Boot is enabled), building up from raw firmware to a fully running userspace.
BIOS / UEFI
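Legacy BIOS reads the Master Boot Record (MBR), the first 512-byte sector of the boot disk, and jumps to the boot code it contains. UEFI instead reads the EFI System Partition (typically FAT32) and loads an EFI application directly; it can also verify that application's signature through Secure Boot.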
GRUB2
The bootloader presents a menu, loads the kernel image (vmlinuz) and the initial ramdisk (initramfs) into memory, then passes kernel command-line parameters (e.g., root=, quiet, rd.break).
# View current kernel command line
cat /proc/cmdline
# Edit GRUB defaults
vi /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg
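The kernel command line is a whitespace-separated list of flags and key=value pairs, so individual parameters can be pulled out with plain shell. The sketch below is illustrative only: cmdline_param is a hypothetical helper (not a standard tool), the sample string is made up, and values containing quoted spaces are not handled.

```shell
# cmdline_param is a hypothetical helper, not a standard tool.
# Usage: cmdline_param KEY "CMDLINE STRING"
# Prints the value of the last KEY=value token, or "present" for a bare flag.
# The body runs in a subshell so that "set -f" does not leak to the caller.
cmdline_param() (
    set -f                      # disable globbing while word-splitting
    key=$1
    result=""
    for token in $2; do
        case $token in
            "$key"=*) result=${token#*=} ;;   # strip everything up to the first "="
            "$key")   result=present ;;       # bare flag such as "quiet"
        esac
    done
    [ -n "$result" ] && printf '%s\n' "$result"
)

# Made-up sample command line for demonstration
sample='BOOT_IMAGE=/vmlinuz-6.1.0 root=/dev/mapper/vg-root ro quiet'
cmdline_param root "$sample"
cmdline_param quiet "$sample"

# On a live system, read the real command line instead
if [ -r /proc/cmdline ]; then
    cmdline_param root "$(cat /proc/cmdline)" || echo "no root= parameter found"
fi
```

The function returns a non-zero status when the key is absent, so it can be used directly in an if condition.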
Kernel & initramfs
The kernel decompresses itself, initializes hardware, and unpacks the initramfs into a temporary root filesystem. The initramfs contains just enough drivers and scripts to find and mount the real root partition. Once the real root is mounted, the initramfs switches to it and executes /sbin/init (a symlink to systemd on most modern distros).
To inspect the contents of an initramfs image, use lsinitrd (RHEL) or lsinitramfs (Debian).
systemd as PID 1
systemd becomes the first userspace process. It reads its default target (typically multi-user.target or graphical.target), builds a dependency graph, and starts services in parallel where dependencies allow. Every other userspace process on the system is a descendant of PID 1.
Units, Targets & Service Management
systemd replaces SysVinit scripts with declarative unit files. It manages services, sockets, timers, mounts, and more through a unified dependency system.
Unit Types
| Unit Type | Purpose | Example |
|---|---|---|
| .service | Daemons and long-running processes | nginx.service |
| .socket | IPC/network socket activation | sshd.socket |
| .timer | Scheduled tasks (replaces cron) | logrotate.timer |
| .mount | Filesystem mount points | home.mount |
| .target | Grouping / synchronization points | multi-user.target |
| .slice | cgroup resource boundaries | user.slice |
Socket Activation
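systemd can listen on a socket and start the actual service only when a connection arrives. This speeds up boot (services start on demand) and improves resilience: because systemd keeps the listening socket open, a service that has crashed is started again on the next incoming connection.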
# sshd.socket starts sshd.service only on incoming connection
systemctl enable sshd.socket
systemctl start sshd.socket
systemctl disable sshd.service # no need to auto-start the service
Dependencies & Ordering
Unit files declare dependencies with Wants=, Requires=, After=, and Before=. Requires is hard (failure propagates), Wants is soft (best-effort). Ordering is separate from dependency: After=/Before= control startup sequence.
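The difference between dependency and ordering is easier to see in a concrete unit. The sketch below writes a hypothetical myapp.service into a temporary directory; the unit name, the /usr/local/bin/myapp path, and the choice of postgresql.service and redis.service as dependencies are all illustrative assumptions, not part of any real deployment.

```shell
# Hypothetical example: myapp.service and /usr/local/bin/myapp are illustrative.
unit_dir=$(mktemp -d)

cat > "$unit_dir/myapp.service" <<'EOF'
[Unit]
Description=Example application (hypothetical)
# Hard dependency: if postgresql.service fails to start, myapp is not started
Requires=postgresql.service
# Soft dependency: redis.service is started too, but its failure is tolerated
Wants=redis.service
# Ordering only: wait for both before starting. Without After=, units pulled in
# by Requires= or Wants= are started in parallel with this one.
After=postgresql.service redis.service

[Service]
ExecStart=/usr/local/bin/myapp

[Install]
WantedBy=multi-user.target
EOF

cat "$unit_dir/myapp.service"
```

On a machine with systemd installed, systemd-analyze verify can be pointed at the generated file to check it for syntax problems before it is copied to /etc/systemd/system/.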
journalctl
# Follow logs for a specific unit
journalctl -u nginx.service -f
# Logs since last boot
journalctl -b
# Logs from a time range
journalctl --since "2024-01-01 00:00" --until "2024-01-02 00:00"
# Kernel messages only
journalctl -k
# Show disk usage of journal
journalctl --disk-usage
Use systemctl list-dependencies multi-user.target to visualize the full dependency tree. Add --reverse to see what depends on a specific unit.
Observing & Debugging Processes
Linux exposes rich process information through /proc, /sys, and a suite of command-line tools for real-time observation, tracing, and profiling.
ps / top / htop: Snapshot and real-time views of processes. ps auxf for tree view, htop for interactive filtering and tree mode.
strace: Traces system calls made by a process. Invaluable for debugging "why is this program hanging?" scenarios. Example: strace -p PID -e trace=openat,read (modern libc opens files with openat rather than open).
ltrace: Traces library calls (libc, etc.). Shows calls like malloc(), printf(), fopen() at the shared library boundary.
perf: Hardware-level profiling using CPU performance counters. perf top for live profiling, perf record + perf report for offline analysis.
lsof: Lists open files, sockets, and pipes for a process. lsof -i :80 shows what is using port 80; add -sTCP:LISTEN to restrict the output to listeners.
dmesg: Kernel ring buffer messages. Shows hardware detection, driver errors, OOM kills. Use dmesg -T for human-readable timestamps.
The /proc Filesystem
/proc is a virtual filesystem that exposes kernel and per-process data. Every running process has a directory at /proc/[pid]/.
# Key /proc entries
/proc/[pid]/status # Process state, memory, UIDs
/proc/[pid]/fd/ # Open file descriptors (symlinks)
/proc/[pid]/maps # Memory mappings
/proc/[pid]/cmdline # Full command line
/proc/cpuinfo # CPU details
/proc/meminfo # Memory statistics
/proc/loadavg # 1/5/15 min load averages
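Because these entries are ordinary readable files, standard text tools are enough to inspect a process. The sketch below combines three of them; proc_summary is a hypothetical helper name, and the choice of fields (Name, State, VmRSS) is only one reasonable selection.

```shell
# proc_summary is a hypothetical helper; it assumes a Linux /proc filesystem.
proc_summary() {
    pid=$1
    [ -d "/proc/$pid" ] || { echo "no such process: $pid" >&2; return 1; }
    # Selected lines from status (kernel threads have no VmRSS line)
    grep -E '^(Name|State|VmRSS):' "/proc/$pid/status"
    # cmdline is NUL-separated; translate NULs to spaces for display
    printf 'Cmdline: '
    tr '\0' ' ' < "/proc/$pid/cmdline"
    echo
    # Each entry in fd/ is a symlink to an open file, socket, or pipe
    printf 'Open FDs: %s\n' "$(ls "/proc/$pid/fd" | wc -l)"
}

# $$ is the PID of the current shell
if [ -d /proc/self ]; then
    proc_summary "$$"
fi
```

Reading /proc/[pid]/fd for a process owned by another user generally requires root.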
The /sys Filesystem
/sys (sysfs) exposes the kernel's device model. Unlike /proc, it is structured hierarchically by bus, device, and driver. It is the primary interface for device configuration and hardware info.
Writing to /proc/sys/ or /sys/ files takes effect immediately with no confirmation. A wrong value in /proc/sys/vm/overcommit_memory, for example, can cause allocation failures or OOM kills system-wide. Always validate values before writing.
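One way to follow that advice is to check a value against the set the kernel accepts before it is written. The sketch below does this for vm.overcommit_memory, which accepts the modes 0 (heuristic), 1 (always overcommit), and 2 (strict accounting); valid_overcommit_memory is a hypothetical helper, and the script only prints the sysctl command rather than running it.

```shell
# valid_overcommit_memory is a hypothetical helper, not a standard tool.
# vm.overcommit_memory accepts exactly three modes: 0, 1, or 2.
valid_overcommit_memory() {
    case $1 in
        0|1|2) return 0 ;;
        *)     return 1 ;;
    esac
}

new_value=2
if valid_overcommit_memory "$new_value"; then
    # Dry run: print the command instead of executing it.
    # Remove "echo" to apply the change (requires root).
    echo sysctl -w "vm.overcommit_memory=$new_value"
else
    echo "refusing to write invalid value: $new_value" >&2
fi
```

The same pattern (a small validator per tunable, plus a dry-run step) extends to other parameters, although the accepted ranges have to be looked up per parameter.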
Network Configuration & Debugging
Modern Linux networking uses the ip command (replacing ifconfig), ss (replacing netstat), and nftables (replacing iptables).
| Tool | Purpose | Common Usage |
|---|---|---|
| ip | Interface, address, route management | ip addr show, ip route |
| ss | Socket statistics | ss -tulnp (listening TCP/UDP with PIDs) |
| iptables | Legacy packet filter (netfilter) | iptables -L -n -v |
| nftables | Modern packet filter (replaces iptables) | nft list ruleset |
| tc | Traffic control / QoS | tc qdisc show |
| tcpdump | Packet capture | tcpdump -i eth0 port 443 -w capture.pcap |
| dig | DNS lookup | dig +short example.com A |
iptables vs nftables
iptables uses separate binaries for IPv4, IPv6, ARP, and bridging (iptables, ip6tables, arptables, ebtables). nftables unifies all of these behind a single nft command with a cleaner rule syntax and better performance through set-based matching.
# iptables: block incoming traffic on port 8080
iptables -A INPUT -p tcp --dport 8080 -j DROP
# nftables equivalent (assumes an inet table named "filter" with an "input" chain already exists)
nft add rule inet filter input tcp dport 8080 drop
# Traffic shaping: add 100ms latency (testing)
tc qdisc add dev eth0 root netem delay 100ms
# DNS trace
dig +trace example.com
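ss -tulnp is one of the most useful networking commands. It shows all listening TCP and UDP sockets with the owning PID and program name (run it as root to see processes owned by other users). The flags are easy to memorize: -t tcp, -u udp, -l listening, -n numeric, -p process.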
File Permissions, ACLs & Security Frameworks
Traditional Unix Permissions
Every file has three permission sets: user (owner), group, and other. Each set can have read (4), write (2), and execute (1). The octal notation 755 means rwxr-xr-x.
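The octal form is just each rwx triple read as a 3-bit number, which a few lines of shell can make explicit. The sketch below is a minimal illustration; octal_to_symbolic is a hypothetical helper name and it handles only the three basic digits, not the special bits.

```shell
# octal_to_symbolic is a hypothetical helper: 755 -> rwxr-xr-x
octal_to_symbolic() {
    mode=$1
    # Accept exactly three octal digits (user, group, other)
    case $mode in
        [0-7][0-7][0-7]) ;;
        *) echo "expected three octal digits, got: $mode" >&2; return 1 ;;
    esac
    out=""
    # Split the mode into single digits and test each permission bit
    for digit in $(printf '%s' "$mode" | sed 's/./& /g'); do
        r=-; w=-; x=-
        [ $(( $digit & 4 )) -ne 0 ] && r=r   # read    = 4
        [ $(( $digit & 2 )) -ne 0 ] && w=w   # write   = 2
        [ $(( $digit & 1 )) -ne 0 ] && x=x   # execute = 1
        out="$out$r$w$x"
    done
    printf '%s\n' "$out"
}

octal_to_symbolic 755   # rwxr-xr-x
octal_to_symbolic 640   # rw-r-----
```

For real files, stat -c '%a %A' FILE (GNU coreutils) prints both notations side by side.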
Special Bits
| Bit | Octal | Effect on Files | Effect on Directories |
|---|---|---|---|
| setuid | 4000 | Execute as file owner (e.g., passwd) | No effect |
| setgid | 2000 | Execute as file group | New files inherit directory group |
| sticky | 1000 | No effect | Only owner can delete files (e.g., /tmp) |
POSIX ACLs
ACLs extend the basic user/group/other model with per-user and per-group entries. The filesystem must be mounted with ACL support, which is the default on common filesystems such as ext4 and XFS.
# Grant user "deploy" read+write on a file
setfacl -m u:deploy:rw /var/www/config.yml
# View ACLs
getfacl /var/www/config.yml
# Set default ACL on directory (inherited by new files)
setfacl -d -m g:devops:rwx /var/www/
SELinux & AppArmor
SELinux (RHEL/Fedora) uses mandatory access control with type enforcement. Every process and file has a security context label. Policies define which types can interact. AppArmor (Ubuntu/SUSE) uses path-based profiles that are simpler to write but less granular.
# SELinux: check current mode
getenforce
# SELinux: view file contexts
ls -Z /var/www/html/
# SELinux: restore default context
restorecon -Rv /var/www/html/
# AppArmor: check profile status
aa-status
seccomp & Capabilities
seccomp restricts which system calls a process can make. Docker and systemd both use seccomp profiles to sandbox processes. Capabilities split the monolithic root privilege into fine-grained tokens (e.g., CAP_NET_BIND_SERVICE lets a process bind to ports below 1024 without full root).
# View capabilities of a binary
getcap /usr/bin/ping
# Example output (varies by distribution): /usr/bin/ping cap_net_raw=ep
# Grant a capability
setcap cap_net_bind_service=+ep /usr/local/bin/myserver
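Capabilities of a running process are also visible as hexadecimal bitmasks (CapInh, CapPrm, CapEff, CapBnd, CapAmb) in /proc/[pid]/status, where each bit corresponds to one capability. The sketch below tests a single bit in such a mask; has_cap_bit is a hypothetical helper, and the bit numbers used (10 for CAP_NET_BIND_SERVICE, 13 for CAP_NET_RAW) are taken from linux/capability.h.

```shell
# has_cap_bit is a hypothetical helper, not a standard tool.
# Usage: has_cap_bit HEXMASK BIT
# HEXMASK is a hex string without the 0x prefix, as printed in /proc/[pid]/status.
has_cap_bit() {
    mask=$1
    bit=$2
    [ $(( (0x$mask >> $bit) & 1 )) -eq 1 ]
}

# Bit numbers from linux/capability.h
CAP_NET_BIND_SERVICE=10
CAP_NET_RAW=13

# Inspect the effective capability set inherited from the current shell, if available
if [ -r /proc/self/status ]; then
    capeff=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
    if has_cap_bit "$capeff" "$CAP_NET_BIND_SERVICE"; then
        echo "CAP_NET_BIND_SERVICE is in the effective set"
    else
        echo "CAP_NET_BIND_SERVICE is not in the effective set"
    fi
fi
```

Where the libcap tools are installed, capsh --decode=HEXMASK translates a whole mask into capability names at once.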
setuid binaries are a classic attack surface. Any vulnerability in a setuid-root program grants the attacker root. Prefer capabilities over setuid whenever possible. Audit setuid binaries with find / -perm -4000 -type f.
Resource Control & Kernel Tuning
Linux provides multiple layers of knobs for controlling process priority, CPU affinity, memory limits, and kernel-level behavior.
nice / renice: Set process scheduling priority (niceness, -20 to 19). Lower value = higher priority. nice -n 10 make -j8 runs a build at reduced priority.
taskset: Pin a process to specific CPU cores (CPU affinity). taskset -c 0,1 ./app restricts to cores 0 and 1. Prevents cache-thrashing from migration.
numactl: Control NUMA memory allocation policy. numactl --cpunodebind=0 --membind=0 ./app keeps a process and its memory on a single NUMA node for faster memory access.
ulimit: Per-process resource limits: open files, stack size, core dumps. ulimit -n 65536 raises the file descriptor limit for the current shell and its children. Make it persistent via /etc/security/limits.conf.
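Niceness is inherited and adjusted relative to the caller, which a short experiment can show. The sketch below assumes the GNU coreutils version of nice, which prints the current niceness when run without arguments; other implementations may behave differently.

```shell
# Assumes GNU coreutils "nice": with no arguments it prints the current niceness.
base=$(nice)               # niceness of the current shell
child=$(nice -n 5 nice)    # start "nice" itself with +5 and let it report its value
echo "shell niceness: $base, child niceness: $child"
```

The child value is the shell's niceness plus the adjustment, capped at 19. Raising niceness needs no privileges, while lowering it below the current value normally requires root or CAP_SYS_NICE.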
sysctl Tuning
sysctl reads and writes kernel parameters at runtime via /proc/sys/. Persistent changes go in /etc/sysctl.d/*.conf.
# Common performance-related sysctls
sysctl -w vm.swappiness=10 # Reduce swap aggressiveness
sysctl -w net.core.somaxconn=65535 # Max listen backlog
sysctl -w net.ipv4.tcp_tw_reuse=1 # Reuse TIME_WAIT sockets
sysctl -w fs.file-max=2097152 # System-wide file descriptor limit
sysctl -w kernel.pid_max=4194304 # Max PID value
# Apply from config file
sysctl --system
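The mapping between a sysctl name and its file under /proc/sys/ is mechanical: dots become slashes. The sketch below shows that translation; sysctl_to_path is a hypothetical helper, and it ignores the corner case of keys whose components themselves contain dots (for example some network interface names).

```shell
# sysctl_to_path is a hypothetical helper: vm.swappiness -> /proc/sys/vm/swappiness
# Assumption: no component of the key contains a literal dot.
sysctl_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

path=$(sysctl_to_path vm.swappiness)
echo "$path"

# Reading the file directly gives the same value as "sysctl -n vm.swappiness"
if [ -r "$path" ]; then
    cat "$path"
fi
```

Writing to that file as root has the same effect as sysctl -w, and neither survives a reboot unless the setting is also placed in /etc/sysctl.d/.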
On servers that handle many connections, consider raising somaxconn, file-max, and the per-process ulimit -n. Set vm.swappiness=10 (or 0 for databases). Enable tcp_tw_reuse for services making many outbound connections. Always benchmark before and after changes.
Test Yourself
What is the difference between Wants= and Requires= in a systemd unit file?
What does ss -tulnp show?
What does nice -n 19 ./script.sh do?
What does sysctl -w vm.swappiness=10 do?