Operating Systems

Linux Administration & Internals

From power-on to production: the boot chain, systemd orchestration, process introspection, networking, permissions, security frameworks, and performance tuning.

01 / Boot Process

From Power-On to PID 1

When a Linux machine powers on, control passes through a fixed chain of handoffs. Each stage locates and loads the next before transferring execution (and, with Secure Boot enabled, verifies its signature first), building up from raw firmware to a fully running userspace.

Linux Boot Chain
BIOS / UEFI → GRUB2 → Kernel → initramfs → systemd (PID 1)

BIOS / UEFI

BIOS (legacy) reads the MBR from the first 512 bytes of the boot disk. UEFI reads an EFI System Partition (FAT32) and loads an EFI application directly, supporting Secure Boot signature verification.

GRUB2

The bootloader presents a menu, loads the kernel image (vmlinuz) and the initial ramdisk (initramfs) into memory, then passes kernel command-line parameters (e.g., root=, quiet, rd.break).

# View current kernel command line
cat /proc/cmdline

# Edit GRUB defaults, then regenerate grub.cfg
vi /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg   # RHEL/Fedora; on Debian/Ubuntu run update-grub
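
To see how parameters such as root= reach userspace, the command line can be split on whitespace and searched for a key. The following is a minimal sketch: the function name cmdline_param and the sample string are hypothetical, and quoted values containing spaces are not handled.

```shell
# cmdline_param NAME [CMDLINE]
# Print the value of NAME=value from a kernel command line.
# Reads /proc/cmdline when no CMDLINE string is supplied.
cmdline_param() {
    name=$1
    cmdline=${2-$(cat /proc/cmdline)}
    for word in $cmdline; do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1   # parameter absent, or a bare flag such as "quiet"
}

# Hypothetical command line, used only for illustration
sample='BOOT_IMAGE=/vmlinuz-6.1.0 root=UUID=1234-abcd ro quiet'
cmdline_param root "$sample"   # prints UUID=1234-abcd
```

Calling cmdline_param root with no second argument inspects the running system instead of the sample string.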

Kernel & initramfs

The kernel decompresses itself, initializes hardware, and unpacks initramfs as a temporary root filesystem. initramfs contains just enough drivers and scripts to find and mount the real root partition. Once the real root is mounted, the initramfs init program switches root to it and executes /sbin/init (symlinked to systemd on modern distros).

Key Insight
initramfs exists because the kernel cannot include every possible storage driver. It is a CPIO archive (not a filesystem image) that the kernel unpacks into an in-memory root filesystem (ramfs or tmpfs). You can inspect it with lsinitrd (RHEL) or lsinitramfs (Debian).

systemd as PID 1

systemd becomes the first userspace process. It reads its default target (typically multi-user.target or graphical.target), builds a dependency graph, and starts services in parallel. Every other userspace process is a descendant of PID 1; kernel threads descend from kthreadd (PID 2) instead.
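
The ancestry claim can be checked by following the PPid field in each process's status file. This is a rough sketch; the function name ancestry and its optional second argument (an alternative proc root, useful for testing) are assumptions made for illustration. Inside a container the chain may stop before PID 1 because the parent lives in another PID namespace.

```shell
# ancestry PID [PROC_ROOT]
# Print PID and each ancestor ("pid name" per line) by following PPid
# until PID 1 is reached or a status file is missing.
ancestry() {
    pid=$1
    proc=${2:-/proc}
    while [ "${pid:-0}" -ge 1 ] && [ -r "$proc/$pid/status" ]; do
        name=$(awk '/^Name:/ {print $2}' "$proc/$pid/status")
        printf '%s %s\n' "$pid" "$name"
        if [ "$pid" -eq 1 ]; then break; fi
        pid=$(awk '/^PPid:/ {print $2}' "$proc/$pid/status")
    done
}

# Walk upward from the current shell
ancestry "$$"
```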

02 / systemd

Units, Targets & Service Management

systemd replaces SysVinit scripts with declarative unit files. It manages services, sockets, timers, mounts, and more through a unified dependency system.

Unit Types

Unit Type | Purpose | Example
.service | Daemons and long-running processes | nginx.service
.socket | IPC/network socket activation | sshd.socket
.timer | Scheduled tasks (replaces cron) | logrotate.timer
.mount | Filesystem mount points | home.mount
.target | Grouping / synchronization points | multi-user.target
.slice | cgroup resource boundaries | user.slice

Socket Activation

systemd can listen on a socket and start the actual service only when a connection arrives. This speeds up boot (services start on demand) and, because systemd keeps the socket open, lets a crashed or restarted service pick up queued connections instead of refusing them.

# sshd.socket starts sshd.service only on incoming connection
systemctl enable sshd.socket
systemctl start sshd.socket
systemctl disable sshd.service   # no need to auto-start the service
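
For a sense of what sits behind a socket-activated pair, the sketch below writes a hypothetical inetd-style echo service into a scratch directory. The unit names (echo-demo.socket, echo-demo@.service), the port 7777, and the use of /bin/cat are all illustrative assumptions; nothing is installed or started.

```shell
# Write the two unit files to a temporary directory for inspection.
# To try them, copy both into /etc/systemd/system/ and enable the socket.
unit_dir=$(mktemp -d)

cat > "$unit_dir/echo-demo.socket" <<'EOF'
[Unit]
Description=Echo demo listener (illustrative)

[Socket]
ListenStream=127.0.0.1:7777
# Accept=yes spawns one echo-demo@.service instance per connection
Accept=yes

[Install]
WantedBy=sockets.target
EOF

cat > "$unit_dir/echo-demo@.service" <<'EOF'
[Unit]
Description=Echo demo worker (illustrative)

[Service]
# The accepted connection becomes stdin/stdout, so cat echoes input back
ExecStart=/bin/cat
StandardInput=socket
EOF

ls "$unit_dir"
```

With Accept=yes systemd accepts each connection itself and hands it to a fresh service instance; without it, a single long-running service receives the listening socket.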

Dependencies & Ordering

Unit files declare dependencies with Wants=, Requires=, After=, and Before=. Requires is hard (failure propagates), Wants is soft (best-effort). Ordering is separate from dependency: After=/Before= control startup sequence.
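
The split between dependency and ordering is easier to see in a concrete unit. The sketch below writes a hypothetical webapp.service to a scratch directory; the service name, binary path, and the choice of postgresql.service and redis.service are assumptions for illustration only.

```shell
unit_dir=$(mktemp -d)

cat > "$unit_dir/webapp.service" <<'EOF'
[Unit]
Description=Hypothetical web application
# Hard dependency: pulled in, and its failure blocks this unit
Requires=postgresql.service
# Soft dependency: pulled in, but its failure is tolerated
Wants=redis.service
# Ordering only: wait until both have finished starting
After=postgresql.service redis.service

[Service]
ExecStart=/usr/local/bin/webapp

[Install]
WantedBy=multi-user.target
EOF

# Show just the dependency and ordering directives
grep -E '^(Requires|Wants|After)=' "$unit_dir/webapp.service"
```

Dropping the After= line would leave the dependencies in place but let all three units start in parallel.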

journalctl

# Follow logs for a specific unit
journalctl -u nginx.service -f

# Logs since last boot
journalctl -b

# Logs from a time range
journalctl --since "2024-01-01 00:00" --until "2024-01-02 00:00"

# Kernel messages only
journalctl -k

# Show disk usage of journal
journalctl --disk-usage

Practical Tip
Use systemctl list-dependencies multi-user.target to visualize the full dependency tree. Add --reverse to see what depends on a specific unit.

03 / Process Introspection

Observing & Debugging Processes

Linux exposes rich process information through /proc, /sys, and a suite of command-line tools for real-time observation, tracing, and profiling.

ps / top / htop

Snapshot and real-time views of processes. ps auxf for tree view, htop for interactive filtering and tree mode.

strace

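Traces system calls made by a process. Invaluable for debugging "why is this program hanging?" scenarios. strace -p PID -e trace=openat,read (modern glibc opens files through openat rather than open, so filtering on open alone usually shows nothing).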

ltrace

Traces library calls (libc, etc.). Shows calls like malloc(), printf(), fopen() at the shared library boundary.

perf

Hardware-level profiling using CPU performance counters. perf top for live profiling, perf record + perf report for offline analysis.

lsof

Lists open files, sockets, and pipes for a process. lsof -i :80 shows which processes have sockets on port 80; add -sTCP:LISTEN to restrict the output to listeners.

dmesg

Kernel ring buffer messages. Shows hardware detection, driver errors, OOM kills. Use dmesg -T for human-readable timestamps.

The /proc Filesystem

/proc is a virtual filesystem that exposes kernel and per-process data. Every running process has a directory at /proc/[pid]/.

# Key /proc entries
/proc/[pid]/status   # Process state, memory, UIDs
/proc/[pid]/fd/      # Open file descriptors (symlinks)
/proc/[pid]/maps     # Memory mappings
/proc/[pid]/cmdline  # Full command line
/proc/cpuinfo        # CPU details
/proc/meminfo        # Memory statistics
/proc/loadavg        # 1/5/15 min load averages
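
These entries are ordinary text files, so standard tools are enough to read them. A minimal sketch follows; the helper name proc_field and its optional third argument (an alternative proc root) are assumptions made for illustration.

```shell
# proc_field PID FIELD [PROC_ROOT]
# Print the value of one "Field:" line from a process status file.
proc_field() {
    awk -v key="$2:" '$1 == key { $1 = ""; sub(/^[ \t]+/, ""); print }' \
        "${3:-/proc}/$1/status"
}

pid=$$
echo "name:     $(proc_field "$pid" Name)"
echo "state:    $(proc_field "$pid" State)"
echo "rss:      $(proc_field "$pid" VmRSS)"
echo "open fds: $(ls "/proc/$pid/fd" | wc -l)"
# cmdline is NUL-separated; translate NULs to spaces for display
echo "cmdline:  $(tr '\0' ' ' < "/proc/$pid/cmdline")"
```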

The /sys Filesystem

/sys (sysfs) exposes the kernel's device model. Unlike /proc, it is structured hierarchically by bus, device, and driver. It is the primary interface for device configuration and hardware info.
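
As an illustration of that hierarchy, each network interface appears as a directory under /sys/class/net with one small file per attribute. The sketch below assumes the operstate and mtu attribute files; the function name list_netdevs and its optional sysfs-root argument are hypothetical.

```shell
# list_netdevs [SYS_ROOT]
# Print each network interface with its operational state and MTU,
# read from the per-device attribute files in sysfs.
list_netdevs() {
    sys=${1:-/sys}
    for dev in "$sys"/class/net/*; do
        [ -d "$dev" ] || continue
        printf '%s state=%s mtu=%s\n' "${dev##*/}" \
            "$(cat "$dev/operstate" 2>/dev/null)" \
            "$(cat "$dev/mtu" 2>/dev/null)"
    done
}

list_netdevs
```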

Caution
Writing to /proc/sys/ or /sys/ files takes effect immediately with no confirmation. A typo in /proc/sys/vm/overcommit_memory can cause OOM kills system-wide. Always validate values before writing.
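
One way to follow that advice is to wrap the write in a small guard. The sketch below accepts only the three documented overcommit_memory modes; the function name set_overcommit and the scratch-file target used in the demonstration are assumptions, chosen so the example never touches the real tunable.

```shell
# set_overcommit MODE [TARGET]
# Accept only the documented modes (0 heuristic, 1 always, 2 strict)
# and refuse anything else before writing.
set_overcommit() {
    mode=$1
    target=${2:-/proc/sys/vm/overcommit_memory}
    case $mode in
        0|1|2) ;;
        *) echo "refusing invalid overcommit_memory value: '$mode'" >&2
           return 1 ;;
    esac
    printf '%s\n' "$mode" > "$target"
}

# Demonstrate against a scratch file instead of the live kernel setting
scratch=$(mktemp)
set_overcommit 2 "$scratch" && cat "$scratch"
set_overcommit 22 "$scratch" || echo "rejected"
```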

04 / Networking

Network Configuration & Debugging

Modern Linux networking uses the ip command (replacing ifconfig), ss (replacing netstat), and nftables (replacing iptables).

Tool | Purpose | Common Usage
ip | Interface, address, route management | ip addr show, ip route
ss | Socket statistics | ss -tulnp (listening TCP/UDP with PIDs)
iptables | Legacy packet filter (netfilter) | iptables -L -n -v
nftables | Modern packet filter (replaces iptables) | nft list ruleset
tc | Traffic control / QoS | tc qdisc show
tcpdump | Packet capture | tcpdump -i eth0 port 443 -w capture.pcap
dig | DNS lookup | dig +short example.com A

iptables vs nftables

iptables uses separate binaries for IPv4, IPv6, ARP, and bridging (iptables, ip6tables, arptables, ebtables). nftables unifies all of these behind a single nft command with a cleaner rule syntax and better performance through set-based matching.

# iptables: block incoming traffic on port 8080
iptables -A INPUT -p tcp --dport 8080 -j DROP

# nftables equivalent (assumes an inet table named filter with an input chain already exists)
nft add rule inet filter input tcp dport 8080 drop

# Traffic shaping: add 100ms latency (testing)
tc qdisc add dev eth0 root netem delay 100ms

# DNS trace
dig +trace example.com
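
Unlike iptables, nftables ships with no predefined tables or chains, so an nft add rule command only works once they exist. The sketch below writes a minimal ruleset file that declares both; the table and chain names mirror the command above, and the accept policy is an assumption made for illustration.

```shell
ruleset=$(mktemp)

cat > "$ruleset" <<'EOF'
table inet filter {
    chain input {
        type filter hook input priority 0; policy accept;
        tcp dport 8080 drop
    }
}
EOF

cat "$ruleset"
# To check the syntax without applying it (typically needs root):
#   nft -c -f "$ruleset"
```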

Key Insight
ss -tulnp is the single most useful networking command. It shows all listening sockets with the PID and program name. Learn it by heart: tcp, udp, listening, numeric, process.
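
To make the listening and numeric parts concrete, the sketch below pulls the same information for IPv4 TCP out of /proc/net/tcp, where state 0A means LISTEN and ports are stored in hexadecimal. This is not how ss itself gathers data, and it omits UDP, IPv6, and process names; the function name listening_ports is hypothetical.

```shell
# listening_ports [FILE]
# Print IPv4 TCP ports in LISTEN state (st column == 0A), in decimal.
listening_ports() {
    awk 'NR > 1 && $4 == "0A" { split($2, a, ":"); print a[2] }' \
        "${1:-/proc/net/tcp}" |
    while read -r hex; do
        printf '%d\n' "0x$hex"
    done | sort -n -u
}

listening_ports
```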

05 / Permissions & Security

File Permissions, ACLs & Security Frameworks

Traditional Unix Permissions

Every file has three permission sets: user (owner), group, and other. Each set can have read (4), write (2), and execute (1). The octal notation 755 means rwxr-xr-x.
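
The octal form is just the sum of the bit values within each triplet. A small sketch of the conversion follows; the function name perm_to_octal is hypothetical, and special bits (s, t) are not handled.

```shell
# perm_to_octal RWXSTRING
# Convert a nine-character permission string such as rwxr-xr-x to octal
# by summing r=4, w=2, x=1 within each three-character set.
perm_to_octal() {
    perms=$1
    result=""
    while [ -n "$perms" ]; do
        triplet=$(printf '%s' "$perms" | cut -c1-3)
        perms=$(printf '%s' "$perms" | cut -c4-)
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac
        case $triplet in ??x) digit=$((digit + 1)) ;; esac
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

perm_to_octal rwxr-xr-x   # prints 755
perm_to_octal rw-r-----   # prints 640
```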

Special Bits

Bit | Octal | Effect on Files | Effect on Directories
setuid | 4000 | Execute as file owner (e.g., passwd) | No effect
setgid | 2000 | Execute as file group | New files inherit directory group
sticky | 1000 | No effect | Only owner can delete files (e.g., /tmp)
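
The special bits occupy a fourth, leading octal digit. The sketch below sets them on scratch directories and prints the result; it assumes GNU or BusyBox stat (for the -c option), and the variable names are illustrative.

```shell
# Sticky bit: world-writable like /tmp, but deletion is restricted
shared=$(mktemp -d)
chmod 1777 "$shared"

# setgid: files created inside inherit the directory's group
team=$(mktemp -d)
chmod 2775 "$team"

# %a is the octal mode, %A the symbolic form; the sticky bit appears as a
# trailing "t" and setgid as an "s" in the group execute position
stat -c '%a %A %n' "$shared" "$team"
```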

POSIX ACLs

ACLs extend the basic user/group/other model with per-user and per-group entries. The filesystem must be mounted with ACL support, which is the default for ext4, XFS, and Btrfs on current distributions.

# Grant user "deploy" read+write on a file
setfacl -m u:deploy:rw /var/www/config.yml

# View ACLs
getfacl /var/www/config.yml

# Set default ACL on directory (inherited by new files)
setfacl -d -m g:devops:rwx /var/www/

SELinux & AppArmor

SELinux (RHEL/Fedora) uses mandatory access control with type enforcement. Every process and file has a security context label. Policies define which types can interact. AppArmor (Ubuntu/SUSE) uses path-based profiles that are simpler to write but less granular.

# SELinux: check current mode
getenforce

# SELinux: view file contexts
ls -Z /var/www/html/

# SELinux: restore default context
restorecon -Rv /var/www/html/

# AppArmor: check profile status
aa-status

seccomp & Capabilities

seccomp restricts which system calls a process can make. Docker and systemd both use seccomp profiles to sandbox processes. Capabilities split the monolithic root privilege into fine-grained tokens (e.g., CAP_NET_BIND_SERVICE lets a process bind to ports below 1024 without full root).

# View capabilities of a binary
getcap /usr/bin/ping
# /usr/bin/ping cap_net_raw=ep

# Grant a capability
setcap cap_net_bind_service=+ep /usr/local/bin/myserver
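
A running process's capability sets are visible as hexadecimal bitmasks in its status file, where bit N corresponds to capability number N. The sketch below tests a single bit; the helper name has_cap is hypothetical, and the value 10 for CAP_NET_BIND_SERVICE is taken from linux/capability.h.

```shell
# has_cap HEXMASK BIT
# Succeed when capability number BIT is set in a mask such as the
# CapEff line of /proc/PID/status.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_NET_BIND_SERVICE=10   # capability number from linux/capability.h

mask=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
if has_cap "${mask:-0}" "$CAP_NET_BIND_SERVICE"; then
    echo "CAP_NET_BIND_SERVICE is in the effective set"
else
    echo "CAP_NET_BIND_SERVICE is not in the effective set"
fi
```

An unprivileged shell normally reports an all-zero mask, while a root shell on the host reports most bits set.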

Security Warning
setuid binaries are a classic attack surface. Any vulnerability in a setuid-root program grants the attacker root. Prefer capabilities over setuid whenever possible. Audit setuid binaries with find / -perm -4000 -type f.

06 / Performance Tuning

Resource Control & Kernel Tuning

Linux provides multiple layers of knobs for controlling process priority, CPU affinity, memory limits, and kernel-level behavior.

nice / renice

Set process scheduling priority (-20 to 19). Lower value = higher priority. nice -n 10 make -j8 runs a build at reduced priority.
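
The adjustment is relative to the caller's own niceness and is inherited by children. A quick check, assuming GNU or BusyBox nice (which prints the current niceness when run without a command):

```shell
base=$(nice)              # niceness of the current shell
child=$(nice -n 5 nice)   # niceness seen by a child started with +5

echo "shell niceness: $base, child niceness: $child"
```

The child's value is the shell's value plus five, capped at 19.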

taskset

Pin a process to specific CPU cores (CPU affinity). taskset -c 0,1 ./app restricts to cores 0 and 1. Prevents cache-thrashing from migration.

numactl

Control NUMA memory allocation policy. numactl --cpunodebind=0 --membind=0 ./app keeps processes on a single NUMA node for faster memory access.

ulimit

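Per-process resource limits: open files, stack size, core dumps. ulimit -n 65536 raises the file descriptor limit for the current shell and its children, up to the hard limit. Persistent for login sessions via /etc/security/limits.conf; systemd services take their limits from unit directives such as LimitNOFILE= instead.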
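
Each limit has a soft value (enforced) and a hard value (the ceiling for the soft one), and both are inherited by child processes. The sketch below lowers the soft file descriptor limit inside a subshell to show that the parent is unaffected; the value 64 is arbitrary.

```shell
echo "soft nofile limit: $(ulimit -Sn)"
echo "hard nofile limit: $(ulimit -Hn)"

# Lowering the limit inside a subshell leaves the parent untouched
( ulimit -Sn 64; echo "inside subshell:   $(ulimit -Sn)" )
echo "parent afterwards: $(ulimit -Sn)"
```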

sysctl Tuning

sysctl reads and writes kernel parameters at runtime via /proc/sys/. Persistent changes go in /etc/sysctl.d/*.conf.

# Common performance-related sysctls
sysctl -w vm.swappiness=10                    # Reduce swap aggressiveness
sysctl -w net.core.somaxconn=65535            # Max listen backlog
sysctl -w net.ipv4.tcp_tw_reuse=1             # Reuse TIME_WAIT sockets
sysctl -w fs.file-max=2097152                 # System-wide file descriptor limit
sysctl -w kernel.pid_max=4194304              # Max PID value

# Apply from config file
sysctl --system
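
Because each parameter is a file under /proc/sys/, a dotted sysctl name maps to a path by swapping dots for slashes. A minimal sketch, with the hypothetical helper sysctl_path (names that themselves contain dots, such as VLAN interface names, are not handled):

```shell
# sysctl_path NAME
# Map a dotted sysctl name to its file under /proc/sys.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

for key in vm.swappiness net.core.somaxconn fs.file-max; do
    path=$(sysctl_path "$key")
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$key" "$(cat "$path")"
    else
        printf '%s (not readable at %s)\n' "$key" "$path"
    fi
done
```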

Production Checklist
For high-connection servers: raise somaxconn, file-max, and per-process ulimit -n. Set vm.swappiness=10 (databases often go lower, e.g. 1; on current kernels 0 avoids swapping almost entirely and raises the risk of OOM kills). Enable tcp_tw_reuse for services making many outbound connections. Always benchmark before and after changes.

Test Yourself

Question 01
What is the primary purpose of initramfs during the Linux boot process?
The kernel cannot include every possible storage driver. initramfs is a temporary root containing just enough drivers and scripts to locate, decrypt (if needed), and mount the actual root partition.
Question 02
What is the difference between Wants= and Requires= in a systemd unit file?
Wants= is a soft dependency: if the wanted unit fails, the depending unit still starts. Requires= is a hard dependency: if the required unit fails, the depending unit is also stopped or not started.
Question 03
Which tool traces system calls made by a running process?
strace intercepts and records system calls (read, write, open, etc.) made by a process. ltrace traces library calls instead. lsof lists open files. perf does hardware-level profiling.
Question 04
What does ss -tulnp show?
The flags decode as: tcp, udp, listening, numeric (no DNS resolution), process (show PID/program). This is the go-to command for seeing what is listening on which ports.
Question 05
What is the sticky bit's effect on a directory?
The sticky bit (octal 1000) on a directory means that only the file's owner, the directory's owner, or root can delete or rename files within it. /tmp is the classic example: anyone can write, but you can only delete your own files.
Question 06
How does SELinux differ from traditional Unix permissions?
SELinux implements Mandatory Access Control (MAC) by labeling every process and file with a security context. Even if Unix permissions allow access, SELinux policy can deny it. This defense-in-depth approach limits damage from compromised services.
Question 07
What does nice -n 19 ./script.sh do?
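Nice values range from -20 (highest priority) to 19 (lowest priority). A nice value of 19 gives the process the smallest CPU share under the default scheduler, so it yields to almost everything else without being starved outright. Only root (or a process with CAP_SYS_NICE) can set negative nice values.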
Question 08
What advantage does socket activation provide in systemd?
With socket activation, systemd opens the listening socket immediately but only spawns the actual service process when the first connection arrives. This lets the system boot faster since many services are started on-demand rather than eagerly.
Question 09
Which capability allows a non-root process to bind to ports below 1024?
CAP_NET_BIND_SERVICE specifically grants the ability to bind to privileged ports (below 1024) without full root. This is the preferred approach over setuid or running as root for web servers that need port 80/443.
Question 10
What does sysctl -w vm.swappiness=10 do?
vm.swappiness controls how readily the kernel swaps out anonymous memory instead of reclaiming page cache. The default is 60. A value of 10 tells the kernel to strongly prefer keeping application pages in RAM and to swap only when memory pressure is high. Databases and latency-sensitive workloads benefit from lower swappiness.