Operating Systems

Processes

A process is a program in execution. Understanding how the OS creates, schedules, and manages processes is fundamental to systems programming and performance engineering.

01 / What Is a Process

Anatomy of a Process

A process is more than just running code. The kernel tracks each process through a Process Control Block (PCB), a data structure (called task_struct in Linux, ~6 KB) containing everything the OS needs: PID, register state, memory maps, open file descriptors, signal handlers, scheduling priority, and more.

Every process exists in one of several well-defined states. The kernel transitions processes between these states in response to system calls, interrupts, and scheduling decisions.

Process State Diagram
States: Created → Ready → Running → Terminated, with Waiting (Blocked) reachable from Running.
Running → Waiting: process requests I/O or waits on a lock
Waiting → Ready: I/O completes or resource becomes available
Running → Ready: preempted by scheduler (time slice expired)
Key Insight
A process in the Ready state has everything it needs to run but is waiting for CPU time. A process in the Waiting state is blocked on an external event (disk I/O, network, mutex). The scheduler only picks from the ready queue.

What the PCB Holds

Field | Purpose
PID / PPID | Unique identifier and parent's PID
Register state | Saved CPU registers for context switching
Memory maps | Page table pointer, virtual address space layout
File descriptors | Table of open files, sockets, pipes
Signal handlers | Registered handlers for each signal type
Scheduling info | Priority, nice value, CPU time consumed
Credentials | UID, GID, capabilities
02 / Process Creation

Fork, Exec, and the Process Tree

On Unix, new processes are created via the fork() + exec() pattern. fork() creates a child process that is a near-exact copy of the parent. exec() replaces the child's address space with a new program.

#include <sys/wait.h>
#include <unistd.h>

char *args[] = {"ls", "-l", NULL};  // argv for execvp, NULL-terminated
int status;

pid_t pid = fork();           // create child (returns -1 on failure)
if (pid == 0) {
    // child process
    execvp("ls", args);       // replace with new program
    _exit(127);               // only reached if exec fails
} else if (pid > 0) {
    // parent process
    waitpid(pid, &status, 0); // wait for child to exit
}

Copy-on-Write (CoW)

fork() does not actually copy all memory. Instead, parent and child share the same physical pages marked as read-only. Only when either process writes to a page does the kernel copy it. This makes fork() extremely fast, often completing in under 1 ms even for processes with GBs of memory.

PID 1 and the Process Tree

Every process has a parent (PPID). The first userspace process is init (or systemd) with PID 1. All other processes descend from it, forming a tree visible via pstree.

Orphans and Zombies
Orphan: A child whose parent exits. The kernel re-parents it to PID 1, which calls wait() to reap it.
Zombie: A child that has exited but whose parent hasn't called wait(). It occupies a slot in the process table (no memory, but a PID). Too many zombies can exhaust the PID space. Fix: the parent must call wait() or handle SIGCHLD.
03 / Memory Layout

Process Memory Layout

Each process gets its own virtual address space. The kernel maps virtual addresses to physical frames via page tables. The layout follows a standard convention on modern systems.

Virtual Address Space (high to low)
↑ High addresses (0xFFFF...)
Kernel Space (inaccessible to user)
Stack (grows downward)
↓ grows down
... unmapped gap ...
mmap / Shared Libraries
... unmapped gap ...
Heap (grows upward)
↑ grows up
BSS (uninitialized globals, zeroed)
Data (initialized globals)
Text (executable code, read-only)
↓ Low addresses (0x0000...)

Key Segments

Segment | Contents | Growth
text | Machine code instructions | Fixed, read-only
data | Initialized global/static variables | Fixed
BSS | Uninitialized globals (zeroed by kernel) | Fixed
heap | malloc()/brk() allocations | Grows upward
mmap | Shared libraries, memory-mapped files | Middle region
stack | Local variables, return addresses, frames | Grows downward
ASLR - Address Space Layout Randomization
Modern kernels randomize the base addresses of the stack, heap, mmap region, and shared libraries on each execution. This makes buffer overflow exploits significantly harder since attackers cannot predict where code or data lives. Check with cat /proc/self/maps.
04 / Scheduling

CPU Scheduling

The scheduler decides which ready process gets the CPU next. This decision profoundly affects system responsiveness and throughput.

Preemptive vs Cooperative

Aspect | Preemptive | Cooperative
Who decides | OS forces context switch via timer interrupt | Process voluntarily yields
Fairness | Guaranteed, no starvation | One misbehaving process can hog CPU
Complexity | Needs careful synchronization | Simpler, fewer race conditions
Used in | All modern OS kernels | Event loops (Node.js), coroutines, early Mac OS

Linux CFS (Completely Fair Scheduler)

Linux uses CFS as the default scheduler. Instead of fixed time slices, CFS tracks vruntime (virtual runtime) for each process. The process with the lowest vruntime gets the CPU next. CFS uses a red-black tree to find this process in O(log n).

CFS aims for perfect fairness: if two processes have equal priority, each should get exactly 50% of CPU time. In practice, it achieves near-perfect fairness with sub-millisecond granularity.

Nice Values
Nice values range from -20 (highest priority) to +19 (lowest). Each nice step changes a process's CFS weight by roughly 1.25x, so adjacent levels differ by about 10% CPU share; across the full -20 to +19 range the weight ratio is in the thousands. Only root (or a process with CAP_SYS_NICE) can set negative nice values. Use nice -n 10 ./my_program or renice -n 5 -p PID.

CPU-bound vs I/O-bound

CPU-bound

Spends most time computing (video encoding, ML training). Benefits from longer time slices. Throughput-sensitive.

I/O-bound

Spends most time waiting on I/O (web servers, databases). Needs low latency to resume quickly. CFS naturally boosts these because their vruntime stays low.

05 / Context Switching

Context Switching

When the scheduler decides to run a different process, the kernel performs a context switch: saving the current process's state and loading the next one's. This is pure overhead — no useful work happens during a context switch.

What Gets Saved

CPU Registers

General-purpose registers, program counter, stack pointer, flags register. Saved on the process's kernel stack and recorded in the PCB.

Page Table Pointer

CR3 register on x86. Points to the process's page table. Changing it switches the virtual address space.

FPU / SIMD State

Floating point and vector registers (SSE, AVX). Can be lazily saved/restored to reduce overhead.

The Cost

A context switch itself takes 1-10 microseconds of direct CPU time. But the indirect cost is much larger: the TLB (Translation Lookaside Buffer) gets flushed, and CPU caches go cold. This can cause thousands of cache misses as the new process warms up, adding 10-100 microseconds of effective cost.

TLB Flush
Switching page tables invalidates TLB entries because virtual-to-physical mappings differ between processes. Modern CPUs use PCID (Process Context ID) to tag TLB entries and avoid full flushes, reducing context switch cost by 20-40%. Check for PCID support: grep pcid /proc/cpuinfo.
06 / Inter-Process Communication

IPC Mechanisms

Processes have isolated address spaces, so they need explicit mechanisms to communicate. The kernel provides several IPC primitives, each with different tradeoffs.

Pipes

Unidirectional byte stream between related processes. Created with pipe(). Shell pipes (|) use this. Named pipes (FIFOs) work between unrelated processes.

Message Queues

Kernel-managed queue of typed messages. Processes send/receive discrete messages. POSIX (mq_open) or System V (msgget) variants.

Shared Memory

Fastest IPC: multiple processes map the same physical pages. No kernel involvement for reads/writes. Needs synchronization (semaphores/mutexes). shmget() or mmap() with MAP_SHARED.

Semaphores

Synchronization primitive, not data transfer. Controls access to shared resources. sem_wait() decrements, sem_post() increments. Often used alongside shared memory.

Signals

Asynchronous notifications: SIGTERM, SIGKILL, SIGCHLD, SIGUSR1, etc. Limited bandwidth (just a number), but useful for control flow. Cannot be ignored: SIGKILL, SIGSTOP.

Unix Domain Sockets

Full-duplex, connection-oriented IPC via filesystem path. Same API as network sockets but ~2x faster (no TCP/IP stack). Used by Docker, PostgreSQL, systemd.

IPC Performance Comparison

Mechanism | Throughput | Latency | Best For
Shared Memory | Highest (memcpy speed) | Lowest | High-throughput data sharing
Unix Domain Socket | High (~6 GB/s) | Low (~2 us) | Client-server on same host
Pipe | Moderate (~3 GB/s) | Low (~3 us) | Parent-child streaming
Message Queue | Moderate | Moderate | Decoupled message passing
Signal | Minimal (just signal number) | Low | Process control, notifications

Test Yourself

Question 01
What data structure does the Linux kernel use to represent a process?
Linux represents each process (and thread) with a task_struct. The sched_entity is a sub-structure within it used specifically by the scheduler.
Question 02
What is a zombie process?
A zombie is a process that has finished execution but still has an entry in the process table because its parent hasn't called wait() to read its exit status. (A child whose parent exits first is an orphan, not a zombie.)
Question 03
How does fork() avoid copying the entire address space of the parent?
Copy-on-Write (CoW) marks all shared pages as read-only. When either process writes to a page, a page fault triggers the kernel to copy just that page. This makes fork() nearly instantaneous regardless of process memory size.
Question 04
In the process memory layout, which segment grows downward toward lower addresses?
The stack grows downward from high addresses. The heap grows upward from low addresses. They grow toward each other, with a large unmapped gap between them.
Question 05
What data structure does Linux CFS use to find the process with the lowest vruntime?
CFS stores all runnable processes in a red-black tree keyed by vruntime. The leftmost node (minimum vruntime) is the next process to run. Insertion and lookup are both O(log n).
Question 06
What is the primary indirect cost of a context switch beyond the register save/restore?
When the page table pointer changes (CR3 on x86), TLB entries become invalid and CPU caches go cold. The new process suffers many cache misses as it warms up. This indirect cost (10-100 us) dwarfs the direct register save/restore (1-10 us).
Question 07
Which IPC mechanism has the highest throughput because it avoids kernel involvement for data transfer?
Shared memory maps the same physical pages into multiple processes' address spaces. After setup, reads and writes are just memory operations with no system calls. The tradeoff: you must handle synchronization yourself.
Question 08
What does ASLR protect against?
Address Space Layout Randomization randomizes the positions of the stack, heap, mmap region, and libraries on each execution. An attacker cannot hardcode addresses for return-oriented programming (ROP) or shellcode injection.