Anatomy of a Process
A process is more than just running code. The kernel tracks each process through a Process Control Block (PCB), a data structure (called task_struct in Linux, ~6 KB) containing everything the OS needs: PID, register state, memory maps, open file descriptors, signal handlers, scheduling priority, and more.
Every process exists in one of several well-defined states. The kernel transitions processes between these states in response to system calls, interrupts, and scheduling decisions.
- New → Ready: process created and admitted by the scheduler
- Ready → Running: scheduler dispatches the process onto a CPU
- Running → Waiting: process blocks on I/O or a resource
- Waiting → Ready: I/O completes or resource becomes available
- Running → Ready: preempted by scheduler (time slice expired)
- Running → Terminated: process exits or is killed
What the PCB Holds
| Field | Purpose |
|---|---|
| PID / PPID | Unique identifier and parent's PID |
| Register state | Saved CPU registers for context switching |
| Memory maps | Page table pointer, virtual address space layout |
| File descriptors | Table of open files, sockets, pipes |
| Signal handlers | Registered handlers for each signal type |
| Scheduling info | Priority, nice value, CPU time consumed |
| Credentials | UID, GID, capabilities |
Fork, Exec, and the Process Tree
On Unix, new processes are created via the fork() + exec() pattern. fork() creates a child process that is a near-exact copy of the parent. exec() replaces the child's address space with a new program.
```c
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char *args[] = {"ls", "-l", NULL};
    int status;

    pid_t pid = fork();            // create child
    if (pid == 0) {
        // child process
        execvp("ls", args);        // replace with new program
        _exit(127);                // reached only if exec fails
    } else {
        // parent process
        waitpid(pid, &status, 0);  // wait for child
    }
}
```
Copy-on-Write (CoW)
fork() does not actually copy all memory. Instead, parent and child share the same physical pages marked as read-only. Only when either process writes to a page does the kernel copy it. This makes fork() extremely fast, often completing in under 1 ms even for processes with GBs of memory.
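Copy-on-write is easy to observe: after fork(), a write by the child triggers a private page copy, so the parent's data is untouched. A minimal sketch (the cow_isolated helper name is ours, not a real API):

```c
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent's buffer is unchanged after the child
 * writes to its own copy-on-write copy of the page. */
int cow_isolated(void) {
    static char buf[4096];
    memset(buf, 'A', sizeof buf);
    pid_t pid = fork();
    if (pid == 0) {            /* child: this write copies the page */
        buf[0] = 'B';
        _exit(0);
    }
    waitpid(pid, NULL, 0);     /* parent: its page was never touched */
    return buf[0] == 'A';
}
```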
PID 1 and the Process Tree
Every process has a parent (PPID). The first userspace process is init (or systemd) with PID 1. All other processes descend from it, forming a tree visible via pstree.
Zombie: A child that has exited but whose parent hasn't called wait() to reap it. It occupies a slot in the process table (no memory, but a PID). Too many zombies can exhaust the PID space. Fix: the parent must call wait() or handle SIGCHLD.
Process Memory Layout
Each process gets its own virtual address space. The kernel maps virtual addresses to physical frames via page tables. The layout follows a standard convention on modern systems.
Key Segments
| Segment | Contents | Growth |
|---|---|---|
| text | Machine code instructions | Fixed, read-only |
| data | Initialized global/static variables | Fixed |
| BSS | Uninitialized globals (zeroed by kernel) | Fixed |
| heap | malloc()/brk() allocations | Grows upward |
| mmap | Shared libraries, memory-mapped files | Middle region |
| stack | Local variables, return addresses, frames | Grows downward |
You can inspect a live process's actual layout with cat /proc/self/maps.
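The segments can also be located from inside a program by printing one address from each. A minimal sketch (print_layout is our name; with ASLR enabled, the heap and stack addresses change between runs):

```c
#include <stdio.h>
#include <stdlib.h>

int initialized = 42;   /* data segment */
int uninitialized;      /* BSS, zeroed by the kernel */

/* Print one representative address from each segment. */
void print_layout(void) {
    int local = 0;                     /* stack */
    int *heap = malloc(sizeof *heap);  /* heap */
    printf("text : %p\n", (void *)print_layout);
    printf("data : %p\n", (void *)&initialized);
    printf("bss  : %p\n", (void *)&uninitialized);
    printf("heap : %p\n", (void *)heap);
    printf("stack: %p\n", (void *)&local);
    free(heap);
}
```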
CPU Scheduling
The scheduler decides which ready process gets the CPU next. This decision profoundly affects system responsiveness and throughput.
Preemptive vs Cooperative
| Aspect | Preemptive | Cooperative |
|---|---|---|
| Who decides | OS forces context switch via timer interrupt | Process voluntarily yields |
| Fairness | Guaranteed, no starvation | One misbehaving process can hog CPU |
| Complexity | Needs careful synchronization | Simpler, fewer race conditions |
| Used in | All modern OS kernels | Event loops (Node.js), coroutines, early Mac OS |
Linux CFS (Completely Fair Scheduler)
Linux uses CFS as the default scheduler. Instead of fixed time slices, CFS tracks vruntime (virtual runtime) for each process. The process with the lowest vruntime gets the CPU next. CFS uses a red-black tree to find this process in O(log n).
CFS aims for perfect fairness: if two processes have equal priority, each should get exactly 50% of CPU time. In practice, it achieves near-perfect fairness with sub-millisecond granularity.
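The pick-lowest-vruntime rule can be sketched without the red-black tree: a linear scan finds the same task CFS would pick, just in O(n) instead of O(log n). The struct and function names here are illustrative, not kernel API:

```c
#include <stddef.h>

struct task {
    const char *name;
    unsigned long vruntime;  /* weighted CPU time consumed so far */
};

/* Return the index of the task with the smallest vruntime.
 * CFS performs this lookup via a red-black tree keyed on vruntime. */
size_t pick_next(const struct task *runqueue, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (runqueue[i].vruntime < runqueue[best].vruntime)
            best = i;
    return best;
}
```

A task that sleeps a lot (I/O-bound) accumulates little vruntime, so it wins this comparison as soon as it wakes, which is exactly the latency boost described below.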
You can adjust a process's priority at launch with nice -n 10 ./my_program, or change a running process with renice -n 5 -p PID.
CPU-bound vs I/O-bound
CPU-bound: Spends most time computing (video encoding, ML training). Benefits from longer time slices. Throughput-sensitive.
I/O-bound: Spends most time waiting on I/O (web servers, databases). Needs low latency to resume quickly. CFS naturally boosts these because their vruntime stays low.
Context Switching
When the scheduler decides to run a different process, the kernel performs a context switch: saving the current process's state and loading the next one's. This is pure overhead — no useful work happens during a context switch.
What Gets Saved
CPU state: General-purpose registers, program counter, stack pointer, flags register. Saved on the kernel stack and referenced from the PCB.
Memory context: The CR3 register on x86 points to the process's page table; changing it switches the virtual address space.
FPU/SIMD state: Floating point and vector registers (SSE, AVX). Can be lazily saved/restored to reduce overhead.
The Cost
A context switch itself takes 1-10 microseconds of direct CPU time. But the indirect cost is much larger: the TLB (Translation Lookaside Buffer) gets flushed, and CPU caches go cold. This can cause thousands of cache misses as the new process warms up, adding 10-100 microseconds of effective cost.
Modern x86 CPUs reduce the TLB penalty with PCID (tagged TLB entries that survive an address-space switch); check for support with grep pcid /proc/cpuinfo.
IPC Mechanisms
Processes have isolated address spaces, so they need explicit mechanisms to communicate. The kernel provides several IPC primitives, each with different tradeoffs.
Pipes: Unidirectional byte stream between related processes. Created with pipe(). Shell pipes (|) use this. Named pipes (FIFOs) work between unrelated processes.
Message queues: Kernel-managed queue of typed messages. Processes send/receive discrete messages. POSIX (mq_open) or System V (msgget) variants.
Shared memory: Fastest IPC: multiple processes map the same physical pages. No kernel involvement for reads/writes. Needs synchronization (semaphores/mutexes). shmget() or mmap() with MAP_SHARED.
Semaphores: Synchronization primitive, not data transfer. Controls access to shared resources. sem_wait() decrements, sem_post() increments. Often used alongside shared memory.
Signals: Asynchronous notifications: SIGTERM, SIGKILL, SIGCHLD, SIGUSR1, etc. Limited bandwidth (just a number), but useful for control flow. Two cannot be caught, blocked, or ignored: SIGKILL and SIGSTOP.
Unix domain sockets: Full-duplex, connection-oriented IPC via filesystem path. Same API as network sockets but ~2x faster (no TCP/IP stack). Used by Docker, PostgreSQL, systemd.
IPC Performance Comparison
| Mechanism | Throughput | Latency | Best For |
|---|---|---|---|
| Shared Memory | Highest (memcpy speed) | Lowest | High-throughput data sharing |
| Unix Domain Socket | High (~6 GB/s) | Low (~2 us) | Client-server on same host |
| Pipe | Moderate (~3 GB/s) | Low (~3 us) | Parent-child streaming |
| Message Queue | Moderate | Moderate | Decoupled message passing |
| Signal | Minimal (just signal number) | Low | Process control, notifications |