Data Structures

Trees in depth

From binary search trees to tries and segment trees — the hierarchical structures that power databases, compilers, routers, and schedulers.

01 / Fundamentals

Tree Fundamentals

A tree is a connected, acyclic graph with a distinguished root node. Every node except the root has exactly one parent, and zero or more children. A node with no children is called a leaf.

Tree Terminology
        [A]           <-- root (depth 0)
       /   \
     [B]   [C]        <-- depth 1
    / \       \
  [D] [E]    [F]      <-- depth 2 (D, E, F are leaves)
Term           | Definition
Root           | The topmost node with no parent.
Parent / Child | A node directly above/below another in the hierarchy.
Leaf           | A node with zero children.
Depth          | Number of edges from the root to a given node.
Height         | Number of edges on the longest path from a node down to a leaf. Height of the tree = height of root.
Degree         | Number of children a node has.
Subtree        | A node and all its descendants.
Key Insight
A tree with n nodes always has exactly n - 1 edges. This invariant is useful for quick sanity checks.
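The node-and-edge bookkeeping above can be sketched in a few lines of Python. This is a minimal sketch assuming a simple children-list representation (the class and function names are illustrative, not from any particular library):

```python
# Minimal tree node with a children-list representation.
class Node:
    def __init__(self, value):
        self.value = value
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

def count_nodes_and_edges(root):
    """Every parent-to-child link is one edge, so a tree has nodes - 1 edges."""
    nodes, edges = 1, 0
    for child in root.children:
        n, e = count_nodes_and_edges(child)
        nodes += n
        edges += e + 1  # +1 for the link from root to this child
    return nodes, edges

# Build the tree from the diagram above: A is the root, D/E/F are leaves.
a = Node("A")
b, c = a.add(Node("B")), a.add(Node("C"))
b.add(Node("D")); b.add(Node("E")); c.add(Node("F"))
print(count_nodes_and_edges(a))  # → (6, 5): 6 nodes, 5 edges
```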
02 / Binary Search Trees

BST: Search, Insert, Delete

A Binary Search Tree enforces the invariant: for every node, all values in the left subtree are smaller and all values in the right subtree are larger. This gives O(log n) average-case search, insert, and delete.

BST Property
        [8]
       /   \
     [3]   [10]
    / \       \
  [1] [6]    [14]
      / \    /
    [4] [7] [13]

  In-order traversal: 1, 3, 4, 6, 7, 8, 10, 13, 14  (sorted!)
Operation | Average  | Worst (skewed)
Search    | O(log n) | O(n)
Insert    | O(log n) | O(n)
Delete    | O(log n) | O(n)
Worst Case
If you insert sorted data into a plain BST, it degenerates into a linked list with O(n) operations. This is why self-balancing trees exist.
In-Order = Sorted
An in-order traversal (left, root, right) of a BST always yields elements in sorted order. This makes BSTs natural for ordered iteration and finding successors/predecessors.
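The insert, search, and in-order operations above can be sketched in Python. This is a plain (unbalanced) BST sketch; the names are illustrative, and a production tree would add deletion and balancing:

```python
# A plain, unbalanced BST: insert, search, and in-order traversal.
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Walk down, going left for smaller keys and right for larger ones."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicates are ignored

def search(root, key):
    """Iterative descent: O(log n) average, O(n) on a skewed tree."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

def in_order(root):
    """Left, root, right: yields keys in ascending order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

root = None
for k in [8, 3, 10, 1, 6, 14, 4, 7, 13]:  # the tree from the diagram
    root = insert(root, k)
print(in_order(root))                 # → [1, 3, 4, 6, 7, 8, 10, 13, 14]
print(search(root, 7), search(root, 5))  # → True False
```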
03 / Self-Balancing Trees

AVL Trees & Red-Black Trees

Self-balancing BSTs automatically maintain a bounded height after insertions and deletions, guaranteeing O(log n) worst-case operations.

AVL Trees

Strict balance: for every node, the heights of left and right subtrees differ by at most 1. Rebalancing uses rotations (single or double) after each modification. Lookups are slightly faster than Red-Black trees because the tree is more tightly balanced.

Red-Black Trees

Relaxed balance: nodes are colored red or black with rules that ensure the longest path is at most 2x the shortest. Fewer rotations on insert/delete compared to AVL, making writes faster. Used extensively in practice.

Property           | AVL                  | Red-Black
Balance strictness | Height diff ≤ 1      | Longest path ≤ 2x shortest
Max height         | ~1.44 log n          | ~2 log n
Lookup speed       | Slightly faster      | Slightly slower
Insert/Delete cost | More rotations       | Fewer rotations
Typical use        | Read-heavy workloads | General purpose
Real-World Usage
Red-Black trees power Java's TreeMap / TreeSet, C++ std::map, and the Linux kernel's CFS (Completely Fair Scheduler) which uses an RB-tree to schedule processes by virtual runtime.
AVL Rotation (Right Rotation)
  Before:        After:
    [C]           [B]
   /             /   \
  [B]          [A]   [C]
 /
[A]
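The right rotation pictured above can be sketched in Python. This is a minimal sketch assuming bare BST nodes; a real AVL implementation would also update the stored heights (or balance factors) of B and C after the pointer swap:

```python
# Right rotation: B becomes the new subtree root, C moves down to the right.
class N:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(c):
    b = c.left
    c.left = b.right   # B's old right subtree (empty in the diagram) moves under C
    b.right = c        # C becomes B's right child
    return b           # B is the new root of this subtree

before = N("C", left=N("B", left=N("A")))
after = rotate_right(before)
print(after.key, after.left.key, after.right.key)  # → B A C
```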
04 / B-Trees & B+ Trees

Multi-Way Trees for Disk

B-Trees generalize BSTs by allowing each node to hold multiple keys and have multiple children. They are optimized for systems that read/write large blocks of data — particularly databases and file systems.

B-Tree

A B-Tree of order m allows each node up to m - 1 keys and m children. All leaves are at the same depth, guaranteeing O(log n) operations. The wide branching factor minimizes disk I/O by reducing tree height.

B+ Tree

A variant where all actual data resides in the leaves, and internal nodes store only keys for routing. Leaf nodes are linked together in a doubly-linked list, enabling efficient range scans.

B+ Tree Structure
           [10 | 20]                       <-- internal node (routing keys only)
          /    |    \
  [1,3,7] <-> [10,12,15] <-> [20,25,30]    <-- linked leaves (hold the data)
Property      | B-Tree                  | B+ Tree
Data location | All nodes               | Leaves only
Leaf linking  | No                      | Yes (linked list)
Range queries | Slower (tree traversal) | Fast (scan linked leaves)
Used in       | MongoDB (WiredTiger)    | MySQL InnoDB, PostgreSQL, NTFS, ext4
Why B+ Trees Dominate Databases
Linked leaves make range queries (SELECT ... WHERE x BETWEEN a AND b) sequential reads instead of random seeks. Internal nodes holding only keys means higher fanout and fewer disk pages to traverse.
05 / Heaps

Heaps & Priority Queues

A heap is a complete binary tree satisfying the heap property: in a min-heap, every parent is no larger than its children; in a max-heap, every parent is no smaller. Heaps are typically stored as arrays, with no pointers needed.

Min-Heap (Array: [1, 3, 5, 7, 9, 8])
        [1]
       /   \
     [3]   [5]
    / \    /
  [7] [9] [8]

  Parent of i: floor((i-1)/2)
  Left child:  2i + 1
  Right child: 2i + 2
Operation       | Time     | Notes
Insert          | O(log n) | Add at end, bubble up
Extract min/max | O(log n) | Swap root with last, sift down
Peek min/max    | O(1)     | Always at index 0
Build heap      | O(n)     | Bottom-up heapify (Floyd's)
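All four operations in the table map directly onto Python's standard `heapq` module, which maintains a binary min-heap inside an ordinary list:

```python
import heapq

data = [7, 3, 9, 1, 8, 5]
heapq.heapify(data)         # build heap: O(n), Floyd's bottom-up heapify
print(data[0])              # peek min: O(1), always at index 0 → 1
heapq.heappush(data, 0)     # insert: O(log n), 0 bubbles up to the root
print(heapq.heappop(data))  # extract min: O(log n), sift down → 0
print(heapq.heappop(data))  # → 1
```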

Applications

Priority Queue

The standard implementation for priority queues. Used in Dijkstra's algorithm, task scheduling, and event-driven simulation.

Heapsort

O(n log n) in-place sorting. Build a max-heap, then repeatedly extract the max. Not stable, but guaranteed worst-case performance.

Top-K Elements

Maintain a min-heap of size K while streaming data. Each new element is compared to the heap root — O(n log K) total.
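The top-K pattern just described can be sketched with `heapq` (the function name `top_k` is illustrative):

```python
import heapq

def top_k(stream, k):
    """Keep a min-heap of the k largest seen so far: O(n log k) time, O(k) space."""
    heap = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:               # beats the smallest of the current top-k
            heapq.heapreplace(heap, x)  # pop the root and push x in one sift
    return sorted(heap, reverse=True)

print(top_k([5, 1, 9, 3, 7, 8, 2], 3))  # → [9, 8, 7]
```

For a one-shot (non-streaming) version, `heapq.nlargest(k, iterable)` does the same job in one call.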

Array Implementation
Because heaps are complete binary trees, they map perfectly to arrays with zero wasted space. No child/parent pointers needed — just arithmetic on indices.
06 / Tries, Segment Trees & Choosing the Right Tree

Specialized Trees

Trie (Prefix Tree)

A trie stores strings character-by-character, with each edge representing a character. Lookup time is O(L) where L is the length of the query string — independent of how many strings are stored.

Trie storing: "cat", "car", "card", "dog"
       (root)
      /      \
    c          d
    |          |
    a          o
   / \         |
  t*  r*       g*
      |
      d*

  * = end of word
Autocomplete

Walk the trie to the prefix node, then enumerate all descendants. Powers search suggestions in IDEs and search engines.
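A trie with word lookup and the prefix check that autocomplete starts from can be sketched with nested dicts. This is a minimal sketch; the `"$"` sentinel marking end-of-word is an arbitrary choice, not a convention of any library:

```python
# Trie as nested dicts; "$" marks end-of-word.
class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True

    def search(self, word):
        """Exact-word lookup: O(L) in the word's length."""
        node = self._walk(word)
        return node is not None and "$" in node

    def starts_with(self, prefix):
        """Autocomplete entry point: does any stored word extend this prefix?"""
        return self._walk(prefix) is not None

    def _walk(self, s):
        node = self.root
        for ch in s:
            if ch not in node:
                return None
            node = node[ch]
        return node

t = Trie()
for w in ["cat", "car", "card", "dog"]:
    t.insert(w)
print(t.search("car"), t.search("ca"), t.starts_with("ca"))  # → True False True
```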

IP Routing (Patricia Trie)

A compressed trie (radix tree) that stores IP prefixes. Routers use it for longest-prefix matching in O(W) time, where W = address width.

Segment Tree

A segment tree builds on an array to answer range queries (sum, min, max, GCD) and perform point/range updates in O(log n). Each node stores the aggregate for a range of the original array.
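The range-sum case can be sketched as a compact iterative segment tree. This is a sketch, not a definitive implementation: swapping `+` for `min`, `max`, or `gcd` gives the other aggregates, and range updates would need lazy propagation on top of this:

```python
# Iterative segment tree for range sums over a fixed-length array.
class SegmentTree:
    def __init__(self, data):
        self.n = len(data)
        self.tree = [0] * (2 * self.n)
        self.tree[self.n:] = data                 # leaves sit at indices n..2n-1
        for i in range(self.n - 1, 0, -1):        # build internal nodes bottom-up
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, i, value):
        """Point update: rewrite the leaf, then fix ancestors. O(log n)."""
        i += self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, lo, hi):
        """Sum of data[lo:hi] (half-open range). O(log n)."""
        total = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo % 2:              # lo is a right child: take it, step right
                total += self.tree[lo]
                lo += 1
            if hi % 2:              # hi is a right bound on a right child
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total

st = SegmentTree([2, 1, 5, 3, 4])
print(st.query(1, 4))  # sum of [1, 5, 3] → 9
st.update(2, 10)
print(st.query(1, 4))  # sum of [1, 10, 3] → 14
```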

Fenwick Tree (Binary Indexed Tree)

A more space-efficient alternative to segment trees for prefix sums and point updates. Uses clever bit manipulation — O(log n) for both query and update, but limited to operations with an inverse (like addition).

Property                | Segment Tree                   | Fenwick Tree
Range query             | O(log n)                       | O(log n)
Point update            | O(log n)                       | O(log n)
Range update            | O(log n) with lazy propagation | Limited
Space                   | 4n                             | n
Complexity to implement | Moderate                       | Simple
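The "simple to implement" claim for Fenwick trees is easy to see in code. A sketch for prefix sums with point updates, using the classic lowest-set-bit trick (1-indexed internally, 0-indexed at the API):

```python
# Fenwick (binary indexed) tree for prefix sums and point updates.
class Fenwick:
    def __init__(self, n):
        self.tree = [0] * (n + 1)   # 1-indexed; index 0 is unused

    def update(self, i, delta):
        """Add delta at position i (0-indexed). O(log n)."""
        i += 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & (-i)           # jump to the next node responsible for i

    def prefix_sum(self, i):
        """Sum of positions 0..i-1. O(log n)."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & (-i)           # strip the lowest set bit
        return total

    def range_sum(self, lo, hi):
        """Sum of positions lo..hi-1, by subtracting two prefix sums.
        This subtraction is exactly why Fenwick trees need an invertible op."""
        return self.prefix_sum(hi) - self.prefix_sum(lo)

fw = Fenwick(5)
for i, v in enumerate([2, 1, 5, 3, 4]):
    fw.update(i, v)
print(fw.range_sum(1, 4))  # sum of [1, 5, 3] → 9
```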

When to Use What

Need                            | Use                    | Why
Sorted data + ordered iteration | BST (balanced)         | In-order traversal, predecessor/successor queries
Disk-backed sorted index        | B+ Tree                | Minimizes disk I/O, linked leaves for range scans
Fast min/max extraction         | Heap                   | O(1) peek, O(log n) extract, array-backed
Prefix search / autocomplete    | Trie                   | O(L) lookup independent of dataset size
Key-value with O(1) lookup      | Hash Table             | Fastest average lookup, but no ordering
Range aggregate queries         | Segment / Fenwick Tree | O(log n) query and update on array ranges
BST vs Hash Table
Hash tables give O(1) average lookup but cannot do range queries or ordered traversal. If you need ordering, use a balanced BST. If you only need key lookup, use a hash table.

Test Yourself

Question 01
What is the worst-case time complexity of search in an unbalanced BST?
An unbalanced BST can degenerate into a linked list (e.g., inserting sorted data), making search O(n).
Question 02
Which traversal of a BST produces elements in sorted order?
In-order traversal visits left subtree, then root, then right subtree — which for a BST yields keys in ascending order.
Question 03
Which data structure does the Linux CFS (Completely Fair Scheduler) use?
The Linux CFS uses a Red-Black tree keyed by virtual runtime to pick the next process to schedule in O(log n).
Question 04
In a B+ Tree, where is the actual data stored?
B+ Trees store data exclusively in leaf nodes. Internal nodes only contain keys used for routing searches to the correct leaf.
Question 05
What is the time complexity of building a heap from an unsorted array using Floyd's algorithm?
Floyd's bottom-up heapify runs in O(n). Although each sift-down is O(log n), most nodes are near the bottom and require very few swaps, making the total linear.
Question 06
What is the lookup time complexity of a trie for a string of length L?
A trie traverses one node per character of the query string, so lookup is O(L) — completely independent of the number of strings stored.
Question 07
To efficiently find the top-K largest elements from a stream of N numbers, which approach is optimal?
A min-heap of size K keeps the K largest elements seen so far. Each new element is compared to the root (smallest of the K) — if larger, replace and sift down. Total: O(N log K) time, O(K) space.
Question 08
Which tree has stricter balancing — AVL or Red-Black?
AVL trees require the heights of left and right subtrees to differ by at most 1, making them more strictly balanced than Red-Black trees (max height ~1.44 log n vs ~2 log n).
Question 09
Why do B+ Trees link their leaf nodes together?
Linked leaves let range queries (e.g., WHERE x BETWEEN 10 AND 100) scan sequentially through leaf pages instead of repeatedly traversing the tree from the root.
Question 10
A tree with 15 nodes has how many edges?
Every tree with n nodes has exactly n - 1 edges. This is a fundamental property — each node except the root has exactly one edge connecting it to its parent.