Data Structures

Trees in depth

From binary search trees to tries and segment trees — the hierarchical structures that power databases, compilers, routers, and schedulers.

01 / Fundamentals

Tree Fundamentals

A tree is a connected, acyclic graph with a distinguished root node. Every node except the root has exactly one parent, and zero or more children. A node with no children is called a leaf.

Tree Terminology
        [A]           <-- root (depth 0)
       /   \
     [B]   [C]        <-- depth 1
    / \       \
  [D] [E]    [F]      <-- depth 2 (D, E, F are leaves)
Term           | Definition
Root           | The topmost node with no parent.
Parent / Child | A node directly above/below another in the hierarchy.
Leaf           | A node with zero children.
Depth          | Number of edges from the root to a given node.
Height         | Number of edges on the longest path from a node down to a leaf. Height of the tree = height of root.
Degree         | Number of children a node has.
Subtree        | A node and all its descendants.
Key Insight
A tree with n nodes always has exactly n - 1 edges. This invariant is useful for quick sanity checks.
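The node-and-edge bookkeeping above can be sketched in a few lines of Python. This is a minimal sketch assuming a simple children-list representation (the class and function names are illustrative, not from any particular library):

```python
# Minimal tree node with a children-list representation.
class Node:
    def __init__(self, value):
        self.value = value
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

def count_nodes_and_edges(root):
    """Every parent-to-child link is one edge, so a tree has nodes - 1 edges."""
    nodes, edges = 1, 0
    for child in root.children:
        n, e = count_nodes_and_edges(child)
        nodes += n
        edges += e + 1  # +1 for the link from root to this child
    return nodes, edges

# Build the tree from the diagram above: A is the root, D/E/F are leaves.
a = Node("A")
b, c = a.add(Node("B")), a.add(Node("C"))
b.add(Node("D")); b.add(Node("E")); c.add(Node("F"))
print(count_nodes_and_edges(a))  # → (6, 5): 6 nodes, 5 edges
```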
02 / Binary Search Trees

BST: Search, Insert, Delete

A Binary Search Tree enforces the invariant: for every node, all values in the left subtree are smaller and all values in the right subtree are larger. This gives O(log n) average-case search, insert, and delete.

BST Property
        [8]
       /   \
     [3]   [10]
    / \       \
  [1] [6]    [14]
      / \    /
    [4] [7] [13]

  In-order traversal: 1, 3, 4, 6, 7, 8, 10, 13, 14  (sorted!)
Operation | Average  | Worst (skewed)
Search    | O(log n) | O(n)
Insert    | O(log n) | O(n)
Delete    | O(log n) | O(n)
Worst Case
If you insert sorted data into a plain BST, it degenerates into a linked list with O(n) operations. This is why self-balancing trees exist.
In-Order = Sorted
An in-order traversal (left, root, right) of a BST always yields elements in sorted order. This makes BSTs natural for ordered iteration and finding successors/predecessors.
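The insert, search, and in-order operations above can be sketched in Python. This is a plain (unbalanced) BST sketch; the names are illustrative, and a production tree would add deletion and balancing:

```python
# A plain, unbalanced BST: insert, search, and in-order traversal.
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Walk down, going left for smaller keys and right for larger ones."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicates are ignored

def search(root, key):
    """Iterative descent: O(log n) average, O(n) on a skewed tree."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

def in_order(root):
    """Left, root, right: yields keys in ascending order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

root = None
for k in [8, 3, 10, 1, 6, 14, 4, 7, 13]:  # the tree from the diagram
    root = insert(root, k)
print(in_order(root))                 # → [1, 3, 4, 6, 7, 8, 10, 13, 14]
print(search(root, 7), search(root, 5))  # → True False
```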
03 / Self-Balancing Trees

AVL Trees & Red-Black Trees

Self-balancing BSTs automatically maintain a bounded height after insertions and deletions, guaranteeing O(log n) worst-case operations.

AVL Trees

Strict balance: for every node, the heights of left and right subtrees differ by at most 1. Rebalancing uses rotations (single or double) after each modification. Lookups are slightly faster than Red-Black trees because the tree is more tightly balanced.

Red-Black Trees

Relaxed balance: nodes are colored red or black with rules that ensure the longest path is at most 2x the shortest. Fewer rotations on insert/delete compared to AVL, making writes faster. Used extensively in practice.

Property           | AVL                  | Red-Black
Balance strictness | Height diff ≤ 1      | Longest path ≤ 2x shortest
Max height         | ~1.44 log n          | ~2 log n
Lookup speed       | Slightly faster      | Slightly slower
Insert/Delete cost | More rotations       | Fewer rotations
Typical use        | Read-heavy workloads | General purpose
Real-World Usage
Red-Black trees power Java's TreeMap / TreeSet, C++ std::map, and the Linux kernel's CFS (Completely Fair Scheduler) which uses an RB-tree to schedule processes by virtual runtime.
AVL Rotation (Right Rotation)
  Before:        After:
    [C]           [B]
   /             /   \
  [B]          [A]   [C]
 /
[A]
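The right rotation pictured above can be sketched in Python. This is a minimal sketch assuming bare BST nodes; a real AVL implementation would also update the stored heights (or balance factors) of B and C after the pointer swap:

```python
# Right rotation: B becomes the new subtree root, C moves down to the right.
class N:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(c):
    b = c.left
    c.left = b.right   # B's old right subtree (empty in the diagram) moves under C
    b.right = c        # C becomes B's right child
    return b           # B is the new root of this subtree

before = N("C", left=N("B", left=N("A")))
after = rotate_right(before)
print(after.key, after.left.key, after.right.key)  # → B A C
```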
04 / B-Trees & B+ Trees

Multi-Way Trees for Disk

B-Trees generalize BSTs by allowing each node to hold multiple keys and have multiple children. They are optimized for systems that read/write large blocks of data — particularly databases and file systems.

B-Tree

A B-Tree of order m allows each node up to m - 1 keys and m children. All leaves are at the same depth, guaranteeing O(log n) operations. The wide branching factor minimizes disk I/O by reducing tree height.

B+ Tree

A variant where all actual data resides in the leaves, and internal nodes store only keys for routing. Leaf nodes are linked together in a doubly-linked list, enabling efficient range scans.

B+ Tree Structure
           [10 | 20]                       <-- internal node (routing keys only)
          /    |    \
  [1,3,7] <-> [10,12,15] <-> [20,25,30]    <-- linked leaves (hold the data)
Property      | B-Tree                  | B+ Tree
Data location | All nodes               | Leaves only
Leaf linking  | No                      | Yes (linked list)
Range queries | Slower (tree traversal) | Fast (scan linked leaves)
Used in       | MongoDB (WiredTiger)    | MySQL InnoDB, PostgreSQL, NTFS, ext4
Why B+ Trees Dominate Databases
Linked leaves make range queries (SELECT ... WHERE x BETWEEN a AND b) sequential reads instead of random seeks. Internal nodes holding only keys means higher fanout and fewer disk pages to traverse.
05 / Heaps

Heaps & Priority Queues

A heap is a complete binary tree satisfying the heap property: in a min-heap, every parent is no larger than its children; in a max-heap, every parent is no smaller. Heaps are typically stored as arrays, with no pointers needed.

Min-Heap (Array: [1, 3, 5, 7, 9, 8])
        [1]
       /   \
     [3]   [5]
    / \    /
  [7] [9] [8]

  Parent of i: floor((i-1)/2)
  Left child:  2i + 1
  Right child: 2i + 2
Operation       | Time     | Notes
Insert          | O(log n) | Add at end, bubble up
Extract min/max | O(log n) | Swap root with last, sift down
Peek min/max    | O(1)     | Always at index 0
Build heap      | O(n)     | Bottom-up heapify (Floyd's)
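All four operations in the table map directly onto Python's standard `heapq` module, which maintains a binary min-heap inside an ordinary list:

```python
import heapq

data = [7, 3, 9, 1, 8, 5]
heapq.heapify(data)         # build heap: O(n), Floyd's bottom-up heapify
print(data[0])              # peek min: O(1), always at index 0 → 1
heapq.heappush(data, 0)     # insert: O(log n), 0 bubbles up to the root
print(heapq.heappop(data))  # extract min: O(log n), sift down → 0
print(heapq.heappop(data))  # → 1
```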

Applications

Priority Queue

The standard implementation for priority queues. Used in Dijkstra's algorithm, task scheduling, and event-driven simulation.

Heapsort

O(n log n) in-place sorting. Build a max-heap, then repeatedly extract the max. Not stable, but guaranteed worst-case performance.

Top-K Elements

Maintain a min-heap of size K while streaming data. Each new element is compared to the heap root — O(n log K) total.
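The top-K pattern just described can be sketched with `heapq` (the function name `top_k` is illustrative):

```python
import heapq

def top_k(stream, k):
    """Keep a min-heap of the k largest seen so far: O(n log k) time, O(k) space."""
    heap = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:               # beats the smallest of the current top-k
            heapq.heapreplace(heap, x)  # pop the root and push x in one sift
    return sorted(heap, reverse=True)

print(top_k([5, 1, 9, 3, 7, 8, 2], 3))  # → [9, 8, 7]
```

For a one-shot (non-streaming) version, `heapq.nlargest(k, iterable)` does the same job in one call.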

Array Implementation
Because heaps are complete binary trees, they map perfectly to arrays with zero wasted space. No child/parent pointers needed — just arithmetic on indices.
06 / Tries, Segment Trees & Choosing the Right Tree

Specialized Trees

Trie (Prefix Tree)

A trie stores strings character-by-character, with each edge representing a character. Lookup time is O(L) where L is the length of the query string — independent of how many strings are stored.

Trie storing: "cat", "car", "card", "dog"
       (root)
      /      \
    c          d
    |          |
    a          o
   / \         |
  t*  r*       g*
      |
      d*

  * = end of word
Autocomplete

Walk the trie to the prefix node, then enumerate all descendants. Powers search suggestions in IDEs and search engines.
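A trie with word lookup and the prefix check that autocomplete starts from can be sketched with nested dicts. This is a minimal sketch; the `"$"` sentinel marking end-of-word is an arbitrary choice, not a convention of any library:

```python
# Trie as nested dicts; "$" marks end-of-word.
class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True

    def search(self, word):
        """Exact-word lookup: O(L) in the word's length."""
        node = self._walk(word)
        return node is not None and "$" in node

    def starts_with(self, prefix):
        """Autocomplete entry point: does any stored word extend this prefix?"""
        return self._walk(prefix) is not None

    def _walk(self, s):
        node = self.root
        for ch in s:
            if ch not in node:
                return None
            node = node[ch]
        return node

t = Trie()
for w in ["cat", "car", "card", "dog"]:
    t.insert(w)
print(t.search("car"), t.search("ca"), t.starts_with("ca"))  # → True False True
```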

IP Routing (Patricia Trie)

A compressed trie (radix tree) that stores IP prefixes. Routers use it for longest-prefix matching in O(W) time, where W = address width.

Segment Tree

A segment tree builds on an array to answer range queries (sum, min, max, GCD) and perform point/range updates in O(log n). Each node stores the aggregate for a range of the original array.
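The range-sum case can be sketched as a compact iterative segment tree. This is a sketch, not a definitive implementation: swapping `+` for `min`, `max`, or `gcd` gives the other aggregates, and range updates would need lazy propagation on top of this:

```python
# Iterative segment tree for range sums over a fixed-length array.
class SegmentTree:
    def __init__(self, data):
        self.n = len(data)
        self.tree = [0] * (2 * self.n)
        self.tree[self.n:] = data                 # leaves sit at indices n..2n-1
        for i in range(self.n - 1, 0, -1):        # build internal nodes bottom-up
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, i, value):
        """Point update: rewrite the leaf, then fix ancestors. O(log n)."""
        i += self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, lo, hi):
        """Sum of data[lo:hi] (half-open range). O(log n)."""
        total = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo % 2:              # lo is a right child: take it, step right
                total += self.tree[lo]
                lo += 1
            if hi % 2:              # hi is a right bound on a right child
                hi -= 1
                total += self.tree[hi]
            lo //= 2
            hi //= 2
        return total

st = SegmentTree([2, 1, 5, 3, 4])
print(st.query(1, 4))  # sum of [1, 5, 3] → 9
st.update(2, 10)
print(st.query(1, 4))  # sum of [1, 10, 3] → 14
```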

Fenwick Tree (Binary Indexed Tree)

A more space-efficient alternative to segment trees for prefix sums and point updates. Uses clever bit manipulation — O(log n) for both query and update, but limited to operations with an inverse (like addition).

Property                | Segment Tree                   | Fenwick Tree
Range query             | O(log n)                       | O(log n)
Point update            | O(log n)                       | O(log n)
Range update            | O(log n) with lazy propagation | Limited
Space                   | 4n                             | n
Complexity to implement | Moderate                       | Simple
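The "simple to implement" claim for Fenwick trees is easy to see in code. A sketch for prefix sums with point updates, using the classic lowest-set-bit trick (1-indexed internally, 0-indexed at the API):

```python
# Fenwick (binary indexed) tree for prefix sums and point updates.
class Fenwick:
    def __init__(self, n):
        self.tree = [0] * (n + 1)   # 1-indexed; index 0 is unused

    def update(self, i, delta):
        """Add delta at position i (0-indexed). O(log n)."""
        i += 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & (-i)           # jump to the next node responsible for i

    def prefix_sum(self, i):
        """Sum of positions 0..i-1. O(log n)."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & (-i)           # strip the lowest set bit
        return total

    def range_sum(self, lo, hi):
        """Sum of positions lo..hi-1, by subtracting two prefix sums.
        This subtraction is exactly why Fenwick trees need an invertible op."""
        return self.prefix_sum(hi) - self.prefix_sum(lo)

fw = Fenwick(5)
for i, v in enumerate([2, 1, 5, 3, 4]):
    fw.update(i, v)
print(fw.range_sum(1, 4))  # sum of [1, 5, 3] → 9
```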

When to Use What

Need                            | Use                    | Why
Sorted data + ordered iteration | BST (balanced)         | In-order traversal, predecessor/successor queries
Disk-backed sorted index        | B+ Tree                | Minimizes disk I/O, linked leaves for range scans
Fast min/max extraction         | Heap                   | O(1) peek, O(log n) extract, array-backed
Prefix search / autocomplete    | Trie                   | O(L) lookup independent of dataset size
Key-value with O(1) lookup      | Hash Table             | Fastest average lookup, but no ordering
Range aggregate queries         | Segment / Fenwick Tree | O(log n) query and update on array ranges
BST vs Hash Table
Hash tables give O(1) average lookup but cannot do range queries or ordered traversal. If you need ordering, use a balanced BST. If you only need key lookup, use a hash table.

Test Yourself

Question 01
What is the worst-case time complexity of search in an unbalanced BST?
An unbalanced BST can degenerate into a linked list (e.g., inserting sorted data), making search O(n).
Question 02
Which traversal of a BST produces elements in sorted order?
In-order traversal visits left subtree, then root, then right subtree — which for a BST yields keys in ascending order.
Question 03
Which data structure does the Linux CFS (Completely Fair Scheduler) use?
The Linux CFS uses a Red-Black tree keyed by virtual runtime to pick the next process to schedule in O(log n).
Question 04
In a B+ Tree, where is the actual data stored?
B+ Trees store data exclusively in leaf nodes. Internal nodes only contain keys used for routing searches to the correct leaf.
Question 05
What is the time complexity of building a heap from an unsorted array using Floyd's algorithm?
Floyd's bottom-up heapify runs in O(n). Although each sift-down is O(log n), most nodes are near the bottom and require very few swaps, making the total linear.
Question 06
What is the lookup time complexity of a trie for a string of length L?
A trie traverses one node per character of the query string, so lookup is O(L) — completely independent of the number of strings stored.
Question 07
To efficiently find the top-K largest elements from a stream of N numbers, which approach is optimal?
A min-heap of size K keeps the K largest elements seen so far. Each new element is compared to the root (smallest of the K) — if larger, replace and sift down. Total: O(N log K) time, O(K) space.
Question 08
Which tree has stricter balancing — AVL or Red-Black?
AVL trees require the heights of left and right subtrees to differ by at most 1, making them more strictly balanced than Red-Black trees (max height ~1.44 log n vs ~2 log n).
Question 09
Why do B+ Trees link their leaf nodes together?
Linked leaves let range queries (e.g., WHERE x BETWEEN 10 AND 100) scan sequentially through leaf pages instead of repeatedly traversing the tree from the root.
Question 10
A tree with 15 nodes has how many edges?
Every tree with n nodes has exactly n - 1 edges. This is a fundamental property — each node except the root has exactly one edge connecting it to its parent.