Indexes
An index is a separate data structure that the database maintains alongside a table, enabling it to find rows that match a query condition without scanning every row. Indexes are the single most effective tuning lever available to a database engineer — a missing index on a 100-million-row table turns a millisecond lookup into a seconds-long full scan. But indexes cost write performance and storage, and adding the wrong index can make things worse. Understanding how indexes work internally is what separates informed index design from guesswork.
What Is an Index?
Without an index, finding all users with email = 'alice@example.com' requires reading every row in the table — a sequential scan (also called a full table scan). For a table with 50 million rows this is catastrophically slow.
An index is a sorted copy of one or more columns, stored in a structure designed for fast lookup. The index stores the indexed column value alongside a pointer (usually the primary key or a physical row location) to the full row. A lookup on an indexed column walks the index structure to find matching values, then follows the pointers to retrieve the full rows — touching a tiny fraction of the table.
Indexes come at a cost:
- Storage: Each index is a separate data structure that occupies disk and memory.
- Write overhead: Every INSERT and DELETE must update every index on the table, and every UPDATE must update each index whose columns changed. A table with 10 indexes does roughly 10x the index-maintenance work per write of an unindexed table.
- Query planner complexity: The optimizer must consider all available indexes for each query. Too many indexes slow planning and may confuse the optimizer into choosing a worse plan.
B-Tree Indexes
The B-tree (Balanced Tree) is the default index type in PostgreSQL, MySQL, SQL Server, and virtually every relational database. Understanding B-trees explains why indexes are fast for some queries and useless for others.
Structure
A B-tree is a self-balancing tree in which all leaf nodes are at the same depth. Each node holds multiple sorted keys (column values) and pointers to child nodes or row locations. The tree as a whole stays sorted: within a node the keys partition its children, so every key in a child subtree falls between the node keys that bracket it.
How a Lookup Works
- The query planner starts at the root node.
- It compares the search key against the node’s keys and follows the appropriate child pointer.
- This continues level by level until a leaf node is reached.
- The leaf node contains the matching key(s) and row pointer(s).
- The database fetches the full row(s) from the table using the pointers.
A B-tree with height h requires at most h node reads to find a value. For a billion-row table, a typical B-tree has height 3–4, meaning 3–4 I/Os to find any row. This is the magic of logarithmic lookup.
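The height figures above follow directly from the fan-out arithmetic: a tree with fan-out f addresses up to f^h entries at height h. A minimal sketch (the fan-out of ~500 keys per node is an assumed typical value, not a property of any particular engine):

```python
import math

def btree_height(n_rows: int, fanout: int = 500) -> int:
    """Minimum number of node reads (root to leaf) for a B-tree with the
    given fan-out to address n_rows entries: ceil(log_fanout(n_rows))."""
    return max(1, math.ceil(math.log(n_rows, fanout)))

# A billion rows with ~500 keys per node needs only 4 levels;
# 50 million rows fit in 3.
print(btree_height(1_000_000_000))  # -> 4
print(btree_height(50_000_000))     # -> 3
```

Doubling the row count adds far less than one level, which is why B-tree lookups stay fast as tables grow.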
What B-Trees Support Well
- Equality: WHERE email = 'alice@example.com'
- Range queries: WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01'
- Prefix matching: WHERE username LIKE 'alice%' (but not LIKE '%alice')
- Sorting: ORDER BY email can use the index without a separate sort step
- Min/max: SELECT MIN(price) reads the leftmost or rightmost leaf in O(log n)
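Every operation in this list reduces to seeking into a sorted sequence and scanning forward, which is why they all benefit from the same structure. A toy illustration using Python's bisect module as a stand-in for the sorted leaf level (the data is made up):

```python
import bisect

# A sorted "leaf level" of indexed values, as a B-tree maintains them.
emails = sorted(["bob@x.com", "alice@x.com", "carol@x.com", "alice2@x.com"])

# Equality: binary search lands directly on the value.
i = bisect.bisect_left(emails, "alice@x.com")
assert emails[i] == "alice@x.com"

# Prefix (LIKE 'alice%'): seek to the prefix, then scan forward
# while entries still match.
lo = bisect.bisect_left(emails, "alice")
matches = []
for e in emails[lo:]:
    if not e.startswith("alice"):
        break
    matches.append(e)
print(matches)  # -> ['alice2@x.com', 'alice@x.com']

# Suffix (LIKE '%alice') has no seek point in sorted order:
# every entry would have to be checked, i.e. a full scan.
```

The suffix case is exactly why leading-wildcard LIKE patterns cannot use a B-tree.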
Wrapping an indexed column in a function defeats the index. WHERE LOWER(email) = 'alice@example.com' cannot use an index on email because the index stores the original values, not the lowercased ones. The fix: store data normalized (e.g., always lowercase emails on insert), or create an expression index on LOWER(email).
Hash Indexes
Hash indexes use a hash function to map a column value to a bucket. Lookups are O(1) for exact equality. They are useless for range queries — hashing destroys sort order, so there is no way to scan a range of values.
PostgreSQL supports explicit hash indexes (CREATE INDEX ... USING hash). MySQL’s InnoDB engine uses a self-tuning adaptive hash index in memory for frequently accessed B-tree pages. In practice, B-tree indexes satisfy most workloads — use hash indexes only when you have proven equality-only access patterns and need the O(1) benefit.
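The equality-versus-range distinction is easy to see with a plain dict standing in for the hash index (a deliberately simplified sketch, not how any engine lays out buckets):

```python
# A hash index is essentially a map from value -> row pointers.
rows = [(1, "alice@x.com"), (2, "bob@x.com"), (3, "alice@x.com")]
hash_idx = {}
for row_id, email in rows:
    hash_idx.setdefault(email, []).append(row_id)

# Equality: one hash probe, O(1) on average.
print(hash_idx["alice@x.com"])  # -> [1, 3]

# Range (email > 'b'): hashing scatters the keys, so the only option
# is to examine every bucket -- a full scan in disguise.
in_range = sorted(r for key, ids in hash_idx.items() if key > "b" for r in ids)
print(in_range)  # -> [2]
```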
Composite Indexes
A composite index (also called a multi-column index) covers two or more columns. The index entries are sorted first by the first column, then by the second column within equal first-column values, and so on.
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);
The Leftmost Prefix Rule
A composite index on (a, b, c) can be used by queries that filter on:
- a alone
- a, b
- a, b, c
It cannot be used by queries that filter on b alone, or c alone, or b, c. The index can only be entered from the leftmost column.
This is why column order in a composite index matters enormously. Put the most-selective equality column first. Put range-predicate columns last (a range scan on column 2 stops the index from being used for column 3).
For a query WHERE status = 'active' AND created_at > '2024-01-01', put status first (equality predicate), then created_at (range predicate). The index (status, created_at) narrows to the active rows using the equality on status, then scans the created_at range within that subset. Reversing the order to (created_at, status) is worse: with a range on the leading column, the index must scan every entry in the date range and filter on status afterward, instead of jumping straight to the active subset.
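The tuple-sort behavior behind this rule can be sketched with sorted tuples and bisect (illustrative data; takewhile mimics the forward leaf scan):

```python
import bisect
from itertools import takewhile

# Index entries for (status, created_at), kept in sorted tuple order --
# exactly how a composite B-tree orders its keys.
entries = sorted([
    ("active", "2023-12-30"), ("active", "2024-01-15"),
    ("active", "2024-02-02"), ("inactive", "2024-01-20"),
])

# WHERE status = 'active' AND created_at > '2024-01-01':
# seek to ('active', '2024-01-01'), then scan while status stays 'active'.
lo = bisect.bisect_right(entries, ("active", "2024-01-01"))
hits = list(takewhile(lambda e: e[0] == "active", entries[lo:]))
print(hits)  # -> [('active', '2024-01-15'), ('active', '2024-02-02')]

# WHERE created_at > '2024-01-01' alone: the second column is scattered
# across every status value, so there is no single seek point --
# the leftmost prefix rule in action.
```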
Covering Indexes
A covering index includes all columns that a query needs — the SELECT columns, WHERE columns, and ORDER BY columns. When a query is covered by an index, the database performs an index-only scan: it reads the index and returns results without ever touching the main table rows.
-- Query
SELECT user_id, amount FROM orders WHERE status = 'pending';
-- Covering index: includes all columns referenced in the query
CREATE INDEX idx_orders_status_covering ON orders (status, user_id, amount);
Index-only scans can be dramatically faster than regular index scans because:
- The index is typically smaller than the table and fits more readily in memory.
- No random I/O to fetch individual rows from the table heap.
- For analytical queries that aggregate a few columns across many rows, this is transformative.
The trade-off: wider indexes consume more storage and add more write overhead since every covered column must be maintained in the index.
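An index-only scan can be mimicked with two plain Python structures: a dict as the table heap and a sorted list of tuples as the covering index (a toy sketch with made-up rows):

```python
# The "heap": full rows keyed by row id, including columns
# the query never asks for.
heap = {
    1: {"status": "pending", "user_id": 7, "amount": 30.0, "note": "..."},
    2: {"status": "shipped", "user_id": 8, "amount": 12.5, "note": "..."},
    3: {"status": "pending", "user_id": 9, "amount": 99.9, "note": "..."},
}

# Covering index on (status, user_id, amount): every column the query
# needs lives inside the index entry itself.
index = sorted((r["status"], r["user_id"], r["amount"]) for r in heap.values())

# SELECT user_id, amount FROM orders WHERE status = 'pending'
# is answered entirely from the index -- the heap is never touched.
result = [(u, a) for s, u, a in index if s == "pending"]
print(result)  # -> [(7, 30.0), (9, 99.9)]
```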
Index Selectivity
Selectivity is the ratio of distinct values to total rows. High selectivity (many distinct values, like email or UUID) means an index lookup returns very few rows. Low selectivity (few distinct values, like a boolean is_active flag) means a lookup returns many rows — possibly most of the table.
The database query planner estimates selectivity using column statistics (the pg_statistic catalog in PostgreSQL, updated by ANALYZE). It uses this estimate to decide whether to use an index or a sequential scan.
If 80% of orders have status = 'pending', an index on status is nearly useless for that value — a sequential scan is cheaper (sequential I/O is faster than random I/O for large fractions of the table). The same index is very useful when filtering for status = 'refunded' if only 0.1% of orders have that status.
An index is most beneficial when it selects fewer than ~10–20% of table rows. For highly skewed data, partial indexes (see below) can solve the problem by indexing only the selective subset.
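The planner's decision reduces to arithmetic like the following (the 15% cutoff is an illustrative assumption; real planners use full cost models rather than a fixed threshold):

```python
def prefers_index(matching_rows: int, total_rows: int,
                  threshold: float = 0.15) -> bool:
    """Toy planner rule: use the index only when the predicate selects a
    small fraction of the table. The 15% cutoff is an assumption for
    illustration, not any engine's actual behavior."""
    return matching_rows / total_rows < threshold

# status = 'pending' matches 80% of rows: sequential scan wins.
print(prefers_index(800_000, 1_000_000))  # -> False
# status = 'refunded' matches 0.1% of rows: index scan wins.
print(prefers_index(1_000, 1_000_000))    # -> True
```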
Partial and Expression Indexes
Partial Indexes
A partial index covers only a subset of rows defined by a WHERE clause in the index definition.
-- Index only unprocessed orders — the selective, hot subset
CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending';
Benefits:
- The index is smaller — fewer rows to store, more of it fits in memory.
- The index is always highly selective by construction (only the filtered rows are present).
- Write overhead applies only to rows that match the partial index predicate.
Partial indexes are ideal for: soft-deleted records (WHERE deleted_at IS NULL), queue-like workloads (WHERE status = 'pending'), sparse flags (WHERE is_premium = true).
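The mechanics are simple enough to sketch: only rows matching the predicate ever enter the index (toy data; a real partial index is a B-tree, not a Python list):

```python
rows = [
    {"id": 1, "status": "pending", "created_at": "2024-01-03"},
    {"id": 2, "status": "shipped", "created_at": "2024-01-01"},
    {"id": 3, "status": "pending", "created_at": "2024-01-02"},
]

# Partial index on (created_at) WHERE status = 'pending': only the
# matching subset is stored, kept sorted on the indexed column.
partial = sorted(
    (r["created_at"], r["id"]) for r in rows if r["status"] == "pending"
)
print(partial)  # -> [('2024-01-02', 3), ('2024-01-03', 1)]

# Writes to non-matching rows (e.g. 'shipped') never touch this index,
# and every entry it holds is selective by construction.
```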
Expression Indexes
An expression index stores the result of a function or expression rather than the raw column value.
-- Enables case-insensitive email lookup
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- Query that uses it
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
Expression indexes solve the "functions break indexes" problem. They are computed once at index build time and maintained on every write, so reads are fast at the cost of additional write-time computation.
Full-Text Indexes
B-tree and hash indexes operate on exact or prefix values. Full-text search requires matching any word within a text column — a fundamentally different problem.
Inverted indexes are the standard data structure for full-text search. An inverted index maps each word (token) to the list of rows that contain it, along with positional information. To find all rows containing "distributed systems", the database intersects the posting lists for both words.
In PostgreSQL, full-text indexes use the GIN (Generalized Inverted Index) type:
CREATE INDEX idx_articles_body_fts ON articles USING GIN (to_tsvector('english', body));
Dedicated search engines like Elasticsearch and OpenSearch build on inverted indexes but add ranking (TF-IDF, BM25), stemming, synonyms, and distributed sharding — capabilities that outstrip what a relational database’s built-in FTS can offer at scale.
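The core of an inverted index fits in a few lines: tokenize each document and map each token to the set of rows containing it (a bare-bones sketch with no stemming, ranking, or positional data):

```python
docs = {
    1: "distributed systems are hard",
    2: "database systems use indexes",
    3: "distributed database systems shard data",
}

# Build the inverted index: token -> set of doc ids containing it.
inverted = {}
for doc_id, text in docs.items():
    for token in text.lower().split():
        inverted.setdefault(token, set()).add(doc_id)

# Query "distributed systems": intersect the two posting lists.
hits = inverted["distributed"] & inverted["systems"]
print(sorted(hits))  # -> [1, 3]
```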
Write Overhead
Every index adds work to every write operation on the table:
| Operation | Index cost |
|---|---|
| INSERT | Insert entry into every index on the table |
| UPDATE (indexed column) | Delete old entry and insert new entry in every affected index |
| UPDATE (non-indexed column) | No index cost (index entry unchanged) |
| DELETE | Delete entry from every index on the table |
A table with 15 indexes does roughly 15x the index-maintenance work for a bulk INSERT compared with an unindexed table. For write-heavy workloads (event logs, IoT telemetry, append-only tables), aggressive indexing can cripple write throughput. Drop unused indexes before they become a write bottleneck.
Index bloat is also a concern: B-trees do not immediately reclaim space when rows are deleted. Dead index entries accumulate over time, slowing scans and consuming storage. PostgreSQL’s VACUUM reclaims dead tuples; periodic REINDEX or REINDEX CONCURRENTLY rebuilds bloated indexes.
When to Index (and When Not To)
Strong candidates for indexing
- Primary keys and foreign keys. Primary keys are always indexed. Foreign keys must be indexed to avoid full table scans during JOIN and DELETE operations.
- Columns in WHERE clauses of frequent queries. The most impactful indexes are on columns that filter rows in hot query paths.
- Columns in ORDER BY / GROUP BY on large result sets.
- Columns in JOIN conditions. Index the side being probed; if the join targets a primary key, that side is already indexed.
- Unique constraints. Databases implement UNIQUE constraints via a unique index.
Avoid or reconsider indexing when
- The table is small (a few thousand rows). Sequential scans on small tables are faster than index lookups due to lower overhead.
- The column has low cardinality (boolean, enum with few values) without an accompanying high-selectivity filter.
- The table is write-heavy with no read-critical query path that benefits from the index.
- The index is redundant. An index on (a) is redundant if you already have a composite index on (a, b); the composite index satisfies queries on a alone via the leftmost prefix rule.
- The column is never queried selectively. Indexing a timestamp column on a table where every query is a full table dump adds write overhead with no read benefit.
Reading EXPLAIN Output
The database query planner’s decision-making is visible via EXPLAIN. In PostgreSQL:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 42;
Key plan nodes to recognize:
- Seq Scan: Full sequential table scan. Expected for small tables or low-selectivity queries; a red flag on large tables with indexed columns.
- Index Scan: Walks the B-tree then fetches rows from the table heap. Good for selective queries returning a small fraction of rows.
- Index Only Scan: Reads the index alone, never touches the heap. Best case — indicates a covering index.
- Bitmap Index Scan: Builds a bitmap of matching row locations, then fetches heap pages in order. Used when an index returns many rows — more efficient than random heap access.
-- Look for high actual rows vs estimated rows — a sign of stale statistics
-- Run ANALYZE to refresh
ANALYZE orders;
When EXPLAIN ANALYZE shows a Seq Scan on a large table for a query that should use an index, the likely causes are: no index exists, the index doesn’t match the query shape (function on column, wrong column order in composite), or statistics are stale and the planner overestimates selectivity.
Design Considerations
- Start with query patterns, not columns. Index for the queries you actually run, not speculatively for every column. Profile the slowest queries first.
- Use composite indexes rather than many single-column indexes. A composite index satisfies more queries and has less total overhead than multiple separate indexes that the planner must combine.
- Monitor index usage. In PostgreSQL, pg_stat_user_indexes shows how many times each index has been scanned. Indexes with zero scans over weeks of production traffic are candidates for removal.
- Build indexes concurrently in production. CREATE INDEX CONCURRENTLY builds the index without blocking writes, at the cost of taking longer. Never build a large index with a regular CREATE INDEX in production: it takes a lock that blocks all writes to the table for the duration of the build.
- Foreign key indexes are often forgotten. Adding a foreign key constraint without a corresponding index causes every DELETE on the parent table to do a full scan of the child table. This is a common, painful production surprise.
When discussing database performance at scale, mention indexes explicitly. “We’ll index user_id and created_at on the orders table. A composite index (user_id, created_at) serves both the per-user listing query and the date-range filter without needing two separate indexes. We’ll use a partial index on status = ‘pending’ to keep the queue-scan fast as the table grows.” This shows you think about access patterns, not just columns.