Databases

SQL Databases

Beginner · 10 min read

SQL databases have been the backbone of production systems for five decades. Despite the NoSQL wave, relational databases remain the default choice for most applications — and for good reason. When you need strong consistency, complex queries, and referential integrity, nothing else comes close. Understanding how they work internally lets you use them effectively and know exactly where their limits are.

What Is SQL?

SQL (Structured Query Language) is the standard language for interacting with relational databases. It covers four areas: data definition (DDL: CREATE, ALTER, DROP), data manipulation (DML: SELECT, INSERT, UPDATE, DELETE), data control (DCL: GRANT, REVOKE), and transaction control (TCL: BEGIN, COMMIT, ROLLBACK).

SQL is declarative: you describe what you want, not how to get it. The database engine figures out the most efficient execution strategy.
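The areas described above can be sketched with Python's bundled sqlite3 module (table and column names are illustrative). SQLite has no user accounts, so DCL statements like GRANT don't apply, but the other three areas are shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# TCL + DML: the connection context manager wraps the insert in a transaction
# (commit on success, rollback on exception)
with conn:
    conn.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))

# DML: query it back
rows = conn.execute("SELECT id, name FROM users").fetchall()
print(rows)  # [(1, 'Ada')]
```

Note that the SELECT never says how to find the row; the engine chooses the access path on its own.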

The Relational Model

A relational database organizes data into tables (also called relations). Each table has a fixed set of columns (attributes), and each row (tuple) represents one record. The schema enforces that every row in a table has the same columns, and each column has a defined data type.

Keys and Relationships

A primary key uniquely identifies each row in a table; no two rows may share the same primary key value. A foreign key is a column that references the primary key of another table, and the database enforces referential integrity: a row whose foreign key points at a non-existent record is rejected. These key relationships are what joins traverse.

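A minimal sketch of referential integrity using sqlite3 (schema is illustrative; note that SQLite only enforces foreign keys when the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),  -- foreign key to users
    total REAL
)""")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (user_id, total) VALUES (1, 9.99)")  # valid parent

try:
    # user 42 does not exist: referential integrity rejects the row
    conn.execute("INSERT INTO orders (user_id, total) VALUES (42, 5.00)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```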
Joins

Joins combine rows from two or more tables based on a related column. They are the defining feature of the relational model — data is stored once (normalized) and assembled at query time:
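A small sketch of normalize-then-join, again via sqlite3 (names and values are illustrative): each user is stored once, and the join assembles user and order rows at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 10.0), (12, 2, 7.5);
""")

# The join matches rows on the related column (orders.user_id = users.id)
rows = conn.execute("""
    SELECT u.name, o.total
    FROM users u
    JOIN orders o ON o.user_id = u.id
    ORDER BY o.id
""").fetchall()
print(rows)  # [('Ada', 25.0), ('Ada', 10.0), ('Grace', 7.5)]
```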

💡 Joins Are Not Slow by Default

A common misconception is that joins are expensive. A well-indexed join between two tables returns results in milliseconds. Joins become slow when joining large tables without indexes, when the query plan chooses a nested-loop join over a hash join, or when returning millions of rows. The fix is almost always adding the right index or rewriting the query — not switching to a database that doesn’t have joins.

ACID Properties

SQL databases guarantee ACID properties for transactions. These four properties are what make relational databases trustworthy for financial, medical, and any other data where correctness is non-negotiable.

Atomicity

A transaction is all-or-nothing. If a transaction contains ten SQL statements and the ninth fails, none of the ten are committed — the database rolls back to its state before the transaction began. This prevents partial updates: a bank transfer that debits one account but never credits the other is impossible under atomicity.
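The bank-transfer guarantee can be sketched with sqlite3 (schema and amounts are illustrative): a deliberately failing statement forces the whole transaction to roll back, so the earlier debit never lands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

try:
    with conn:  # one transaction: commit on success, roll back on any exception
        conn.execute("UPDATE accounts SET balance = balance - 60 WHERE id = 1")
        # Simulate a failure mid-transfer: duplicate primary key raises an error
        conn.execute("INSERT INTO accounts VALUES (1, 0)")
except sqlite3.IntegrityError:
    pass

# The debit was rolled back along with the failed statement: no partial update
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)  # 100
```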

Consistency

A transaction must transition the database from one valid state to another. All integrity constraints (foreign keys, unique indexes, check constraints, NOT NULL) must hold before the transaction commits. If a statement would violate a constraint, the transaction is rolled back.
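A CHECK constraint makes a compact consistency sketch in sqlite3 (the no-negative-balance invariant is illustrative): the statement that would break the invariant is rejected and the valid state survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)  -- invariant: no negative balances
)""")
conn.execute("INSERT INTO accounts VALUES (1, 50)")
conn.commit()

try:
    # Would leave balance at -30, violating the CHECK constraint
    conn.execute("UPDATE accounts SET balance = balance - 80 WHERE id = 1")
    violated = False
except sqlite3.IntegrityError:
    conn.rollback()
    violated = True

balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(violated, balance)  # True 50
```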

Isolation

Concurrent transactions must not interfere with each other. Without isolation, you get anomalies:

- Dirty read: a transaction reads data written by another transaction that has not yet committed.
- Non-repeatable read: a transaction re-reads a row and gets a different value, because another transaction modified it and committed in between.
- Phantom read: a transaction re-runs a query and gets a different set of rows, because another transaction inserted or deleted matching rows.

SQL defines four isolation levels that trade off anomaly prevention for concurrency:

Isolation Level   | Dirty Read | Non-repeatable Read | Phantom Read
READ UNCOMMITTED  | Possible   | Possible            | Possible
READ COMMITTED    | Prevented  | Possible            | Possible
REPEATABLE READ   | Prevented  | Prevented           | Possible
SERIALIZABLE      | Prevented  | Prevented           | Prevented

Most databases default to READ COMMITTED (PostgreSQL) or REPEATABLE READ (MySQL/InnoDB). SERIALIZABLE is the strictest — transactions execute as if they were run one at a time — but at the cost of lower concurrency. PostgreSQL implements SERIALIZABLE through Serializable Snapshot Isolation (SSI), which detects conflicts at commit time rather than blocking.
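SQLite doesn't expose the four ANSI levels (between connections it behaves serializably), but a two-connection sketch still shows the baseline guarantee every level above READ UNCOMMITTED provides: uncommitted writes are invisible to other sessions. The file path and table are illustrative:

```python
import sqlite3, os, tempfile

# Two connections to the same database file act as two concurrent sessions
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, note TEXT)")
writer.commit()

# Writer inserts inside an open, uncommitted transaction...
writer.execute("INSERT INTO events (note) VALUES ('pending')")

# ...and the reader cannot see it: no dirty read
before = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]

writer.commit()
after = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(before, after)  # 0 1
```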

Durability

Once a transaction commits, it is permanent. The database writes the transaction to the Write-Ahead Log (WAL) on disk before acknowledging the commit. If the server crashes immediately after the commit acknowledgment, the transaction is recovered on restart by replaying the WAL. Durability is only as strong as your storage — losing the WAL (e.g. from a storage failure with fsync disabled) can corrupt the database.
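SQLite's own WAL mode works the same way and makes a compact illustration (the path is a throwaway temp file): commits are appended to a `-wal` file on disk, and synchronous=FULL forces an fsync on every commit.

```python
import sqlite3, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "wal_demo.db")
conn = sqlite3.connect(path)

# Switch to write-ahead logging: commits are appended to wal_demo.db-wal
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# In WAL mode, synchronous=FULL fsyncs the log on every commit (full durability)
conn.execute("PRAGMA synchronous=FULL")

conn.execute("CREATE TABLE log (msg TEXT)")
conn.execute("INSERT INTO log VALUES ('committed')")
conn.commit()

print(mode, os.path.exists(path + "-wal"))  # wal True
```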

⚠️ Never Disable fsync in Production

Some guides suggest setting fsync=off in PostgreSQL or innodb_flush_log_at_trx_commit=0 in MySQL to speed up writes. This disables the durability guarantee: a system crash after a commit can leave your database in a corrupt, unrecoverable state. Only do this in disposable test environments where data loss is acceptable.

Indexes

An index is a separate data structure that the database maintains alongside the table to speed up data retrieval. Without an index, a query that filters by a non-primary-key column must scan every row in the table (a full table scan). With an index, the database jumps directly to the matching rows.

B-Tree Indexes

The default index type in virtually all relational databases. Data is stored in a balanced tree structure, sorted by the indexed column(s). B-trees support:

- Equality lookups (WHERE id = 42)
- Range queries (WHERE created_at > '2024-01-01')
- Ordered scans (satisfying ORDER BY without a separate sort step)
- Prefix matching (WHERE name LIKE 'Sm%')

Composite Indexes

An index on multiple columns. A composite index on (last_name, first_name) can be used for queries filtering by last_name alone, or by both last_name and first_name, but not by first_name alone. Column order matters — the index is sorted first by the leftmost column. This is the leftmost prefix rule.
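The leftmost prefix rule can be observed directly with SQLite's EXPLAIN QUERY PLAN (the table and index names are illustrative): filtering by last_name uses the composite index, filtering by first_name alone cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, age INTEGER);
    CREATE INDEX idx_name ON people (last_name, first_name);
""")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

# Leftmost column alone: the composite index is usable (an index SEARCH)
by_last = plan("SELECT * FROM people WHERE last_name = 'Smith'")

# Second column alone: the index is sorted by last_name first, so a full SCAN
by_first = plan("SELECT * FROM people WHERE first_name = 'Jane'")

print(by_last)
print(by_first)
```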

Covering Indexes

An index that contains all columns needed to satisfy a query — no need to read the actual table rows. If a query selects only email and user_id and both are in the index, the database reads the index only. This is called an index-only scan and is significantly faster than touching the table heap.
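SQLite labels index-only scans explicitly, which makes the effect easy to see (the schema mirrors the user_id/email example above; the index name is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, email TEXT, bio TEXT);
    CREATE INDEX idx_user_email ON users (user_id, email);
""")

# Both selected columns live in the index, so the table itself is never read;
# the plan reports a COVERING INDEX
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT email FROM users WHERE user_id = 7"
).fetchall()[0][3]
print(detail)
```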

Index Trade-offs

Indexes speed up reads but slow down writes. Every INSERT, UPDATE, or DELETE that touches indexed columns must also update the index. A table with 10 indexes pays 10 extra write operations per row modification. Add indexes for columns that appear in WHERE, JOIN, and ORDER BY clauses of hot queries. Remove indexes that are never used.

Query Planning

When you submit a SQL query, the database does not execute it directly. The query planner (or optimizer) generates multiple possible execution plans and estimates the cost of each, then executes the cheapest one. Costs are estimated using table statistics (row counts, column cardinality, value histograms) that the database collects periodically.

Understanding query plans is the most important skill for diagnosing slow queries. In PostgreSQL, use EXPLAIN ANALYZE to see the actual plan and timing:

EXPLAIN ANALYZE
SELECT u.name, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.country = 'US'
  AND o.created_at > '2024-01-01';

Key things to look for in a query plan:

- Sequential scans on large tables, usually a sign of a missing index
- Large gaps between estimated and actual row counts, a sign of stale statistics misleading the planner
- The join strategy chosen (nested loop, hash join, merge join) and whether it suits the data sizes
- Sorts and hashes that spill to disk instead of fitting in memory
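The same before/after effect can be sketched with SQLite's lighter-weight EXPLAIN QUERY PLAN (schema and index name are illustrative): the identical query goes from a full table scan to an index search once the right index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 100, float(i)) for i in range(1000)])
conn.commit()

def plan(sql):
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

query = "SELECT total FROM orders WHERE user_id = 42"
before = plan(query)  # no usable index: a full table scan

conn.execute("CREATE INDEX idx_orders_user ON orders (user_id)")
after = plan(query)   # now a search using idx_orders_user

print(before)
print(after)
```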

Popular SQL Databases

Database        | Strengths                                                                                      | Best For
PostgreSQL      | Full SQL compliance, JSONB, extensions, advanced indexing (GIN, GiST, BRIN), excellent MVCC    | General purpose, complex queries, geospatial, full-text search, JSON workloads
MySQL / MariaDB | Widely deployed, InnoDB engine, good replication ecosystem, simple to operate                  | Web applications, read-heavy workloads, large ecosystem support
SQLite          | Embedded, zero-config, single-file, no server process                                          | Mobile apps, desktop apps, testing, small embedded databases
SQL Server      | Deep Windows/Azure integration, business intelligence tooling                                  | Enterprise Microsoft stacks, .NET applications
CockroachDB     | Distributed SQL with horizontal scaling, strong consistency, Postgres-compatible wire protocol | Global applications needing SQL semantics with distributed scale
PlanetScale     | Horizontal sharding on MySQL, online schema changes with no locking, branch-based deployments  | MySQL workloads at large scale with zero-downtime migrations

When to Use SQL

SQL databases are the right default for most systems. Reach for them when:

- Your data is naturally relational and benefits from normalization
- You need transactions and strong consistency
- You need ad-hoc queries, joins, and aggregations
- Your scale fits on a single primary with read replicas, which covers the vast majority of applications

PostgreSQL as the Default Choice

When in doubt, start with PostgreSQL. It handles relational workloads, JSONB for semi-structured data, full-text search via tsvector, geospatial queries via PostGIS, time-series via TimescaleDB extension, and analytical queries via columnar extensions. Most systems that reach for specialized NoSQL databases could have stayed on PostgreSQL longer than they thought. Add NoSQL only when PostgreSQL genuinely cannot meet a specific requirement.