SQL vs NoSQL
The SQL vs NoSQL question comes up in almost every system design discussion. The honest answer is not “SQL is better” or “NoSQL is better” — it depends on the workload. But there is a clear framework for making the decision, and understanding exactly what you trade away in each direction prevents a lot of architectural regret.
Side-by-Side Comparison
| Dimension | SQL (Relational) | NoSQL |
|---|---|---|
| Data model | Tables, rows, columns with a fixed schema | Documents, key-value, wide-column, or graph — schema flexible |
| Query language | SQL — declarative, expressive, standardized | API-specific or limited query DSL |
| Joins | Yes — multi-table joins with optimizer support | Generally no joins; embed or denormalize |
| Consistency | ACID transactions by default | Often eventual; tunable in some systems |
| Horizontal scaling | Possible but complex (sharding, distributed SQL) | Built-in; designed for it from the start |
| Schema changes | Migrations required; can lock tables at scale | No migrations; documents can evolve freely |
| Write throughput | High on a single node; limited by primary capacity | Very high; writes distributed across nodes |
| Read flexibility | Ad-hoc queries over any column | Queries optimized for designed access patterns |
| Maturity | Decades of tooling, ORMs, drivers, monitoring | Varies widely by database |
| Operational complexity | Well-understood; managed services abundant | Varies; some (Cassandra) are notoriously complex |
Schema and Structure
SQL requires you to define your schema before writing data. Every row in a table has the same columns. Changing the schema requires an ALTER TABLE migration, which on a large table can lock reads and writes for minutes or hours unless you use online DDL tools (pt-online-schema-change, gh-ost) or Postgres 12+’s non-blocking ALTER TABLE ... ADD COLUMN.
NoSQL databases are schema-flexible. A document collection can contain documents with different fields. This flexibility is genuinely useful early in a product’s life when requirements shift weekly. It becomes a liability at maturity when you have no idea what shape documents in a collection actually have, and application code is full of if field exists guards.
Schemaless databases move schema enforcement from the database layer to the application layer. Now every service that reads from the collection must handle every possible document shape. Inconsistent writes — from bugs, partial migrations, or old application versions — create silent data quality problems. Many teams add application-level schema validation (e.g., Mongoose validators for MongoDB, or JSON Schema validation) and end up with something more complex than SQL migrations.
Scalability
This is where NoSQL has its strongest case. Most NoSQL databases were designed specifically for horizontal scale from day one:
- Cassandra adds nodes and rebalances automatically. Write throughput scales linearly. There is no primary node that becomes a bottleneck.
- DynamoDB handles any scale transparently — you provision capacity or use on-demand mode and the service scales behind the scenes.
- MongoDB supports sharding with a router layer (mongos) that distributes data across shards by a shard key.
SQL databases scale vertically by default. A single primary handles all writes. When the primary hits its capacity ceiling:
- Add read replicas to distribute read load (often sufficient for 90% of workloads).
- Use a distributed SQL database (CockroachDB, Spanner, PlanetScale) for horizontal write scaling with SQL semantics.
- Shard manually — partition data by a key (user ID, tenant ID) across multiple database servers. Complex but effective.
Most applications never need to scale beyond a single well-tuned SQL primary with a few read replicas. The scale threshold where NoSQL’s horizontal-first design pays off is much higher than engineers typically assume.
Consistency
This is the most consequential trade-off.
SQL gives you ACID transactions. You can transfer money between accounts atomically, decrement inventory and create an order atomically, update multiple tables atomically. All-or-nothing. If the system crashes in the middle, the database rolls back automatically.
Most NoSQL systems are eventually consistent. A write on one node propagates to others asynchronously. Between the write and the propagation, reads may return the old value. This is fine for:
- Social media like counts (off by a few for a second)
- Product view counters
- Activity feeds where slight staleness is imperceptible
It is not fine for:
- Account balances (a user’s balance reads as $100 on one replica and $0 on another after a withdrawal)
- Inventory (overselling because two nodes both read stock as 1 and both allow a purchase)
- Authentication (a revoked token still valid on a stale replica)
Some NoSQL systems bridge this gap. DynamoDB offers strongly consistent reads (at extra cost and latency). MongoDB offers linearizable reads with the right read/write concern. Cassandra can be configured for quorum reads and writes that approach strong consistency. But if you need ACID across multiple entities, SQL is still the cleaner choice.
Query Power
SQL’s declarative query language is extraordinarily powerful. You can:
- Join any table to any other table on any condition
- Aggregate across millions of rows with GROUP BY, HAVING, window functions
- Write subqueries, CTEs (Common Table Expressions), recursive queries
- Answer questions you didn’t anticipate when you designed the schema
NoSQL query capabilities are limited to what the database specifically supports. MongoDB’s aggregation pipeline is powerful but nothing like SQL’s full expressiveness. Cassandra only supports queries by partition key and clustering key — no secondary indexes at scale, no ad-hoc queries. Key-value stores support only GET by key. If your query requirements evolve beyond your NoSQL schema’s design, you are in trouble.
When to Choose SQL
- Your data has clear relationships between entities that you need to query across.
- You need multi-entity transactions: transfers, order placement with inventory decrement, user creation with role assignment.
- Your query patterns are unpredictable or will evolve. SQL handles unanticipated queries; NoSQL does not.
- You need strong data integrity: foreign key constraints, unique constraints, not-null enforcement at the database layer.
- You are early in development and your schema is still being figured out — SQL migrations with a tool like Flyway or Alembic are manageable.
- Your write volume fits on a single well-specced server (which is most applications up to significant scale).
When to Choose NoSQL
- You need extreme write throughput beyond what a single SQL primary can sustain, and distributed SQL (CockroachDB) is too expensive or operationally complex.
- Your data has truly variable structure that would require hundreds of nullable columns or complex EAV (Entity-Attribute-Value) tables in SQL.
- You have a clear, narrow, and stable set of access patterns that a NoSQL schema can be purpose-built for.
- You are building a cache layer, session store, or real-time leaderboard where Redis is the obvious tool.
- You are storing time-series sensor data at IoT scale where Cassandra’s write path is designed exactly for this.
- Your domain is deeply relational graph traversal (social connections, fraud rings) that SQL joins handle poorly.
Using Both Together
The most scalable production systems typically use SQL as the primary data store and specific NoSQL databases layered on top for targeted performance requirements:
- PostgreSQL for users, orders, payments, inventory — all transactional data.
- Redis in front of PostgreSQL for session storage and hot-read caching (product listings, user profiles that are read thousands of times per second).
- Elasticsearch as a read replica for search, fed from PostgreSQL via a change data capture pipeline.
- Cassandra for time-series event logs (clicks, page views, IoT sensor readings) that arrive at millions per second and never need cross-entity transactions.
When asked SQL vs NoSQL, demonstrate nuance: explain the specific trade-offs, pick one based on the system’s stated requirements, and acknowledge what you give up. A strong answer for a social feed system might be: “I’d use PostgreSQL for user accounts and relationships where strong consistency matters, and Cassandra for the activity feed table because feed writes are extremely high-volume and the read pattern (get feed for user X, sorted by time) maps perfectly to Cassandra’s partition + clustering key model.” This is far more credible than “I’d use MongoDB because it’s flexible.”