Practical system design guides with visual diagrams. Built for engineers who want to understand distributed systems deeply, not just memorize interview patterns.
IP addresses, OSI model layers, TCP vs UDP, and DNS resolution — the networking foundation every system designer needs.
Load balancing algorithms, L4 vs L7 balancers, health checks, sticky sessions, and when to use each strategy in production systems.
How server clusters work, active-passive vs active-active modes, leader election, quorum, and coordination services like ZooKeeper and etcd.
Cache types, eviction policies, write strategies, cache invalidation, and distributed caching with Redis and Memcached.
How CDNs work, push vs pull models, caching at the edge, and when to use a CDN in system design.
How forward and reverse proxies work, their differences, and how to apply them for security, load balancing, and caching.
How to measure availability with nines, eliminate single points of failure, design for redundancy and failover, and reason about the consistency trade-off.
Vertical vs horizontal scaling, stateless design, database scaling strategies, and how to design systems that handle 10x traffic without rewriting everything.
Block, file, and object storage explained. RAID levels, NAS vs SAN, HDFS, and how to choose the right storage type for your system design.
What databases and database management systems actually do, the components under the hood, common database types, and the key challenges you face designing data layers at scale.
How relational databases work: ACID guarantees, schema design, indexes, query planning, and when SQL is the right choice for your system.
Document, key-value, wide-column, and graph databases explained. When NoSQL outperforms SQL, what you give up, and how to pick the right type for your workload.
A direct comparison of SQL and NoSQL databases across consistency, scalability, schema flexibility, and query power. How to choose the right model for your system.
Primary-replica topology, synchronous vs asynchronous replication, replication lag, multi-primary conflicts, and automated failover strategies.
B-tree and hash indexes, composite index column ordering, covering indexes, partial indexes, write overhead, and how to read EXPLAIN output to diagnose slow queries.
Normal forms 1NF through BCNF, how redundancy causes update anomalies, and when to deliberately denormalize with materialized views and embedded documents for read performance.
ACID transaction guarantees — atomicity, isolation levels, durability via WAL — and the BASE model of eventual consistency for distributed systems.
Consistency, Availability, and Partition Tolerance: why a distributed system cannot guarantee all three at once and must trade consistency against availability when a network partition occurs, CP vs AP system trade-offs, and real-world implications.
How PACELC extends CAP by adding the latency vs consistency trade-off that governs every request during normal operation — PA/EL, PC/EC, and tunable systems.
What database transactions are, how ACID properties are enforced, isolation levels, concurrency anomalies, and how WAL makes atomicity and durability possible while MVCC supports isolation.
Two-phase commit, the Saga pattern, choreography vs orchestration, and the outbox pattern — how to achieve atomicity across multiple services without a global lock.
How to partition a database horizontally across multiple servers: choosing a shard key, range vs hash sharding, avoiding hot spots, cross-shard queries, and resharding.
The hash ring, virtual nodes, and why consistent hashing remaps only about 1/N of the keys when a node is added. Used in Cassandra, DynamoDB, Memcached, and distributed load balancers.
How to split a monolithic database by function — independent schemas, autonomous teams, and the trade-offs around cross-database joins and distributed transactions.
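As a rough illustration of the consistent hashing entry above, the following Python sketch builds a hash ring with virtual nodes and measures how many keys change owner when a fourth node joins. The class name `HashRing`, the node names, the choice of MD5, and the figure of 100 virtual nodes per server are all assumptions made for this example, not details taken from the guides.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a 64-bit point on the ring (MD5 is used only for stable, well-spread output)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical minimal hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes      # virtual nodes placed on the ring per physical node
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> physical node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node is hashed to many positions so load spreads evenly.
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # Walk clockwise to the first position after the key, wrapping around at the end.
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.get_node(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
moved_fraction = len(moved) / len(keys)
print(f"keys remapped after adding a 4th node: {moved_fraction:.1%}")
```

With four nodes after the addition, the remapped share should land near one quarter, and every remapped key moves to the new node. A plain `hash(key) % N` scheme would instead reshuffle most keys whenever N changes.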
Common architectural patterns for structuring distributed systems — from foundational N-tier layering to event-driven and microservice designs.
How separating an application into presentation, business logic, and data tiers enables independent scaling, maintainability, and security at each layer boundary.
How asynchronous messaging with queues and pub-sub patterns decouples services, absorbs traffic spikes, and tolerates downstream failures without blocking the caller.
The real trade-offs between monolithic and microservices architectures, when to split a monolith, the Strangler Fig pattern, and the pitfalls of premature decomposition.
Using events as the primary communication mechanism between services, and event sourcing as a persistence model — projections, rebuilds, and the trade-offs of eventual consistency.
Separating read and write models so each can be optimized independently — from simple code-path separation to separate data stores and event-sourced projections.
How an API gateway acts as the single entry point — handling auth, rate limiting, routing, protocol translation, and request aggregation across a microservices backend.
The three dominant API styles compared — when REST’s simplicity wins, when GraphQL’s flexibility pays off, and when gRPC’s performance and streaming matter.
How to push updates from server to client in real time — comparing long polling, Server-Sent Events, and WebSockets by latency, complexity, and infrastructure requirements.
Reliability, observability, and operational patterns that keep distributed systems running in production — logging, monitoring, rate limiting, and more.
How location-based services index geographic data for fast proximity queries — geohash cells, neighbor expansion, and adaptive quadtree subdivision for uneven data density.
How the circuit breaker pattern prevents cascading failures — three states (closed, open, half-open), fallback strategies, and threshold tuning for production systems.
Token bucket, leaky bucket, fixed window, and sliding window algorithms — protecting APIs from abuse and implementing distributed rate limiting with Redis.
How services find each other in dynamic environments — client-side vs server-side discovery, service registries (Consul, etcd), and Kubernetes-native discovery.
The vocabulary of reliability engineering — service level indicators, objectives, and agreements; error budgets; availability math; and how they drive engineering decisions.
RTO and RPO targets, backup strategies, DR tiers from backup-restore to active-active, failover patterns, data replication, and chaos engineering.
How hypervisors and container runtimes work — namespaces, cgroups, Docker internals, image layers, and when to choose VMs over containers.
Authorization flows, access and refresh tokens, PKCE, and how OpenID Connect adds user identity on top of OAuth — the foundation of modern web auth.
SAML and OIDC-based SSO, session management across services, enterprise federation, and the trade-offs of centralized authentication.
How the TLS handshake works, certificate chains, forward secrecy, mutual TLS for service-to-service auth, and TLS termination patterns.
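To make the rate limiting entry above more concrete, here is a minimal single-process token bucket in Python. It is a sketch under stated assumptions: the class name `TokenBucket`, its parameters, and the injectable `clock` argument are invented for this example, and the Redis-backed distributed variant mentioned in the entry is not covered.

```python
import time


class TokenBucket:
    """Hypothetical in-memory token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock            # injectable so the limiter can be tested deterministically
        self._tokens = capacity        # start full, which permits an initial burst
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if enough are available, else return False."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill lazily based on elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Usage with a fake clock: a burst of 5 is allowed, the 6th request is rejected,
# and one simulated second later 2 more tokens have been refilled.
fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=5, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(6)]
fake_now[0] = 1.0
after_refill = [bucket.allow() for _ in range(3)]
```

The capacity sets the largest burst a client can send, while the rate sets the sustained throughput. This is the property that distinguishes a token bucket from a fixed window counter.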
End-to-end system design walkthroughs — designing systems like URL shorteners, social feeds, ride-sharing platforms, and video streaming services from scratch.
End-to-end design of a URL shortener: capacity estimation, encoding strategies, redirect architecture, analytics, and scaling to billions of URLs.
WebSocket gateways, message delivery guarantees, end-to-end encryption with the Signal Protocol, group chat fan-out, and media storage at 100 billion messages per day.
The fan-out problem for home timelines, hybrid push-pull for celebrity accounts, real-time search with Earlybird, and trending topics at 500 million tweets per day.
Video transcoding pipeline, Open Connect private CDN, adaptive bitrate streaming, personalized recommendations, and microservices architecture for 200 million subscribers.
Real-time driver location tracking, geospatial matching with H3 cells, hybrid dispatch, surge pricing over geographic regions, and city-level partitioning at scale.
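The URL shortener walkthrough listed above mentions encoding strategies. One common option, assumed here rather than taken from the guide, is to encode a numeric database ID in base62 so that short codes use only digits and ASCII letters. The function names and alphabet ordering below are illustrative choices.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is an arbitrary choice for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID into a base62 short code."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits are produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Convert a base62 short code back into its integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)  # raises ValueError on characters outside the alphabet
    return n


short_code = encode_base62(123456789)
original_id = decode_base62(short_code)
```

Seven base62 characters cover 62^7 IDs, roughly 3.5 trillion, which is why short codes of that length are often enough for billions of URLs. Sequential IDs produce guessable codes, so a real design may prefer randomized or hashed IDs.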