Caching
Caching is one of the highest-leverage techniques in system design. By storing the result of an expensive operation and reusing it on future requests, you can reduce latency by orders of magnitude, cut database load by 90% or more, and serve far more traffic from the same infrastructure. Almost every large-scale system — from web apps to search engines to CDNs — uses caching at multiple layers. Understanding when to cache, where to cache, and what can go wrong is essential knowledge for interviews and production systems alike.
What Is Caching?
A cache is a fast, temporary storage layer that holds a copy of data so future requests for that data can be served faster than fetching it from the original source. The original source might be a database, an external API, a disk, or a computation that takes time to run.
The fundamental trade-off in caching is speed vs freshness. The cache is faster than the source, but it may hold stale data. Every caching strategy is essentially a different answer to the question: how much staleness is acceptable, and for how long?
Caching is not just a software pattern — it is fundamental to how computers work. CPU registers (fastest), L1/L2/L3 caches, RAM, SSD, HDD, and network storage form a hierarchy where each level is slower but larger. Application-level caching is just this same pattern applied at a higher level of abstraction.
Cache Types
Caches exist at many layers of a system. Knowing where a cache lives determines what it can cache, who benefits from it, and what invalidates it.
Client-Side Cache (Browser Cache)
The browser stores HTTP responses locally. When you revisit a page, the browser checks whether its cached copy is still valid (via Cache-Control, ETag, or Last-Modified headers) before making a network request. This eliminates round trips entirely for static assets.
This cache is per-user and per-device — it doesn’t help with the server’s load, but it dramatically reduces the user’s perceived latency.
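As an illustration, the server's side of this validation is cheap. The sketch below is a hypothetical Python handler (the function name and header plumbing are made up for illustration); only the Cache-Control and ETag semantics are standard HTTP.

```python
import hashlib

def serve_asset(request_headers: dict, asset_bytes: bytes):
    # Hypothetical handler sketch; only the header semantics are standard HTTP.
    etag = '"' + hashlib.md5(asset_bytes).hexdigest() + '"'
    if request_headers.get("If-None-Match") == etag:
        # The browser's cached copy is still valid: no body is re-sent.
        return 304, {"ETag": etag}, b""
    # Fresh response; any cache (browser or CDN) may store it for one day.
    return 200, {"ETag": etag, "Cache-Control": "public, max-age=86400"}, asset_bytes
```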
CDN Cache (Edge Cache)
A Content Delivery Network caches static and cacheable dynamic content at edge nodes geographically close to users. When a user in Tokyo requests an asset, they hit a CDN node in Tokyo rather than an origin server in Virginia, reducing latency from 200ms to 5ms.
CDN caches are shared across users — one user’s request populates the cache for all subsequent users in that region. This makes them extraordinarily effective for popular content.
Web Server / Reverse Proxy Cache
Nginx, Varnish, and similar reverse proxies can cache HTTP responses before they reach your application. A request for a popular page is served from the proxy’s memory without your application code ever running. This layer protects the application tier from traffic spikes.
Application Cache (In-Process)
Cache data lives in the application’s own memory (a HashMap, an LRU cache library, etc.). Lookups are extremely fast — no network hop required. The downside is that it is not shared between application instances: each server maintains its own copy, which can diverge if data changes.
Best for data that is expensive to compute, read-heavy, and where slight staleness across instances is acceptable (e.g. feature flags, configuration, infrequently changing reference data).
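For instance, Python's built-in functools.lru_cache gives you an in-process cache in one line; the exchange_rate lookup below is a hypothetical stand-in for expensive reference data.

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)                 # per-process LRU; each app instance keeps its own copy
def exchange_rate(currency: str) -> float:
    time.sleep(0.1)                      # stand-in for a slow DB query or external API call
    return {"EUR": 1.08, "GBP": 1.27}.get(currency, 1.0)

exchange_rate("EUR")   # slow: computes and caches
exchange_rate("EUR")   # fast: served from process memory, no I/O
```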
Distributed Cache
A dedicated caching service (Redis, Memcached) shared by all application instances. All servers read from and write to the same cache, so the data is consistent across the fleet. This is the most common caching pattern for web-scale applications.
The trade-off: you’ve introduced a network hop (typically 0.5–2ms) compared to in-process caching, and the cache service itself becomes a dependency that must be highly available.
Database Cache (Query Cache)
Many databases maintain internal caches for query results, parsed query plans, and frequently accessed data pages. PostgreSQL and MySQL buffer pools keep hot data in RAM to avoid disk reads. These caches are transparent to the application and are managed by the database engine.
| Cache Type | Location | Shared? | Network hop? | Best for |
|---|---|---|---|---|
| Browser | Client device | No (per user) | No | Static assets, HTML |
| CDN | Edge node | Yes (per region) | Yes (short) | Static assets, popular content |
| Reverse proxy | Data center edge | Yes | No (in-datacenter) | Full HTTP responses |
| Application (in-process) | App server RAM | No (per instance) | No | Config, computed aggregates |
| Distributed (Redis) | Cache server | Yes | Yes (~1ms) | Sessions, DB results, API responses |
| Database | DB server RAM | Yes | Yes | Hot rows, query plans |
Cache Hit and Miss
When a request arrives:
- Cache hit — the data is in the cache and is still valid. Return it immediately. No trip to the source.
- Cache miss — the data is not in the cache (or has expired). Fetch it from the source, store it in the cache, then return it.
The hit rate (hits / total requests) is the key cache health metric. A hit rate of 95% means 95% of requests never touch the database. Below 80%, your cache may not be worth its operational complexity.
Most systems exhibit a power-law access pattern: roughly 80% of requests touch 20% of the data. This is why caches are so effective even when they hold only a small fraction of the total data. You don’t need to cache everything — just the hot set.
Write Strategies
When data is written (created or updated), the cache and the database can become inconsistent. Write strategies define the relationship between the two.
Write-Through
Every write goes to the cache and the database synchronously. The write completes only when both are updated. The cache is always consistent with the database.
- Advantage: No stale data. Reads always hit the cache and get fresh data.
- Disadvantage: Higher write latency (two writes per operation). Cache gets populated with data that may never be read.
Best for: Read-heavy workloads where data freshness is critical (user profiles, pricing data).
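A minimal write-through sketch, assuming the redis-py client and a hypothetical db data-access object:

```python
import json
import redis  # assumes the redis-py client; `db` is a hypothetical data-access object

r = redis.Redis()

def save_user(db, user_id: str, profile: dict, ttl: int = 3600):
    # Write-through: the write is complete only after BOTH stores are updated.
    db.update_user(user_id, profile)                        # 1. durable write to the database
    r.set(f"user:{user_id}", json.dumps(profile), ex=ttl)   # 2. synchronous cache update
    return profile
```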
Write-Behind (Write-Back)
Writes go to the cache immediately and return success to the client. The cache asynchronously flushes updates to the database after a delay or when the cache entry is evicted.
- Advantage: Very low write latency. Multiple writes to the same key can be coalesced into a single database write, reducing load on the source.
- Disadvantage: Risk of data loss if the cache crashes before flushing. Complexity in managing the async flush.
Best for: Write-heavy workloads where writes can be batched (analytics counters, activity feeds, game state).
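A write-behind sketch for counters, assuming a hypothetical db_increment persistence callable; the docstring notes the loss-on-crash trade-off described above.

```python
import threading, time
from collections import OrderedDict

class WriteBehindCounter:
    """Writes land in memory immediately; a background thread flushes batches to the DB.
    If the process dies before a flush, the pending deltas are lost (the core trade-off)."""

    def __init__(self, db_increment, flush_interval: float = 5.0):
        self.pending = OrderedDict()       # key -> accumulated delta (coalesced writes)
        self.lock = threading.Lock()
        self.db_increment = db_increment   # hypothetical callable: (key, delta) -> None
        threading.Thread(target=self._flusher, args=(flush_interval,), daemon=True).start()

    def incr(self, key: str, delta: int = 1):
        with self.lock:
            self.pending[key] = self.pending.get(key, 0) + delta   # returns immediately

    def _flusher(self, interval: float):
        while True:
            time.sleep(interval)
            with self.lock:
                batch, self.pending = self.pending, OrderedDict()
            for key, delta in batch.items():
                self.db_increment(key, delta)   # one DB write per key per interval
```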
Write-Around
Writes go directly to the database, bypassing the cache. The cache is only populated on cache misses (lazy population).
- Advantage: Avoids polluting the cache with write-once data that won’t be re-read.
- Disadvantage: The first read after a write always misses the cache.
Best for: Data that is written once and read rarely (logs, large file uploads, infrequent records).
| Strategy | Write path | Read freshness | Write latency | Data loss risk |
|---|---|---|---|---|
| Write-through | Cache + DB (sync) | Always fresh | Higher | None |
| Write-behind | Cache now, DB later | Fresh in cache | Lowest | On cache crash |
| Write-around | DB only | Miss on first read | Same as DB | None |
Cache-Aside (Lazy Loading)
Cache-aside is the most common read pattern and is orthogonal to the write strategies above. The application manages the cache manually:
- Check the cache for the key.
- If hit, return the cached value.
- If miss, fetch from the database, store in the cache, return the value.
The application decides what to cache and when. The cache only ever holds data that has actually been requested, so it stays relevant. This pattern is often combined with write-around for writes.
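A minimal cache-aside sketch, assuming the redis-py client and a hypothetical db.get_user loader:

```python
import json
import redis  # assumes the redis-py client; `db.get_user` is a hypothetical loader

r = redis.Redis()

def get_user(db, user_id: str, ttl: int = 300):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:                    # cache hit: no trip to the database
        return json.loads(cached)
    user = db.get_user(user_id)               # cache miss: fetch from the source
    if user is not None:
        r.set(key, json.dumps(user), ex=ttl)  # populate lazily; TTL bounds staleness
    return user
```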
When you deploy a fresh cache (after a crash or a flush), everything is a miss. All requests fall through to the database simultaneously — this is a thundering herd. Mitigate it by warming the cache before switching traffic, or by using request coalescing so only one request fetches a given key while others wait.
Eviction Policies
Caches have finite memory. When the cache is full and a new item needs to be stored, the cache must evict an existing item. The eviction policy determines which item is removed.
LRU — Least Recently Used
Evict the item that was accessed furthest in the past. The assumption is that recently accessed items are more likely to be accessed again. LRU is the most widely used eviction policy and is the default in Redis.
Weakness: A single full scan of the dataset (a rare large query) evicts the entire working set, causing a cache miss storm. This failure mode is known as cache pollution (or scan pollution).
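The mechanics are easy to see in a toy implementation. A minimal LRU sketch built on Python's OrderedDict (real caches use more efficient structures):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()            # insertion order doubles as recency order

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)           # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)    # evict the least recently used entry
```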
LFU — Least Frequently Used
Evict the item that has been accessed the fewest times overall. Keeps popular items regardless of recency.
Weakness: Items that were popular in the past but are no longer relevant (e.g. yesterday’s trending article) stay in cache because of their historical hit count. Requires decay mechanisms to age out old counts.
FIFO — First In, First Out
Evict the item that was inserted first, regardless of how often it has been accessed. Simple to implement. Poor hit rate for most workloads because popularity is not considered.
TTL — Time To Live
Each cache entry has an expiry time. Items are evicted when their TTL expires. TTL is not an eviction policy in the same sense (it doesn’t resolve capacity pressure), but it is the primary mechanism for controlling staleness. In practice, caches use TTL in combination with LRU or LFU.
Choosing a TTL is a product decision: longer TTL = better hit rate, more staleness; shorter TTL = lower hit rate, fresher data.
Random Replacement
Evict a randomly chosen item. Surprisingly effective in practice because it avoids pathological cases like cache pollution. Used internally in some CPU caches.
| Policy | Evicts | Hit rate | Weakness |
|---|---|---|---|
| LRU | Least recently accessed | High (typical workloads) | Scan pollution |
| LFU | Least frequently accessed | High (stable access patterns) | Stale popularity |
| FIFO | Oldest inserted | Low | Ignores popularity |
| TTL | Expired entries | Depends on TTL choice | Staleness vs hit rate |
| Random | Random item | Medium | Unpredictable |
Cache Invalidation
Cache invalidation is the hardest problem in caching. When the underlying data changes, how do you ensure the cache reflects the new state?
“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton
TTL-Based Expiry
The simplest approach: set a TTL on every cache entry. After expiry, the next request fetches fresh data. Staleness is bounded by the TTL.
Appropriate when occasional staleness is acceptable. A news headline cached for 60 seconds is fine; a bank account balance cached for 60 seconds is not.
Active Invalidation (Purge on Write)
When data is written, the application explicitly deletes or updates the corresponding cache key. The next read repopulates the cache with fresh data.
More complex but ensures tight consistency. Requires careful key management — you must know exactly which cache keys correspond to the data that changed.
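A purge-on-write sketch, assuming the redis-py client and a hypothetical db.update_product call:

```python
import redis  # assumes the redis-py client; `db.update_product` is a hypothetical call

r = redis.Redis()

def update_product(db, product_id: str, fields: dict):
    db.update_product(product_id, fields)   # 1. write to the source of truth
    r.delete(f"product:{product_id}")       # 2. purge the now-stale cache entry
    # The next read misses and repopulates the cache with fresh data (cache-aside).
```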
Event-Driven Invalidation
The database or a change-data-capture (CDC) system publishes events when data changes. Cache services subscribe to these events and invalidate affected keys. This decouples the application from cache management but introduces infrastructure complexity (Kafka, Debezium, or similar).
In systems with eventual consistency — replicated databases, distributed caches — you can serve reads from different nodes that have received different updates. A user updates their profile; the write goes to the primary DB; the cache serves the old value for up to TTL seconds; meanwhile the user refreshes and sees their old data. This is often fine. When it’s not, you need stronger guarantees: read-your-own-writes consistency, sticky sessions, or cache invalidation on every write.
Distributed Caching
At scale, a single cache server becomes a bottleneck and a single point of failure. Distributed caching solves this by spreading data across multiple cache nodes.
Redis
Redis is the most widely used distributed cache. It is a single-threaded, in-memory data structure store supporting strings, hashes, lists, sets, sorted sets, bitmaps, and more. Key features:
- Persistence: Optional RDB snapshots and AOF (append-only file) journaling for durability.
- Replication: Primary-replica replication for read scaling and HA.
- Redis Sentinel: Monitors primaries and replicas, performs automatic failover.
- Redis Cluster: Automatic data sharding across multiple nodes using 16,384 hash slots (a fixed-slot alternative to consistent hashing).
- Pub/Sub: Message broadcasting between services.
- Lua scripting: Atomic multi-step operations.
Memcached
A simpler, multithreaded in-memory key-value store. Supports only string values (no rich data types). No persistence, no replication built-in. Clients handle sharding by hashing keys to nodes.
Memcached can be faster than Redis for simple get/set workloads on multi-core machines because of its multithreaded architecture. Redis is almost always preferred for new systems due to its richer feature set and built-in clustering.
Sharding a Distributed Cache
When the working set exceeds one cache server’s RAM, data must be sharded across nodes. The challenge: when you add or remove a node, which keys move? Naive modulo hashing (hash(key) % N) would reassign almost all keys on every topology change, causing a cache miss storm.
Consistent hashing solves this: keys and nodes are placed on a ring. Adding or removing a node only redistributes the keys adjacent to it — typically 1/N of total keys. Redis Cluster uses a variant (hash slots) for the same effect.
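A toy consistent-hash ring (with virtual nodes for smoother key distribution) looks roughly like the sketch below; production client libraries are considerably more refined.

```python
import bisect, hashlib

class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        # Place `vnodes` virtual points per node on the ring for smoother distribution.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]              # first node clockwise from the key's position

ring = HashRing(["cache-a", "cache-b", "cache-c"])
ring.node_for("user:42")   # the same key always maps to the same node
```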
Choose Redis for: sessions, leaderboards, rate limiting, pub/sub, geospatial queries, or anything that needs data types beyond strings. Choose Memcached for: extremely simple string caching at very high throughput on multi-core hardware. In practice, Redis is the default for almost all new systems.
Caching Problems
Caching introduces failure modes that don’t exist in uncached systems. These three are classic interview topics.
Cache Stampede (Thundering Herd)
A popular cache entry expires. Simultaneously, hundreds of requests for that key arrive, all find a miss, and all go to the database at once. The database is suddenly hit with 100× its normal load for that key.
Solutions:
- Lock / mutex: The first request to miss acquires a lock and fetches from DB. Other requests wait for the lock, then read the newly populated cache entry (see the sketch after this list).
- Probabilistic early expiry: Slightly before TTL expires, start re-fetching in the background while still serving the cached value. No thundering herd because the key never actually expires.
- Jitter: Add random variation to TTLs so many entries don’t expire simultaneously.
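A per-key lock sketch using a Redis SET NX lock, assuming the redis-py client; load_from_db is a hypothetical loader:

```python
import json, time
import redis  # assumes the redis-py client; `load_from_db` is a hypothetical loader

r = redis.Redis()

def get_coalesced(key: str, load_from_db, ttl: int = 300, lock_ttl: int = 10):
    while True:
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        # Only one caller wins the per-key lock and goes to the database.
        if r.set(f"lock:{key}", "1", nx=True, ex=lock_ttl):
            try:
                value = load_from_db()
                r.set(key, json.dumps(value), ex=ttl)
                return value
            finally:
                r.delete(f"lock:{key}")
        time.sleep(0.05)   # losers wait briefly, then re-read the now-warm cache
```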
Cache Penetration
Requests for keys that don’t exist in the database. Every request misses the cache (there’s nothing to cache) and hits the database directly. An attacker can deliberately query non-existent keys to bypass the cache and overwhelm the database.
Solutions:
- Cache null values: If the DB returns no result, cache a sentinel value (e.g. `null` or an empty object) for a short TTL. Subsequent requests for that key hit the cache (sketched after this list).
- Bloom filter: A probabilistic data structure that can definitively say a key does not exist. Check the bloom filter before the cache. Non-existent keys are rejected without hitting the DB or cache.
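A null-caching sketch, assuming the redis-py client; the sentinel value and the db.find loader are illustrative:

```python
import json
import redis  # assumes the redis-py client; `db.find` is a hypothetical lookup

r = redis.Redis()
MISSING = "__missing__"            # sentinel stored for keys that do not exist in the DB

def get_item(db, item_id: str, ttl: int = 300, negative_ttl: int = 30):
    key = f"item:{item_id}"
    cached = r.get(key)
    if cached is not None:
        return None if cached == MISSING.encode() else json.loads(cached)
    item = db.find(item_id)
    if item is None:
        r.set(key, MISSING, ex=negative_ttl)   # cache the miss, but only briefly
        return None
    r.set(key, json.dumps(item), ex=ttl)
    return item
```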
Cache Avalanche
A large number of cache entries expire at the same time (or the cache server crashes), causing a flood of DB requests all at once. Unlike cache stampede (one key), an avalanche affects many keys simultaneously.
Solutions:
- TTL jitter: Randomize TTLs (e.g. `base_ttl ± 10%`) so entries don’t expire in a synchronized wave (see the sketch after this list).
- Circuit breaker: If the DB is overwhelmed, stop accepting new requests rather than amplifying the problem.
- Cache HA: Run Redis in cluster or sentinel mode so a single node failure doesn’t take down the entire cache.
- Warm the cache on startup: Pre-populate cache before switching traffic to a fresh deployment.
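TTL jitter is essentially a one-liner; a small sketch:

```python
import random

def jittered_ttl(base_ttl: int, spread: float = 0.10) -> int:
    # base_ttl ± 10% so entries written at the same time do not expire together.
    return int(base_ttl * random.uniform(1 - spread, 1 + spread))

jittered_ttl(300)   # e.g. 287, 304, 312 ... instead of exactly 300 every time
```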
| Problem | Cause | Primary mitigation |
|---|---|---|
| Cache stampede | One popular key expires; many concurrent misses | Mutex lock, probabilistic early expiry, TTL jitter |
| Cache penetration | Requests for keys that never exist in DB | Cache null values, bloom filter |
| Cache avalanche | Many keys expire at once or cache crashes | TTL jitter, circuit breaker, cache HA |
Mention caching early whenever you need to reduce DB load or latency. State the cache type (distributed, in-process, CDN), the write strategy (cache-aside is usually correct), the TTL rationale, and the eviction policy. Interviewers love to ask follow-ups about cache invalidation and what happens when the cache is down — have answers for cache stampede, penetration, and avalanche ready.