Foundations

Caching


Caching is one of the highest-leverage techniques in system design. By storing the result of an expensive operation and reusing it on future requests, you can reduce latency by orders of magnitude, cut database load by 90% or more, and serve far more traffic from the same infrastructure. Almost every large-scale system — from web apps to search engines to CDNs — uses caching at multiple layers. Understanding when to cache, where to cache, and what can go wrong is essential knowledge for interviews and production systems alike.

What Is Caching?

A cache is a fast, temporary storage layer that holds a copy of data so future requests for that data can be served faster than fetching it from the original source. The original source might be a database, an external API, a disk, or a computation that takes time to run.

The fundamental trade-off in caching is speed vs freshness. The cache is faster than the source, but it may hold stale data. Every caching strategy is essentially a different answer to the question: how much staleness is acceptable, and for how long?

💡
The Memory Hierarchy

Caching is not just a software pattern — it is fundamental to how computers work. CPU registers (fastest), L1/L2/L3 caches, RAM, SSD, HDD, and network storage form a hierarchy where each level is slower but larger. Application-level caching is just this same pattern applied at a higher level of abstraction.

Cache Types

Caches exist at many layers of a system. Knowing where a cache lives determines what it can cache, who benefits from it, and what invalidates it.

Client-Side Cache (Browser Cache)

The browser stores HTTP responses locally. When you revisit a page, the browser checks whether its cached copy is still valid (via Cache-Control, ETag, or Last-Modified headers) before making a network request. This eliminates round trips entirely for static assets.

This cache is per-user and per-device — it doesn’t help with the server’s load, but it dramatically reduces the user’s perceived latency.
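The revalidation decision can be sketched as a tiny function: the client presents the ETag of its cached copy, and the origin answers 304 Not Modified when nothing changed. This is a hypothetical simplified model (real servers also weigh Cache-Control, Last-Modified, and Vary), and the body string is a placeholder:

```python
# Sketch of ETag revalidation: a conditional GET carries If-None-Match with
# the cached copy's ETag; a 304 response means "reuse your cached body".

def revalidate(cached_etag, current_etag):
    """Return (status, body) the way an origin answers a conditional GET."""
    if cached_etag == current_etag:
        return 304, None          # Not Modified: client reuses its cached body
    return 200, "fresh-body"      # ETag changed: send the new representation

status, body = revalidate("v1", "v1")   # unchanged: no body re-sent
```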

CDN Cache (Edge Cache)

A Content Delivery Network caches static and cacheable dynamic content at edge nodes geographically close to users. When a user in Tokyo requests an asset, they hit a CDN node in Tokyo rather than an origin server in Virginia, reducing latency from 200ms to 5ms.

CDN caches are shared across users — one user’s request populates the cache for all subsequent users in that region. This makes them extraordinarily effective for popular content.

Web Server / Reverse Proxy Cache

Nginx, Varnish, and similar reverse proxies can cache HTTP responses before they reach your application. A request for a popular page is served from the proxy’s memory without your application code ever running. This layer protects the application tier from traffic spikes.

Application Cache (In-Process)

Cached data lives in the application’s own memory (a HashMap, an LRU cache library, etc.). Lookups are extremely fast because no network hop is required. The downside is that the cache is not shared between application instances: each server maintains its own copy, which can diverge when data changes.

Best for data that is expensive to compute, read-heavy, and where slight staleness across instances is acceptable (e.g. feature flags, configuration, infrequently changing reference data).
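In Python, the simplest in-process cache is memoization with `functools.lru_cache`. The sketch below assumes a hypothetical `load_config` lookup standing in for an expensive database or API call; `calls` just counts how often the real work runs:

```python
from functools import lru_cache

# In-process cache: results live in this process's memory only.
# maxsize bounds memory; the least recently used entry is evicted when full.

calls = 0

@lru_cache(maxsize=1024)
def load_config(key):
    global calls
    calls += 1                      # counts real (non-cached) computations
    return f"value-for-{key}"       # stand-in for an expensive DB/API call

load_config("feature_flags")        # miss: computes and caches
load_config("feature_flags")        # hit: served from memory, no recompute
```

Note that with two application servers, each process builds its own copy of this cache independently, which is exactly the divergence risk described above.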

Distributed Cache

A dedicated caching service (Redis, Memcached) shared by all application instances. All servers read from and write to the same cache, so the data is consistent across the fleet. This is the most common caching pattern for web-scale applications.

The trade-off: you’ve introduced a network hop (typically 0.5–2ms) compared to in-process caching, and the cache service itself becomes a dependency that must be highly available.

Database Cache (Query Cache)

Many databases maintain internal caches for query results, parsed query plans, and frequently accessed data pages. PostgreSQL and MySQL buffer pools keep hot data in RAM to avoid disk reads. These caches are transparent to the application and are managed by the database engine.

| Cache type | Location | Shared? | Network hop? | Best for |
| --- | --- | --- | --- | --- |
| Browser | Client device | No (per user) | No | Static assets, HTML |
| CDN | Edge node | Yes (per region) | Yes (short) | Static assets, popular content |
| Reverse proxy | Data center edge | Yes | No (in-datacenter) | Full HTTP responses |
| Application (in-process) | App server RAM | No (per instance) | No | Config, computed aggregates |
| Distributed (Redis) | Cache server | Yes | Yes (~1ms) | Sessions, DB results, API responses |
| Database | DB server RAM | Yes | Yes | Hot rows, query plans |

Cache Hit and Miss

When a request arrives, the cache is checked first. There are two outcomes:

  1. Cache hit: the key is present and still valid. The cached value is returned without touching the original source.
  2. Cache miss: the key is absent or expired. The value is fetched from the source, stored in the cache, and returned.

The hit rate (hits / total requests) is the key cache health metric. A hit rate of 95% means 95% of requests never touch the database. Below 80%, your cache may not be worth its operational complexity.

The 80/20 Rule in Caching

Most systems exhibit a power-law access pattern: roughly 80% of requests touch 20% of the data. This is why caches are so effective even when they hold only a small fraction of the total data. You don’t need to cache everything — just the hot set.

Write Strategies

When data is written (created or updated), the cache and the database can become inconsistent. Write strategies define the relationship between the two.

Write-Through

Every write goes to the cache and the database synchronously. The write completes only when both are updated. The cache is always consistent with the database.

Best for: Read-heavy workloads where data freshness is critical (user profiles, pricing data).
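A write-through sketch, with plain dicts standing in for Redis and the database: the write acknowledges only after both stores are updated, so a subsequent cache read is always fresh.

```python
# Write-through: every write updates BOTH the database and the cache
# synchronously, so the cache never serves a stale value for this key.

cache, db = {}, {}   # stand-ins for Redis and the database

def write_through(key, value):
    db[key] = value      # 1. write the source of truth
    cache[key] = value   # 2. and the cache, before acknowledging the client

write_through("user:1", {"name": "Ada"})
```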

Write-Behind (Write-Back)

Writes go to the cache immediately and return success to the client. The cache asynchronously flushes updates to the database after a delay or when the cache entry is evicted.

Best for: Write-heavy workloads where writes can be batched (analytics counters, activity feeds, game state).
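A write-behind sketch under the same dict stand-ins: writes land in the cache plus a pending queue, and a `flush` step later batches them to the database. In a real system the flush would run as a background task; here it is called explicitly so the data-loss window is visible.

```python
from collections import deque

# Write-behind: writes hit only the cache and a queue; the database catches
# up later. If the process dies before flush(), queued writes are lost.

cache, db, pending = {}, {}, deque()

def write_behind(key, value):
    cache[key] = value             # fast path: cache only
    pending.append((key, value))   # remember what still needs persisting

def flush():
    while pending:                 # batch everything queued so far to the DB
        key, value = pending.popleft()
        db[key] = value

write_behind("counter:views", 42)
# at this point the write exists only in the cache and the queue
flush()
```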

Write-Around

Writes go directly to the database, bypassing the cache. The cache is only populated on cache misses (lazy population).

Best for: Data that is written once and read rarely (logs, large file uploads, infrequent records).

| Strategy | Write path | Read freshness | Write latency | Data loss risk |
| --- | --- | --- | --- | --- |
| Write-through | Cache + DB (sync) | Always fresh | Higher | None |
| Write-behind | Cache now, DB later | Fresh in cache | Lowest | On cache crash |
| Write-around | DB only | Miss on first read | Same as DB | None |

Cache-Aside (Lazy Loading)

Cache-aside is the most common read pattern and is orthogonal to the write strategies above. The application manages the cache manually:

  1. Check the cache for the key.
  2. If hit, return the cached value.
  3. If miss, fetch from the database, store in the cache, return the value.

The application decides what to cache and when. The cache only ever holds data that has actually been requested, so it stays relevant. This pattern is often combined with write-around for writes.
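The three steps above can be sketched directly, with dicts standing in for Redis and the database; `db_queries` counts how often a request falls through to the source:

```python
# Cache-aside: check the cache, and on a miss lazily populate it from the DB.

cache = {}
db = {"user:1": {"name": "Ada"}}   # stand-in for the database
db_queries = 0

def get_user(user_id):
    global db_queries
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is not None:        # steps 1-2: cache hit, return immediately
        return value
    db_queries += 1              # step 3: miss, fetch from the database...
    value = db[key]
    cache[key] = value           # ...store it so the next read is a hit
    return value

get_user(1)   # miss: hits the DB and populates the cache
get_user(1)   # hit: served from cache
```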

⚠️
Cold Start

When you deploy a fresh cache (after a crash or a flush), everything is a miss. All requests fall through to the database simultaneously — this is a thundering herd. Mitigate it by warming the cache before switching traffic, or by using request coalescing so only one request fetches a given key while others wait.
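Request coalescing can be sketched with one lock per key: the first thread to miss acquires the lock and fetches, while concurrent threads wait and then read the now-populated cache. `fetch_from_db` is a hypothetical stand-in for a slow query:

```python
import threading

# Request coalescing: on a miss, only one thread per key goes to the source;
# the rest block on the key's lock and then read the cached value.

cache = {}
locks = {}
locks_guard = threading.Lock()   # protects lazy creation of per-key locks
db_fetches = 0

def fetch_from_db(key):
    global db_fetches
    db_fetches += 1              # counts how many requests reached the "DB"
    return f"value-for-{key}"

def lock_for(key):
    with locks_guard:
        return locks.setdefault(key, threading.Lock())

def get(key):
    value = cache.get(key)
    if value is not None:
        return value
    with lock_for(key):          # only one thread per key may fetch
        value = cache.get(key)   # re-check: another thread may have filled it
        if value is None:
            value = fetch_from_db(key)
            cache[key] = value
    return value

# ten concurrent requests for the same cold key trigger exactly one fetch
threads = [threading.Thread(target=get, args=("hot-key",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```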

Eviction Policies

Caches have finite memory. When the cache is full and a new item needs to be stored, the cache must evict an existing item. The eviction policy determines which item is removed.

LRU — Least Recently Used

Evict the item that was accessed furthest in the past, on the assumption that recently accessed items are the most likely to be accessed again. LRU is the most widely used eviction policy; Redis implements an approximated LRU via its allkeys-lru and volatile-lru settings (its out-of-the-box maxmemory-policy is actually noeviction).

Weakness: A single full scan of the dataset (for example, a rare large analytical query) touches every key once and evicts the entire hot working set, causing a storm of cache misses afterwards. This failure mode is often called scan (or cache) pollution.
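The LRU mechanics can be sketched with an `OrderedDict`: touched keys move to the end, and the front of the dict is therefore always the least recently used entry (a minimal illustration, not a production implementation):

```python
from collections import OrderedDict

# Minimal LRU cache: move_to_end marks recency; popitem(last=False)
# evicts the least recently used entry when capacity is exceeded.

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # touched: now most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the LRU entry

c = LRUCache(2)
c.put("a", 1)
c.put("b", 2)
c.get("a")           # "a" is now most recently used
c.put("c", 3)        # full: evicts "b", the least recently used
```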

LFU — Least Frequently Used

Evict the item that has been accessed the fewest times overall. Keeps popular items regardless of recency.

Weakness: Items that were popular in the past but are no longer relevant (e.g. yesterday’s trending article) stay in cache because of their historical hit count. Requires decay mechanisms to age out old counts.

FIFO — First In, First Out

Evict the item that was inserted first, regardless of how often it has been accessed. Simple to implement. Poor hit rate for most workloads because popularity is not considered.

TTL — Time To Live

Each cache entry has an expiry time. Items are evicted when their TTL expires. TTL is not an eviction policy in the same sense (it doesn’t resolve capacity pressure), but it is the primary mechanism for controlling staleness. In practice, caches use TTL in combination with LRU or LFU.

Choosing a TTL is a product decision: longer TTL = better hit rate, more staleness; shorter TTL = lower hit rate, fresher data.
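A TTL entry is simply a value paired with an expiry timestamp; reads treat expired entries as misses and evict them lazily. In this sketch the clock (`now`) is injectable so the behavior is easy to exercise without sleeping:

```python
import time

# TTL cache sketch: expired entries are removed on read and count as misses.

cache = {}

def set_with_ttl(key, value, ttl_seconds, now=time.time):
    cache[key] = (value, now() + ttl_seconds)   # store value + expiry time

def get(key, now=time.time):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if now() >= expires_at:     # expired: evict lazily and report a miss
        del cache[key]
        return None
    return value

set_with_ttl("headline", "Cache wins again", ttl_seconds=60)
```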

Random Replacement

Evict a randomly chosen item. Surprisingly effective in practice because it avoids pathological cases like cache pollution. Used internally in some CPU caches.

| Policy | Evicts | Hit rate | Weakness |
| --- | --- | --- | --- |
| LRU | Least recently accessed | High (typical workloads) | Scan pollution |
| LFU | Least frequently accessed | High (stable access patterns) | Stale popularity |
| FIFO | Oldest inserted | Low | Ignores popularity |
| TTL | Expired entries | Depends on TTL choice | Staleness vs hit rate |
| Random | Random item | Medium | Unpredictable |

Cache Invalidation

Cache invalidation is the hardest problem in caching. When the underlying data changes, how do you ensure the cache reflects the new state?

“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton

TTL-Based Expiry

The simplest approach: set a TTL on every cache entry. After expiry, the next request fetches fresh data. Staleness is bounded by the TTL.

Appropriate when occasional staleness is acceptable. A news headline cached for 60 seconds is fine; a bank account balance cached for 60 seconds is not.

Active Invalidation (Purge on Write)

When data is written, the application explicitly deletes or updates the corresponding cache key. The next read repopulates the cache with fresh data.

More complex but ensures tight consistency. Requires careful key management — you must know exactly which cache keys correspond to the data that changed.
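Purge-on-write is a two-step write path, sketched here with dicts standing in for the database and cache; the next read after the update repopulates the cache via the normal cache-aside miss path:

```python
# Purge on write: update the source of truth, then delete the affected
# cache key so the stale copy can never be served again.

cache = {"user:1": {"name": "Ada"}}
db = {"user:1": {"name": "Ada"}}

def update_user(user_id, fields):
    key = f"user:{user_id}"
    db[key].update(fields)   # 1. write the database
    cache.pop(key, None)     # 2. invalidate the cached copy

update_user(1, {"name": "Grace"})
```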

Event-Driven Invalidation

The database or a change-data-capture (CDC) system publishes events when data changes. Cache services subscribe to these events and invalidate affected keys. This decouples the application from cache management but introduces infrastructure complexity (Kafka, Debezium, or similar).

🚨
The Stale Read Problem

In systems with eventual consistency — replicated databases, distributed caches — you can serve reads from different nodes that have received different updates. A user updates their profile; the write goes to the primary DB; the cache serves the old value for up to TTL seconds; meanwhile the user refreshes and sees their old data. This is often fine. When it’s not, you need stronger guarantees: read-your-own-writes consistency, sticky sessions, or cache invalidation on every write.

Distributed Caching

At scale, a single cache server becomes a bottleneck and a single point of failure. Distributed caching solves this by spreading data across multiple cache nodes.

Redis

Redis is the most widely used distributed cache. It is an in-memory data structure store (command execution is single-threaded) supporting strings, hashes, lists, sets, sorted sets, bitmaps, and more. Key features:

  - Per-key TTLs and a choice of eviction policies.
  - Optional persistence (RDB snapshots, AOF logs), so the cache can survive restarts.
  - Replication with automatic failover via Redis Sentinel.
  - Horizontal sharding via Redis Cluster (hash slots).
  - Atomic operations, Lua scripting, and pub/sub messaging.

Memcached

A simpler, multithreaded in-memory key-value store. Supports only string values (no rich data types). No persistence, no replication built-in. Clients handle sharding by hashing keys to nodes.

Memcached can be faster than Redis for simple get/set workloads on multi-core machines because of its multithreaded architecture. Redis is almost always preferred for new systems due to its richer feature set and built-in clustering.

Sharding a Distributed Cache

When the working set exceeds one cache server’s RAM, data must be sharded across nodes. The challenge: when you add or remove a node, which keys move? Naive modulo hashing (hash(key) % N) reassigns almost all keys on every topology change, causing a cache miss storm.

Consistent hashing solves this: keys and nodes are placed on a ring. Adding or removing a node only redistributes the keys adjacent to it — typically 1/N of total keys. Redis Cluster uses a variant (hash slots) for the same effect.
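A minimal consistent-hash ring can be built with `hashlib` and `bisect`: each node is hashed onto the ring at several "virtual node" points (which smooths the key distribution), and a key belongs to the first node point clockwise from its own hash. This is an illustrative sketch, not how Redis Cluster's hash slots are implemented:

```python
import bisect
import hashlib

# Consistent hashing: nodes and keys share a hash ring; a key maps to the
# first node point at or after its hash, wrapping around at the end.

class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = []                     # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):     # virtual nodes smooth the spread
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def get_node(self, key):
        h = self._hash(key)
        # first ring point strictly after all points with hash h
        i = bisect.bisect_right(self.ring, (h, chr(0x10FFFF)))
        return self.ring[i % len(self.ring)][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.get_node("user:42")
```

Adding a fourth node moves only the keys whose new owner is that node; everything else stays put, which is the whole point of the technique.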

💡
Redis vs Memcached

Choose Redis for: sessions, leaderboards, rate limiting, pub/sub, geospatial queries, or anything that needs data types beyond strings. Choose Memcached for: extremely simple string caching at very high throughput on multi-core hardware. In practice, Redis is the default for almost all new systems.

Caching Problems

Caching introduces failure modes that don’t exist in uncached systems. These three are classic interview topics.

Cache Stampede (Thundering Herd)

A popular cache entry expires. Simultaneously, hundreds of requests for that key arrive, all find a miss, and all go to the database at once. The database is suddenly hit with 100× its normal load for that key.

Solutions:

  - Mutex / per-key lock: only one request recomputes the value; the others wait briefly and then read the freshly cached entry.
  - Probabilistic early expiry: readers occasionally refresh the value shortly before its TTL ends, so the hot key never actually expires under load.
  - TTL jitter: randomize expiry times so popular keys don't expire in lockstep.
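Probabilistic early expiry can be sketched with the "XFetch" rule: each reader refreshes early with a probability that rises sharply as the deadline approaches. `delta` is the time the recompute takes, and `beta` > 1 makes refreshes more eager; the clock and random source are injectable here for clarity:

```python
import math
import random
import time

# XFetch-style early recomputation: refresh when
#   now - delta * beta * ln(U) >= expires_at,  with U ~ Uniform(0, 1).
# Far from expiry this is almost never true; near expiry it almost always is,
# so one lucky reader refreshes before the whole herd sees a miss.

def should_refresh_early(expires_at, delta, beta=1.0,
                         now=time.time, rand=random.random):
    return now() - delta * beta * math.log(rand()) >= expires_at
```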

Cache Penetration

Requests for keys that don’t exist in the database. Every request misses the cache (there’s nothing to cache) and hits the database directly. An attacker can deliberately query non-existent keys to bypass the cache and overwhelm the database.

Solutions:

  - Cache null values: store a short-TTL "not found" marker for missing keys, so repeated lookups for the same absent key are absorbed by the cache.
  - Bloom filter: keep a compact probabilistic set of all keys that exist; requests for keys the filter rules out are rejected before reaching the database.
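A bloom filter answers "definitely not present" or "possibly present": its "no" is authoritative, so never-existing keys can be rejected without a database query, while its "yes" may be a false positive at a rate tuned by the bit-array size and hash count. A toy sketch (an int doubles as the bit set; real deployments would use a library or Redis modules):

```python
import hashlib

# Bloom filter sketch: k hash positions per key are set in a bit array.
# might_contain returning False is definitive; True may be a false positive.

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = 0                       # a Python int used as a bit set

    def _positions(self, key):
        for i in range(self.hashes):        # derive k positions per key
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all(self.bits & (1 << pos) for pos in self._positions(key))

existing_keys = BloomFilter()
existing_keys.add("user:1")   # populated from the set of keys known to exist
```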

Cache Avalanche

A large number of cache entries expire at the same time (or the cache server crashes), causing a flood of DB requests all at once. Unlike cache stampede (one key), an avalanche affects many keys simultaneously.

Solutions:

  - TTL jitter: add a random offset to each entry's TTL so expirations spread out instead of clustering.
  - Circuit breaker: fail fast or degrade gracefully when the database is overloaded, rather than letting the flood take it down.
  - Cache high availability: replicate the cache (e.g. Redis Sentinel or Cluster) so a single node failure doesn't empty it.
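TTL jitter is a one-liner: pick each entry's TTL uniformly from a band around the base value. The 10% band below is an arbitrary illustrative choice:

```python
import random

# TTL jitter: entries cached at the same moment get slightly different
# lifetimes, so they don't all expire (and miss) at the same instant.

def jittered_ttl(base_seconds, jitter_fraction=0.10):
    low = base_seconds * (1 - jitter_fraction)
    high = base_seconds * (1 + jitter_fraction)
    return random.uniform(low, high)

ttl = jittered_ttl(300)   # somewhere between 270 and 330 seconds
```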

| Problem | Cause | Primary mitigation |
| --- | --- | --- |
| Cache stampede | One popular key expires; many concurrent misses | Mutex lock, probabilistic early expiry, TTL jitter |
| Cache penetration | Requests for keys that never exist in DB | Cache null values, bloom filter |
| Cache avalanche | Many keys expire at once or cache crashes | TTL jitter, circuit breaker, cache HA |
💡
In System Design Interviews

Mention caching early whenever you need to reduce DB load or latency. State the cache type (distributed, in-process, CDN), the write strategy (cache-aside is usually correct), the TTL rationale, and the eviction policy. Interviewers love to ask follow-ups about cache invalidation and what happens when the cache is down — have answers for cache stampede, penetration, and avalanche ready.