Design Netflix
Netflix serves 200 million subscribers in 190 countries, delivering 15% of global internet bandwidth during peak hours. Its technical story is one of the most documented in engineering — Netflix pioneered the microservices pattern, invented chaos engineering, and built Open Connect, one of the world’s largest private CDNs. Designing Netflix means designing for massive read throughput, global distribution, and graceful degradation when services fail.
Requirements
Functional:
- Upload and store video content (movies, TV episodes).
- Stream video to any device at adaptive quality (240p to 4K).
- Search and browse the content catalog.
- Personalized recommendations for each user.
- Continue watching across devices.
- Download for offline viewing.
- Multiple profiles per account.
Non-functional:
- High availability: a user who can’t play a video churns. Target 99.99% streaming availability.
- Low startup latency: video should begin playing within 2 seconds of pressing play.
- Adaptive bitrate: quality adjusts to available bandwidth without user interruption.
- Global: 190 countries, every device type, every network condition.
Capacity Estimation
Monthly active users: 200M
Peak concurrent streams: ~15M
Average stream bitrate: 5 Mbps (mix of SD/HD/4K)
Peak streaming bandwidth: 15M × 5 Mbps = 75 Tbps
Content library: ~36,000 titles
Encoded versions per title: ~1,200 (device types × resolutions × codecs × audio tracks)
Average encoded size per version: 5 GB
Total storage: 36,000 × 1,200 × 5 GB = ~216 PB
Video Ingestion & Transcoding
Before a video can be streamed, Netflix processes it through an extensive encoding pipeline. A single movie arrives as a high-quality master file (often several hundred GB) and must be transformed into thousands of versions for every device and network condition.
Transcoding pipeline:
- Ingest: Studio uploads the master file to Netflix’s ingest servers. The file is validated, checksummed, and stored in blob storage (S3).
- Scene analysis: Netflix’s proprietary encoding pipeline (Dynamic Optimizer) analyzes the video scene by scene. Complex scenes (action, high motion) get higher bitrates; simple scenes (talking heads, static backgrounds) get lower bitrates. This per-scene optimization reduces file size by 20–40% vs fixed bitrate encoding.
- Parallel transcoding: The video is split into chunks (typically 2–4 minutes each). Each chunk is transcoded in parallel across a cluster of worker machines. A 2-hour movie might produce ~60 chunks, each encoded into ~20 resolutions × 5 codecs = ~1,200 parallel jobs. Total encoding time: hours rather than days.
- Audio and subtitle tracks: Audio is encoded separately in multiple languages and formats (Dolby Atmos, stereo). Subtitle tracks are timed text files (WebVTT). All are stored alongside the video chunks.
- Quality control: Automated quality metrics (VMAF — Video Multimethod Assessment Fusion, Netflix’s own metric) check each encoded chunk. Chunks below threshold are re-encoded.
- Packaging: Video and audio chunks are packaged into streaming formats (DASH and HLS) and pushed to the CDN.
CDN Strategy
Netflix built Open Connect — a private CDN deployed inside ISP networks worldwide. Rather than routing video traffic over the public internet (expensive, slow), Netflix places its own servers inside Internet Service Providers’ data centers. When a Comcast subscriber in Chicago plays Stranger Things, the video comes from an Open Connect Appliance (OCA) inside Comcast’s Chicago network — zero hops over the open internet.
How Open Connect works:
- Appliances (OCAs): Netflix-operated servers with large storage (200+ TB) deployed in ISP facilities. Netflix ships and maintains the hardware; ISPs provide rack space and power in exchange for reduced peering costs.
- Proactive caching: Netflix predicts what users will watch next based on viewing patterns and time of day. Popular titles for each ISP region are pre-loaded onto that region’s OCAs nightly during off-peak hours. By the time users hit play, the content is already local.
- Steering service: When a client opens Netflix, the steering service (running on AWS) returns the optimal OCA IP for that client based on geographic proximity, current OCA health, and available content. The client then streams directly from the OCA — the AWS backend is not in the video path.
- Fallback: If the assigned OCA is unavailable or doesn’t have the requested content (rare for popular titles), the client is redirected to a different OCA or to Netflix’s backup CDN partners (Akamai, Limelight).
Streaming Protocol
Netflix uses DASH (Dynamic Adaptive Streaming over HTTP) for most platforms and HLS (HTTP Live Streaming) for Apple devices. Both work similarly:
- The video is divided into small segments (typically 4–6 seconds each).
- A manifest file (MPD for DASH, M3U8 for HLS) lists all available bitrate levels and the URLs of segments for each level.
- The client player downloads the manifest, then begins downloading segments. It monitors download speed and buffer level, choosing the bitrate level for the next segment based on available bandwidth.
- If bandwidth drops, the player switches to a lower bitrate for subsequent segments — the transition is seamless. If bandwidth improves, it steps up quality. This is Adaptive Bitrate Streaming (ABR).
Netflix’s client-side ABR algorithm (BOLA — Buffer Occupancy based Lyapunov Algorithm) optimizes for buffer occupancy, not just current download speed. It prefers keeping the buffer full even at lower quality over high quality with an empty buffer — because a buffer underrun (playback stall) is worse UX than a slightly lower resolution.
Recommendation System
Netflix’s recommendation engine drives 80% of content discovered on the platform. The recommendation system runs as a complex pipeline of models:
- Collaborative filtering: “Users like you also watched X.” Matrix factorization (SVD, ALS) decomposes the user-movie rating matrix into latent factor vectors. Users with similar vectors get similar recommendations. Computed in batch (Spark on AWS EMR) and updated periodically.
- Content-based filtering: “You liked Inception, so here’s another mind-bending thriller.” Based on content metadata (genre, director, cast, themes), encoded as feature vectors. Supplemented by Netflix’s own video tagging — human reviewers tag 500+ attributes per title (pace, tone, setting, time period).
- Session-based signals: What you watched in the last 30 minutes matters more than what you watched last year. A real-time feature pipeline (Kafka + Flink) processes viewing events and updates short-term preference signals.
- Personalized ranking: Even the homepage row order is personalized — “Trending Now” appears higher for users who respond to social proof; “Continue Watching” appears first for users who frequently resume. The final ranking model takes all signals and re-ranks the candidate set for each user.
- Artwork personalization: The thumbnail shown for a title is chosen per-user. A user who watches action movies sees an action scene thumbnail for a title that also has romance; a user who watches rom-coms sees a different thumbnail for the same title. A/B tested continuously.
Microservices Architecture
Netflix pioneered the microservices pattern at scale. Its backend runs 700+ microservices on AWS. Key architectural decisions:
- API Gateway (Zuul): All client requests go through Zuul, which handles authentication, routing, rate limiting, and A/B test assignment. Zuul is built on Netty for non-blocking I/O — it can handle millions of concurrent connections on a small number of nodes.
- Service discovery (Eureka): Netflix built Eureka, its own service registry. Services register on startup; Zuul and other services discover endpoints via Eureka. Designed for eventual consistency — a brief period with stale endpoints is acceptable; hard consistency would require too much coordination overhead.
- Resilience (Hystrix, Resilience4j): Every inter-service call is wrapped in a circuit breaker. Netflix pioneered the circuit breaker pattern at scale — they open-sourced Hystrix (now deprecated; succeeded by Resilience4j). Fallbacks are defined for every circuit — a failed recommendation service falls back to popular titles; a failed rating service falls back to no rating displayed.
- Chaos Engineering (Chaos Monkey): Netflix deliberately kills random EC2 instances in production to ensure the system can survive failures. Chaos Monkey became the foundation of chaos engineering as a discipline. If killing one instance brings down a feature, that’s a design problem to fix before a real outage exposes it.
Scaling Considerations
- Multi-region active-active on AWS. Netflix runs in multiple AWS regions simultaneously. Traffic is distributed by user geography. If one region degrades, users are shifted to another. Cross-region replication (Cassandra, its primary database) is asynchronous — eventual consistency is acceptable for viewing history, ratings, and profiles.
- EVCache for hot data. Netflix built EVCache — a distributed memcached cluster with cross-region replication. It sits in front of databases for high-frequency reads: user preferences, session state, and recommendation results. EVCache serves millions of requests per second with sub-millisecond latency.
- Cassandra for user data at scale. Viewing history, rating data, and account information are stored in Cassandra (Apache Cassandra, with Netflix contributing heavily to its development). Cassandra’s linear scalability and cross-datacenter replication match Netflix’s global, multi-region requirements.
- Data pipeline for offline processing. Netflix processes petabytes of event data daily on Spark clusters. This feeds model training, A/B test analysis, recommendation updates, and anomaly detection. The pipeline uses Kafka for event streaming → Spark for batch processing → data lake (Iceberg on S3) for storage.