Design WhatsApp

WhatsApp serves over 2 billion users, delivering 100 billion messages per day. Its defining properties — real-time delivery, end-to-end encryption, offline message queuing, and lightweight client design — make it a canonical system design problem. Building it touches WebSocket management, message ordering, delivery guarantees, distributed storage, and the specific challenges of group messaging at scale.

Requirements

Functional:

Non-functional:

Capacity Estimation

Daily active users: 500M
Messages per day: 100B
Messages per second: ~1.16M msg/s (100B ÷ 86,400 s)
Average message size: 100 bytes (text only)
Text message throughput: ~116 MB/s

Media messages: ~20% of total = 20B/day
Average media size: 500 KB
Media storage per day: 20B × 500 KB = ~10 PB/day raw (~1 PB/day after compression and deduplication, assuming roughly a 10× reduction)

WebSocket connections: 500M concurrent connections at peak (worst case: every daily active user online at once)
Chat servers needed: at 100K connections per server = 5,000 servers
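
The arithmetic behind these estimates can be reproduced with a short script. This is only a back-of-envelope sketch: every input is one of the assumptions listed above, the variable names are illustrative, and rounding may differ slightly from the quoted figures.

```python
# Back-of-envelope capacity numbers. All inputs are the assumptions from the
# estimate above, not measured values.
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = 100e9
avg_text_bytes = 100
media_fraction = 0.20
avg_media_bytes = 500e3           # 500 KB, decimal units
concurrent_connections = 500e6
connections_per_server = 100e3

messages_per_second = messages_per_day / SECONDS_PER_DAY
text_bytes_per_second = messages_per_second * avg_text_bytes
media_bytes_per_day = messages_per_day * media_fraction * avg_media_bytes
chat_servers = concurrent_connections / connections_per_server

print(f"messages/s:      {messages_per_second / 1e6:.2f} M")
print(f"text throughput: {text_bytes_per_second / 1e6:.0f} MB/s")
print(f"raw media/day:   {media_bytes_per_day / 1e15:.0f} PB")
print(f"chat servers:    {chat_servers:,.0f}")
```

Changing connections_per_server is the quickest way to see how sensitive the server count is to per-host connection limits.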

Messaging Architecture

The core of a messaging system is the connection layer — maintaining persistent connections with clients and routing messages between them.

WebSocket gateway (chat servers): Each client maintains a persistent WebSocket connection to a chat server. A client sends a message by writing to its WebSocket; the server routes it to the recipient’s chat server; that server delivers to the recipient’s WebSocket.

Connection routing: With 5,000 chat servers, how does server A know which server holds the recipient’s WebSocket? A service registry (Redis or ZooKeeper) maps user ID → chat server. On connect, the client’s server registers the mapping; on disconnect, it deregisters.
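
The registry contract can be illustrated with an in-memory stand-in. This is a minimal sketch, not the real service: the class and method names are hypothetical, and a production system would back the mapping with Redis or ZooKeeper as described above. The one design detail worth noting is that deregistration checks ownership, so a late disconnect from an old server cannot erase a newer connection.

```python
from typing import Dict, Optional


class ConnectionRegistry:
    """In-memory stand-in for the user ID -> chat server mapping (hypothetical)."""

    def __init__(self) -> None:
        self._server_by_user: Dict[str, str] = {}

    def register(self, user_id: str, server_id: str) -> None:
        # Called by a chat server when a client connects to it.
        self._server_by_user[user_id] = server_id

    def deregister(self, user_id: str, server_id: str) -> None:
        # Remove the mapping only if it still points at this server, so a
        # stale disconnect cannot wipe out a newer registration.
        if self._server_by_user.get(user_id) == server_id:
            del self._server_by_user[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        # None means the user has no live connection (treat as offline).
        return self._server_by_user.get(user_id)
```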

Message flow:

  1. Sender writes message to their WebSocket connection (chat server A).
  2. Chat server A persists the message to the message store with status “sent”.
  3. Chat server A looks up the recipient’s server (chat server B) from the registry.
  4. Chat server A sends the message to chat server B via an internal RPC or message queue.
  5. Chat server B delivers the message to the recipient’s WebSocket and updates status to “delivered”.
  6. Chat server B sends a delivery acknowledgment back to the sender’s server, which notifies the sender’s client (single tick → double tick).

[Figure: WhatsApp architecture: WebSocket gateways, message routing, offline queuing, and media storage]
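
The six-step message flow can be sketched in a single process. Everything here is a simplification under stated assumptions: ChatServer is a hypothetical class, plain dicts stand in for the registry and the message store, a direct method call stands in for the internal RPC, and a list per user stands in for the WebSocket.

```python
from typing import Dict, List


class ChatServer:
    """Toy chat server; real WebSockets and RPC are replaced by in-process calls."""

    def __init__(self, name: str, registry: Dict[str, "ChatServer"],
                 store: Dict[str, str]) -> None:
        self.name = name
        self.registry = registry              # shared: user_id -> ChatServer
        self.store = store                    # shared: message_id -> status
        self.sockets: Dict[str, List[str]] = {}  # user_id -> delivered bodies

    def connect(self, user_id: str) -> None:
        self.sockets[user_id] = []
        self.registry[user_id] = self

    def send(self, message_id: str, recipient: str, body: str) -> str:
        self.store[message_id] = "sent"                 # step 2: persist
        target = self.registry.get(recipient)           # step 3: registry lookup
        if target is not None:
            target.deliver(message_id, recipient, body)  # step 4: internal "RPC"
        # Step 6: the status reported back to the sender's client.
        return self.store[message_id]

    def deliver(self, message_id: str, recipient: str, body: str) -> None:
        self.sockets[recipient].append(body)            # step 5: write to socket
        self.store[message_id] = "delivered"
```

If the recipient has no registry entry, the status stays at "sent", which is the case the offline queue has to handle.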

Delivery Guarantees

WhatsApp guarantees at-least-once delivery — a message is stored on the server until the recipient acknowledges receipt. The challenge is handling offline recipients.

Offline message queuing: If the recipient is offline (no WebSocket connection), the message is stored in a per-user message queue (backed by a database like Cassandra). When the user reconnects, the server delivers all queued messages and waits for client acknowledgment before deleting them.
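
The important property of the offline queue is that reading a message does not delete it; only the client's acknowledgment does. A minimal in-memory sketch of that contract follows. The OfflineQueue name and its methods are hypothetical, and a real deployment would persist the queue in a database such as Cassandra rather than in a dict.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple


class OfflineQueue:
    """Per-user pending messages, deleted only after the client ACKs them."""

    def __init__(self) -> None:
        self._pending: Dict[str, "OrderedDict[str, bytes]"] = {}

    def enqueue(self, user_id: str, message_id: str, payload: bytes) -> None:
        self._pending.setdefault(user_id, OrderedDict())[message_id] = payload

    def drain(self, user_id: str) -> List[Tuple[str, bytes]]:
        # Return queued messages in arrival order WITHOUT removing them;
        # an un-ACKed message is returned again on the next reconnect.
        return list(self._pending.get(user_id, {}).items())

    def ack(self, user_id: str, message_id: str) -> None:
        # Safe to call twice: acknowledging an unknown ID is a no-op.
        self._pending.get(user_id, {}).pop(message_id, None)
```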

Message IDs and ordering: Each message gets a globally unique ID (Snowflake-style: timestamp + server + sequence). Within a conversation, messages are ordered by client timestamp (with server timestamp as tiebreaker). Clients display messages in order; out-of-order delivery is resolved client-side.
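
A Snowflake-style ID packs the three components into one integer so that IDs sort roughly by creation time. The sketch below assumes a 41-bit millisecond timestamp, a 10-bit server ID, and a 12-bit sequence; the bit widths, the epoch, and the class name are illustrative choices, not WhatsApp's actual format.

```python
import threading
import time


class SnowflakeGenerator:
    """Assumed layout: 41-bit ms timestamp | 10-bit server ID | 12-bit sequence."""

    EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)
    SERVER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, server_id, clock_ms=None):
        if not 0 <= server_id < (1 << self.SERVER_BITS):
            raise ValueError("server_id out of range")
        self._server_id = server_id
        self._clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            # Never let the timestamp move backwards, even if the wall clock does.
            now = max(self._clock_ms(), self._last_ms)
            if now == self._last_ms:
                if self._sequence == self.MAX_SEQUENCE:
                    # Sequence exhausted for this millisecond: borrow the next one.
                    now += 1
                    self._sequence = 0
                else:
                    self._sequence += 1
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - self.EPOCH_MS
            return (
                (timestamp << (self.SERVER_BITS + self.SEQUENCE_BITS))
                | (self._server_id << self.SEQUENCE_BITS)
                | self._sequence
            )
```

IDs from one generator are strictly increasing; IDs from different servers are unique but only approximately time-ordered, which is why the conversation-level ordering rule above is still needed.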

Push notifications: When a user hasn’t opened the app in a while, the device OS has typically suspended it and the WebSocket connection is gone. A push notification (APNs for iOS, FCM for Android) wakes the app, which reconnects and pulls queued messages. The push payload carries only a generic alert, not the message content, which is end-to-end encrypted and can’t be read by the server.

💡
At-Least-Once vs Exactly-Once

WhatsApp uses at-least-once delivery with client-side deduplication. If a message is delivered but the ACK is lost, it gets redelivered — the client detects the duplicate message ID and discards it. Exactly-once delivery at the server layer is expensive; deduplication at the client is simpler and sufficient.
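
Client-side deduplication amounts to remembering which message IDs have already been processed. A minimal sketch, with hypothetical names and an unbounded in-memory set (a real client would persist the set and prune old IDs):

```python
class ClientInbox:
    """Discards redelivered messages by remembering the IDs already seen."""

    def __init__(self) -> None:
        self._seen_ids = set()
        self.messages = []

    def receive(self, message_id: str, body: str) -> bool:
        """Store the message unless it is a duplicate; return True if it was new.

        The caller should ACK in both cases so the server stops redelivering.
        """
        if message_id in self._seen_ids:
            return False
        self._seen_ids.add(message_id)
        self.messages.append(body)
        return True
```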

Group Chat

Group chats are fundamentally harder than 1:1 chats because one message must be delivered to N recipients.

Fan-out on write: When a message is sent to a group, the server creates N copies of the message (one per recipient) and places each in the recipient’s message queue. Delivery proceeds as normal per-user delivery. Simple and predictable, but expensive for large groups — a message in a 1,024-member group creates 1,024 writes.

Fan-out on read: Store the message once in a group message log. Each recipient tracks a cursor (last-read message ID). On reconnect, they pull new messages since their cursor. More storage-efficient but requires a more complex read path and can’t leverage per-user push efficiently.

WhatsApp uses fan-out on write for smaller groups (lower latency, simpler delivery tracking) and fan-out on read or hybrid approaches for very large groups. Group message storage uses a Cassandra-style model with group_id as the partition key and message_id as the clustering key, which keeps a group’s messages together and sorted so that range scans over message history are efficient.
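
The write-cost difference between the two strategies is easy to see side by side. The sketch below is illustrative only: both classes are hypothetical, storage is in memory, and the read-side cursor is a position in the log rather than a last-read message ID, which keeps the example short.

```python
from collections import defaultdict
from typing import Dict, List


class FanOutOnWrite:
    """One queue entry per member for every message sent to the group."""

    def __init__(self, members: Dict[str, List[str]]) -> None:
        self.members = members
        self.queues = defaultdict(list)   # user_id -> queued message IDs
        self.writes = 0

    def send(self, group_id: str, message_id: str) -> None:
        for user_id in self.members[group_id]:
            self.queues[user_id].append(message_id)
            self.writes += 1


class FanOutOnRead:
    """One log entry per message; each reader advances its own cursor."""

    def __init__(self) -> None:
        self.log = defaultdict(list)      # group_id -> ordered message IDs
        self.cursors = defaultdict(int)   # (group_id, user_id) -> log position
        self.writes = 0

    def send(self, group_id: str, message_id: str) -> None:
        self.log[group_id].append(message_id)
        self.writes += 1

    def pull(self, group_id: str, user_id: str) -> List[str]:
        start = self.cursors[(group_id, user_id)]
        new_messages = self.log[group_id][start:]
        self.cursors[(group_id, user_id)] = len(self.log[group_id])
        return new_messages
```

With three members, one message costs three writes in the first model and one write in the second, at the price of every reader having to pull.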

Delivery receipts in groups: A message is “delivered” once all members have received it and “read” once all members have read it. Tracking per-member status for 1,024 members per message is expensive, so a compact representation matters: a pair of per-message bitmaps with one bit per group member, one bitmap set on delivery and the other set on read. A single bit per member cannot record both states, so two bitmaps (or two bits per member) are needed.
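
The bitmap idea can be sketched with Python integers used as bit sets. This assumes two bitmaps per message (delivered and read), since a single bit per member cannot hold both states; the class name, the member-to-bit mapping, and the "all members" rule for both states are assumptions of the sketch.

```python
from typing import Iterable


class GroupReceipts:
    """Per-message delivery and read bitmaps, one bit per group member."""

    def __init__(self, member_ids: Iterable[str]) -> None:
        # Fixed bit position per member, assigned in list order.
        self._bit = {m: 1 << i for i, m in enumerate(member_ids)}
        self._all = (1 << len(self._bit)) - 1
        self.delivered = 0
        self.read = 0

    def mark_delivered(self, member_id: str) -> None:
        self.delivered |= self._bit[member_id]

    def mark_read(self, member_id: str) -> None:
        # Reading a message implies it was delivered.
        self.delivered |= self._bit[member_id]
        self.read |= self._bit[member_id]

    def all_delivered(self) -> bool:
        return self.delivered == self._all

    def all_read(self) -> bool:
        return self.read == self._all

    def read_count(self) -> int:
        return bin(self.read).count("1")
```

For a 1,024-member group each bitmap is 128 bytes per message, compared with one row per member per message in a naive table.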

Media Storage

Media (photos, videos, documents) is not sent through the chat server WebSocket — it’s too large and would clog the message pipe.

Upload flow:

  1. Client requests a pre-signed upload URL from the media service.
  2. Client uploads the file directly to object storage (S3-equivalent) via the pre-signed URL.
  3. Client sends a text message with the media URL and metadata (MIME type, dimensions, duration) to the chat server.
  4. Recipient downloads the media from object storage when they open the message.

Deduplication: The same photo sent to multiple recipients is stored once. Content-addressed storage (SHA-256 hash of the file → storage key) deduplicates automatically. If the hash already exists in storage, the upload is skipped — only the metadata message is sent.
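
Content-addressed storage can be sketched in a few lines: the SHA-256 digest of the bytes is the storage key, so identical content maps to the same key. The ContentAddressedStore class is hypothetical and keeps blobs in a dict as a stand-in for object storage; the uploads counter exists only to make the skipped upload visible.

```python
import hashlib


class ContentAddressedStore:
    """Stores each distinct blob once, keyed by the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self._blobs = {}
        self.uploads = 0   # number of blobs actually written

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self._blobs:
            # Only new content is written; a repeat upload is skipped.
            self._blobs[key] = data
            self.uploads += 1
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]
```

In practice the client would compute the hash first and ask the media service whether the key exists, so the bytes never leave the device for a duplicate.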

Media expiry: WhatsApp stores media for 30 days on its servers. After the recipient downloads it, the client stores it locally. This limits storage costs — media that’s never opened (group chat media, spam) doesn’t accumulate indefinitely.

Presence & Read Receipts

Presence (“Online” / “Last seen 5 minutes ago”) is a high-write, high-read feature that must be carefully isolated from the message path to avoid it becoming a bottleneck.

When a client connects or sends a heartbeat, its server updates a Redis key: presence:{user_id} = {timestamp, server_id} with a short TTL (60 seconds). Heartbeats refresh the TTL. If the TTL expires, the user is considered offline.
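
The heartbeat-plus-TTL behaviour can be modelled without Redis. The sketch below is an in-memory stand-in with a hypothetical PresenceStore class and an injectable clock for testing; in Redis the same effect comes from writing the key with an expiry and rewriting it on every heartbeat.

```python
import time
from typing import Callable, Dict, Tuple


class PresenceStore:
    """In-memory model of presence keys that expire unless refreshed."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # user -> (expiry, server)

    def heartbeat(self, user_id: str, server_id: str) -> None:
        # Each heartbeat rewrites the entry and pushes the expiry forward.
        self._entries[user_id] = (self._clock() + self._ttl, server_id)

    def is_online(self, user_id: str) -> bool:
        entry = self._entries.get(user_id)
        if entry is None:
            return False
        expires_at, _server_id = entry
        if self._clock() >= expires_at:
            # TTL elapsed without a heartbeat: treat the user as offline.
            del self._entries[user_id]
            return False
        return True
```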

Presence updates are delivered only to contacts who are online. Rather than pushing every update to all of a user’s contacts immediately (expensive at scale), WhatsApp uses a subscription model: a client subscribes to presence updates for its contacts, and the presence service publishes updates to those subscribers. Because each status change is multiplied by the number of contacts that receive it, unrestricted presence fan-out at 500M DAU could reach billions of writes per second; WhatsApp limits presence visibility and updates to mitigate this.
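
The subscription model boils down to a publish/subscribe map keyed by the watched user. A minimal sketch, assuming a hypothetical PresenceHub class with in-memory outboxes in place of real pushes over WebSockets:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class PresenceHub:
    """Publishes a user's status changes only to clients subscribed to that user."""

    def __init__(self) -> None:
        self._watchers: Dict[str, Set[str]] = defaultdict(set)
        self.outbox: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def subscribe(self, watcher_id: str, target_id: str) -> None:
        self._watchers[target_id].add(watcher_id)

    def unsubscribe(self, watcher_id: str, target_id: str) -> None:
        self._watchers[target_id].discard(watcher_id)

    def publish(self, user_id: str, status: str) -> int:
        """Send the update to current subscribers; return how many were notified."""
        watchers = self._watchers.get(user_id, set())
        for watcher_id in watchers:
            self.outbox[watcher_id].append((user_id, status))
        return len(watchers)
```

The cost of a status change is the number of active subscribers rather than the size of the contact list, which is the saving the subscription model is after.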

End-to-End Encryption

WhatsApp uses the Signal Protocol for end-to-end encryption. Key properties:

Scaling Considerations