Case Studies

Design Twitter

● Advanced ⏱ 15 min read case-study

Twitter’s defining technical challenge is the home timeline — generating a personalized feed for each user from the tweets of everyone they follow. At 350 million monthly active users and 500 million tweets per day, the fan-out problem (one tweet must reach millions of followers) is one of the hardest distribution problems in social network design. The solution is a hybrid push-pull model that Twitter spent years evolving.

Requirements

Functional:

Post tweets (text up to 280 chars, images, videos, links).
Follow/unfollow other users.
Home timeline: ordered feed of tweets from followed accounts.
User timeline: all tweets by a specific user.
Like, retweet, reply.
Search tweets by keyword and hashtag.
Trending topics.
Notifications (mentions, replies, likes, follows).

Non-functional:

Timeline reads must be fast — under 200ms p99 (users notice slow feeds).
Eventual consistency is acceptable — a tweet appearing in feeds a few seconds after posting is fine.
Write throughput: 6,000 tweets/second peak.
Read:write ratio ~100:1 (reads dominate).

Capacity Estimation

Tweets per day: 500M
Tweets per second: ~6,000/s peak
Average tweet size: 300 bytes
Tweet storage per year: 500M × 300B × 365 = ~55 TB/year

Home timeline reads per second: ~600,000/s (100× write rate)
Followers per user (average): 200
Fan-out writes per tweet: 6,000 tweets/s × 200 followers = 1.2M writes/s to timeline caches

Tweet Storage

Tweets are stored in a distributed MySQL cluster (Twitter’s actual choice, sharded by tweet ID). The tweet table is minimal:

tweets (
  tweet_id    BIGINT PRIMARY KEY,  -- Snowflake ID (timestamp-based)
  user_id     BIGINT NOT NULL,
  content     VARCHAR(280),
  media_ids   JSON,               -- references to media objects
  reply_to_id BIGINT,             -- for threads
  retweet_id  BIGINT,             -- for retweets
  like_count  BIGINT DEFAULT 0,
  retweet_count BIGINT DEFAULT 0,
  created_at  TIMESTAMP
)

Sharding by tweet_id distributes write load evenly. The tweet ID is a Snowflake — a 64-bit integer with a millisecond timestamp in the high bits, making them time-sortable. Fetching tweets in reverse chronological order within a shard is a simple range scan on the primary key.

User timeline (all tweets by a user) is a separate index: (user_id, tweet_id DESC) on a user-sharded store. Fetching a user’s tweets is a range scan — trivial once sharded by user.

Twitter architecture: tweet write path, fan-out service, timeline cache, and hybrid push-pull for celebrities

Home Timeline: The Fan-Out Problem

The home timeline is the hard problem. When you post a tweet, it must appear in the feeds of everyone who follows you. If you have 1 million followers, that’s 1 million timeline updates per tweet.

Fan-out on write (push model):

User posts a tweet.
A fan-out service looks up all followers of that user.
The tweet ID is written to each follower’s timeline cache (a Redis sorted set, sorted by tweet_id).
When a follower loads their home feed, they read from their pre-built timeline cache — very fast.

Timeline cache structure: home_timeline:{user_id} → sorted set of tweet IDs, capped at ~800 entries. Fetching the timeline is a ZREVRANGE — O(log N + K) where K is the number of results. Loading 20 tweets for the home feed is sub-millisecond.

The problem with pure fan-out on write: A user with 100 million followers (Barack Obama, Katy Perry) posting one tweet generates 100 million writes. At 6,000 tweets/second globally, with celebrity accounts following the same pattern, the fan-out service becomes the bottleneck. This is the hotkey problem — a single event generating disproportionate write amplification.

Hybrid Push-Pull

Twitter’s solution is a hybrid model based on follower count:

Regular users (followers < ~10,000): Fan-out on write. Their tweets are pushed into all followers’ timeline caches immediately.
Celebrity users (followers > ~10,000): No fan-out. Their tweets are NOT pre-pushed into follower caches. Instead, when a user opens their timeline, the timeline service fetches recent tweets from the celebrity’s user timeline in real time and merges them with the pre-built cache.

Timeline merge (read path):

Fetch the user’s pre-built timeline cache (fan-out writes from followed regular users).
Identify which followed accounts are celebrities (looked up from a celebrity set in Redis).
For each celebrity, fetch their recent tweets from the user timeline store.
Merge the lists in memory, sorted by tweet_id descending.
Return the top N results.

This is more expensive per read than a pure cache lookup, but far cheaper than 100M fan-out writes. The trade-off: timeline construction is O(C) where C is the number of celebrities followed — typically a small number.

Search & Trending

Tweet search: Full-text search over tweets requires an inverted index. Twitter built Earlybird — a real-time Lucene-based search system. New tweets are indexed within seconds of posting. The index is sharded by time (recent tweets are in hot shards that fit in memory; older tweets are in cold shards on disk). Search queries fan out across time shards and merge results.

Trending topics: Computing trending hashtags and terms is a distributed counting problem. Each tweet is tokenized; a streaming system (Apache Storm, Kafka Streams) counts term frequencies over rolling time windows (last 1 hour, 24 hours). Trending = highest count relative to baseline frequency (prevents always-popular terms from dominating). Results are computed continuously and cached globally — trending topics are read-heavy but update every few minutes.

Notifications

Notification types: mentions (@user), replies, likes, retweets, follows, DMs. Each generates a notification record and triggers a delivery:

In-app notifications: Stored in a notifications table per user, fetched on demand. Redis sorted set for unread count (fast badge update).
Push notifications: Routed through APNs/FCM for mobile users who aren’t currently in the app. A notification fanout service subscribes to relevant events (like, retweet) and triggers push sends.
Email digests: Batched and sent asynchronously. Users who haven’t opened the app in N days get a digest email summarizing activity. Generated by a batch job, not on the critical real-time path.

Scaling Considerations

Timeline cache eviction policy. Timeline caches are capped at ~800 tweet IDs per user. When the cap is reached, oldest entries are evicted. This limits memory per user but means a user who scrolls back far enough will hit a cache miss and fall back to a database query. Twitter calls this “timeline reconstruction.”
Inactive user optimization. Don’t maintain timeline caches for users who haven’t logged in recently. Skip fan-out writes for followers whose last-active timestamp is older than 30 days. When they return, reconstruct their timeline from scratch (a one-time cost to wake up the user, not a recurring cost).
Retweet amplification. A retweet is a fan-out event identical to a regular tweet — except the tweet_id being fanned out is the original tweet’s ID. Viral content can cascade: a retweet by a celebrity causes a second fan-out of the same tweet to 100M users. Rate-limiting fan-out depth (number of hops) prevents runaway amplification.
CDN for media. Images and videos are served from a CDN. The origin is object storage (Twitter uses HDFS + Blobstore internally). CDN edge nodes cache media globally, keeping bandwidth costs low and latency minimal for global users.
FlockDB for social graph. Twitter open-sourced FlockDB, a distributed graph database optimized for storing and querying follow relationships. Queries like “who does user X follow?” and “who follows user X?” (needed for fan-out) are O(1) with sharded adjacency lists.