Case Studies

Design Twitter

● Advanced ⏱ 15 min read case-study

Twitter’s defining technical challenge is the home timeline — generating a personalized feed for each user from the tweets of everyone they follow. At 350 million monthly active users and 500 million tweets per day, the fan-out problem (one tweet must reach millions of followers) is one of the hardest distribution problems in social network design. The solution is a hybrid push-pull model that Twitter spent years evolving.

Requirements

Functional:

Non-functional:

Capacity Estimation

Tweets per day: 500M
Tweets per second: ~6,000/s peak
Average tweet size: 300 bytes
Tweet storage per year: 500M × 300B × 365 = ~55 TB/year

Home timeline reads per second: ~600,000/s (100× write rate)
Followers per user (average): 200
Fan-out writes per tweet: 6,000 tweets/s × 200 followers = 1.2M writes/s to timeline caches

Tweet Storage

Tweets are stored in a distributed MySQL cluster (Twitter’s actual choice, sharded by tweet ID). The tweet table is minimal:

tweets (
  tweet_id    BIGINT PRIMARY KEY,  -- Snowflake ID (timestamp-based)
  user_id     BIGINT NOT NULL,
  content     VARCHAR(280),
  media_ids   JSON,               -- references to media objects
  reply_to_id BIGINT,             -- for threads
  retweet_id  BIGINT,             -- for retweets
  like_count  BIGINT DEFAULT 0,
  retweet_count BIGINT DEFAULT 0,
  created_at  TIMESTAMP
)

Sharding by tweet_id distributes write load evenly. The tweet ID is a Snowflake — a 64-bit integer with a millisecond timestamp in the high bits, making them time-sortable. Fetching tweets in reverse chronological order within a shard is a simple range scan on the primary key.

User timeline (all tweets by a user) is a separate index: (user_id, tweet_id DESC) on a user-sharded store. Fetching a user’s tweets is a range scan — trivial once sharded by user.

Twitter architecture: tweet write path, fan-out service, timeline cache, and hybrid push-pull for celebrities

Home Timeline: The Fan-Out Problem

The home timeline is the hard problem. When you post a tweet, it must appear in the feeds of everyone who follows you. If you have 1 million followers, that’s 1 million timeline updates per tweet.

Fan-out on write (push model):

  1. User posts a tweet.
  2. A fan-out service looks up all followers of that user.
  3. The tweet ID is written to each follower’s timeline cache (a Redis sorted set, sorted by tweet_id).
  4. When a follower loads their home feed, they read from their pre-built timeline cache — very fast.

Timeline cache structure: home_timeline:{user_id} → sorted set of tweet IDs, capped at ~800 entries. Fetching the timeline is a ZREVRANGE — O(log N + K) where K is the number of results. Loading 20 tweets for the home feed is sub-millisecond.

The problem with pure fan-out on write: A user with 100 million followers (Barack Obama, Katy Perry) posting one tweet generates 100 million writes. At 6,000 tweets/second globally, with celebrity accounts following the same pattern, the fan-out service becomes the bottleneck. This is the hotkey problem — a single event generating disproportionate write amplification.

Hybrid Push-Pull

Twitter’s solution is a hybrid model based on follower count:

Timeline merge (read path):

  1. Fetch the user’s pre-built timeline cache (fan-out writes from followed regular users).
  2. Identify which followed accounts are celebrities (looked up from a celebrity set in Redis).
  3. For each celebrity, fetch their recent tweets from the user timeline store.
  4. Merge the lists in memory, sorted by tweet_id descending.
  5. Return the top N results.

This is more expensive per read than a pure cache lookup, but far cheaper than 100M fan-out writes. The trade-off: timeline construction is O(C) where C is the number of celebrities followed — typically a small number.

Tweet search: Full-text search over tweets requires an inverted index. Twitter built Earlybird — a real-time Lucene-based search system. New tweets are indexed within seconds of posting. The index is sharded by time (recent tweets are in hot shards that fit in memory; older tweets are in cold shards on disk). Search queries fan out across time shards and merge results.

Trending topics: Computing trending hashtags and terms is a distributed counting problem. Each tweet is tokenized; a streaming system (Apache Storm, Kafka Streams) counts term frequencies over rolling time windows (last 1 hour, 24 hours). Trending = highest count relative to baseline frequency (prevents always-popular terms from dominating). Results are computed continuously and cached globally — trending topics are read-heavy but update every few minutes.

Notifications

Notification types: mentions (@user), replies, likes, retweets, follows, DMs. Each generates a notification record and triggers a delivery:

Scaling Considerations