Design Twitter
Twitter’s defining technical challenge is the home timeline — generating a personalized feed for each user from the tweets of everyone they follow. At 350 million monthly active users and 500 million tweets per day, the fan-out problem (one tweet must reach millions of followers) is one of the hardest distribution problems in social network design. The solution is a hybrid push-pull model that Twitter spent years evolving.
Requirements
Functional:
- Post tweets (text up to 280 chars, images, videos, links).
- Follow/unfollow other users.
- Home timeline: ordered feed of tweets from followed accounts.
- User timeline: all tweets by a specific user.
- Like, retweet, reply.
- Search tweets by keyword and hashtag.
- Trending topics.
- Notifications (mentions, replies, likes, follows).
Non-functional:
- Timeline reads must be fast — under 200ms p99 (users notice slow feeds).
- Eventual consistency is acceptable — a tweet appearing in feeds a few seconds after posting is fine.
- Write throughput: 6,000 tweets/second peak.
- Read:write ratio ~100:1 (reads dominate).
Capacity Estimation
Tweets per day: 500M
Tweets per second: ~6,000/s peak
Average tweet size: 300 bytes
Tweet storage per year: 500M × 300B × 365 = ~55 TB/year
Home timeline reads per second: ~600,000/s (100× write rate)
Followers per user (average): 200
Fan-out writes per tweet: 6,000 tweets/s × 200 followers = 1.2M writes/s to timeline caches
Tweet Storage
Tweets are stored in a distributed MySQL cluster (Twitter’s actual choice, sharded by tweet ID). The tweet table is minimal:
tweets (
tweet_id BIGINT PRIMARY KEY, -- Snowflake ID (timestamp-based)
user_id BIGINT NOT NULL,
content VARCHAR(280),
media_ids JSON, -- references to media objects
reply_to_id BIGINT, -- for threads
retweet_id BIGINT, -- for retweets
like_count BIGINT DEFAULT 0,
retweet_count BIGINT DEFAULT 0,
created_at TIMESTAMP
)
Sharding by tweet_id distributes write load evenly. The tweet ID is a Snowflake — a 64-bit integer with a millisecond timestamp in the high bits, making them time-sortable. Fetching tweets in reverse chronological order within a shard is a simple range scan on the primary key.
User timeline (all tweets by a user) is a separate index: (user_id, tweet_id DESC) on a user-sharded store. Fetching a user’s tweets is a range scan — trivial once sharded by user.
Home Timeline: The Fan-Out Problem
The home timeline is the hard problem. When you post a tweet, it must appear in the feeds of everyone who follows you. If you have 1 million followers, that’s 1 million timeline updates per tweet.
Fan-out on write (push model):
- User posts a tweet.
- A fan-out service looks up all followers of that user.
- The tweet ID is written to each follower’s timeline cache (a Redis sorted set, sorted by tweet_id).
- When a follower loads their home feed, they read from their pre-built timeline cache — very fast.
Timeline cache structure: home_timeline:{user_id} → sorted set of tweet IDs, capped at ~800 entries. Fetching the timeline is a ZREVRANGE — O(log N + K) where K is the number of results. Loading 20 tweets for the home feed is sub-millisecond.
The problem with pure fan-out on write: A user with 100 million followers (Barack Obama, Katy Perry) posting one tweet generates 100 million writes. At 6,000 tweets/second globally, with celebrity accounts following the same pattern, the fan-out service becomes the bottleneck. This is the hotkey problem — a single event generating disproportionate write amplification.
Hybrid Push-Pull
Twitter’s solution is a hybrid model based on follower count:
- Regular users (followers < ~10,000): Fan-out on write. Their tweets are pushed into all followers’ timeline caches immediately.
- Celebrity users (followers > ~10,000): No fan-out. Their tweets are NOT pre-pushed into follower caches. Instead, when a user opens their timeline, the timeline service fetches recent tweets from the celebrity’s user timeline in real time and merges them with the pre-built cache.
Timeline merge (read path):
- Fetch the user’s pre-built timeline cache (fan-out writes from followed regular users).
- Identify which followed accounts are celebrities (looked up from a celebrity set in Redis).
- For each celebrity, fetch their recent tweets from the user timeline store.
- Merge the lists in memory, sorted by tweet_id descending.
- Return the top N results.
This is more expensive per read than a pure cache lookup, but far cheaper than 100M fan-out writes. The trade-off: timeline construction is O(C) where C is the number of celebrities followed — typically a small number.
Search & Trending
Tweet search: Full-text search over tweets requires an inverted index. Twitter built Earlybird — a real-time Lucene-based search system. New tweets are indexed within seconds of posting. The index is sharded by time (recent tweets are in hot shards that fit in memory; older tweets are in cold shards on disk). Search queries fan out across time shards and merge results.
Trending topics: Computing trending hashtags and terms is a distributed counting problem. Each tweet is tokenized; a streaming system (Apache Storm, Kafka Streams) counts term frequencies over rolling time windows (last 1 hour, 24 hours). Trending = highest count relative to baseline frequency (prevents always-popular terms from dominating). Results are computed continuously and cached globally — trending topics are read-heavy but update every few minutes.
Notifications
Notification types: mentions (@user), replies, likes, retweets, follows, DMs. Each generates a notification record and triggers a delivery:
- In-app notifications: Stored in a notifications table per user, fetched on demand. Redis sorted set for unread count (fast badge update).
- Push notifications: Routed through APNs/FCM for mobile users who aren’t currently in the app. A notification fanout service subscribes to relevant events (like, retweet) and triggers push sends.
- Email digests: Batched and sent asynchronously. Users who haven’t opened the app in N days get a digest email summarizing activity. Generated by a batch job, not on the critical real-time path.
Scaling Considerations
- Timeline cache eviction policy. Timeline caches are capped at ~800 tweet IDs per user. When the cap is reached, oldest entries are evicted. This limits memory per user but means a user who scrolls back far enough will hit a cache miss and fall back to a database query. Twitter calls this “timeline reconstruction.”
- Inactive user optimization. Don’t maintain timeline caches for users who haven’t logged in recently. Skip fan-out writes for followers whose last-active timestamp is older than 30 days. When they return, reconstruct their timeline from scratch (a one-time cost to wake up the user, not a recurring cost).
- Retweet amplification. A retweet is a fan-out event identical to a regular tweet — except the tweet_id being fanned out is the original tweet’s ID. Viral content can cascade: a retweet by a celebrity causes a second fan-out of the same tweet to 100M users. Rate-limiting fan-out depth (number of hops) prevents runaway amplification.
- CDN for media. Images and videos are served from a CDN. The origin is object storage (Twitter uses HDFS + Blobstore internally). CDN edge nodes cache media globally, keeping bandwidth costs low and latency minimal for global users.
- FlockDB for social graph. Twitter open-sourced FlockDB, a distributed graph database optimized for storing and querying follow relationships. Queries like “who does user X follow?” and “who follows user X?” (needed for fan-out) are O(1) with sharded adjacency lists.