Problem #15: Design YouTube / Netflix — Full Deep Dive
1. Problem Statement
Design a video streaming platform where users upload, transcode, store, and stream video content at global scale with adaptive bitrate streaming.
YouTube focus: User-generated content, search/discovery, live streaming
Netflix focus: Licensed/original content, recommendation engine, offline download
Core business goals:
- Upload and process video content (YouTube: millions of uploads/day)
- Stream video to billions of viewers with minimal buffering
- Adaptive bitrate: adjust quality based on user's bandwidth
- Global delivery via CDN
- Recommendation engine to maximize engagement
2-3. Requirements
Core
| ID | Requirement |
|---|---|
| FR-1 | Video upload (up to 12 hours for YouTube, varies for Netflix) |
| FR-2 | Video transcoding to multiple resolutions and codecs |
| FR-3 | Adaptive bitrate streaming (HLS/DASH) |
| FR-4 | Video search and discovery |
| FR-5 | Video player with seeking, pause, resume |
| FR-6 | View count tracking |
| FR-7 | Like/dislike, comments |
| FR-8 | Recommendation engine |
| FR-9 | Subscription/channel management |
Non-Functional
| Requirement | Target |
|---|---|
| DAU | 2B (YouTube), 200M (Netflix) |
| Watch time/day | 1B hours (YouTube) |
| Upload rate | 500 hours of video uploaded per minute (YouTube) |
| Start latency | < 2 seconds |
| Buffering ratio | < 1% of playback time |
| Storage | Exabytes |
| Availability | 99.99% |
4. Capacity Estimation
```text
YouTube scale:
Video uploads: 500 hours/minute = 720,000 hours/day
Average video: 10 minutes, ~1 GB raw (≈6 GB/hour); each transcoded variant is a few hundred MB
Upload storage/day: 720,000 hours * 6 GB/hour = 4.3 PB/day raw
After transcoding (10+ variants): 4.3 * 5 = ~21 PB/day
Annual: 7.7 EB/year
Video streams: 1B hours/day
Average bitrate: 5 Mbps
Peak concurrent viewers: ~50M
Peak bandwidth: 50M * 5 Mbps = 250 Tbps (served from CDN)
```
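A quick back-of-envelope script to sanity-check the numbers above, using the same assumptions:

```python
# Back-of-envelope check of the YouTube-scale estimates above.
UPLOAD_HOURS_PER_MIN = 500
RAW_GB_PER_HOUR = 6                 # ~1 GB per 10-minute video
TRANSCODE_MULTIPLIER = 5            # 10+ variants, each smaller than the raw file
AVG_BITRATE_MBPS = 5
PEAK_CONCURRENT_VIEWERS = 50e6

upload_hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24            # 720,000 hours/day
raw_pb_per_day = upload_hours_per_day * RAW_GB_PER_HOUR / 1e6    # ~4.3 PB/day raw
stored_pb_per_day = raw_pb_per_day * TRANSCODE_MULTIPLIER        # ~21 PB/day stored
stored_eb_per_year = stored_pb_per_day * 365 / 1000              # ~7.7-7.9 EB/year
peak_tbps = PEAK_CONCURRENT_VIEWERS * AVG_BITRATE_MBPS / 1e6     # 250 Tbps

print(f"{upload_hours_per_day:,.0f} h/day uploaded, {raw_pb_per_day:.1f} PB raw/day, "
      f"{stored_pb_per_day:.0f} PB stored/day, {stored_eb_per_year:.1f} EB/year, "
      f"{peak_tbps:.0f} Tbps peak")
```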
5-6. Key Design: Video Pipeline
Video Upload & Transcoding Pipeline
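A simplified sketch of the transcoding step: one worker consumes a job and produces the HLS renditions listed in the ladder below. The ffmpeg flags are real, but the ladder values, paths, and orchestration are illustrative, not YouTube's actual pipeline.

```python
# Sketch of a transcoding worker: one raw upload -> one HLS variant ladder.
# Assumes ffmpeg is installed; queue/storage clients are omitted for brevity.
import subprocess
from pathlib import Path

# Illustrative ladder matching the Transcoding Details below (height, video bitrate).
VARIANTS = [(240, "400k"), (360, "800k"), (480, "1500k"),
            (720, "3000k"), (1080, "6000k")]   # 4K omitted (premium only)

def transcode_to_hls(raw_file: str, out_dir: str, segment_seconds: int = 6) -> None:
    """Produce one HLS rendition per variant, split into fixed-length segments."""
    for height, bitrate in VARIANTS:
        rendition = Path(out_dir) / f"{height}p"
        rendition.mkdir(parents=True, exist_ok=True)
        subprocess.run([
            "ffmpeg", "-i", raw_file,
            "-vf", f"scale=-2:{height}",           # keep aspect ratio, even width
            "-c:v", "libx264", "-b:v", bitrate,
            "-c:a", "aac", "-b:a", "128k",
            "-hls_time", str(segment_seconds),     # segment length drives ABR granularity
            "-hls_playlist_type", "vod",
            "-hls_segment_filename", str(rendition / "seg_%03d.ts"),
            str(rendition / "playlist.m3u8"),
        ], check=True)
    # A real pipeline would now write the master manifest and upload to blob storage.
```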
Transcoding Details
```text
Input: 1 raw video file (e.g., 1080p H.264, 10 minutes, 1 GB)
Output: Multiple variants for adaptive streaming
240p → 400 Kbps (mobile, poor connection)
360p → 800 Kbps (mobile, average)
480p → 1.5 Mbps (standard)
720p → 3 Mbps (HD)
1080p → 6 Mbps (Full HD)
4K → 15 Mbps (Ultra HD, only for premium)
Each variant is split into 4-10 second segments:
video_segment_001.ts, video_segment_002.ts, ...
Why segments?
- Adaptive bitrate switching happens at segment boundaries
- User's bandwidth drops → player switches to lower quality at next segment
- Enables seeking without downloading entire file
- CDN caches individual segments (popular segments stay hot)
Manifest file (HLS .m3u8):
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
```
Adaptive Bitrate Streaming (ABR)
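A minimal sketch of the decision the player makes at each segment boundary, using the variant ladder above (the throughput-safety margin is an illustrative assumption):

```python
# Player-side ABR sketch: pick the highest rendition that fits measured throughput.
# Bitrates mirror the variant ladder above; the 0.8 safety margin is illustrative.
LADDER_KBPS = {"240p": 400, "360p": 800, "480p": 1500, "720p": 3000, "1080p": 6000}

def choose_rendition(measured_kbps: float, safety: float = 0.8) -> str:
    """Called at every segment boundary with a smoothed throughput estimate."""
    budget = measured_kbps * safety           # leave headroom to avoid rebuffering
    best = "240p"                             # floor: keep playing even on poor links
    for name, kbps in LADDER_KBPS.items():
        if kbps <= budget and kbps > LADDER_KBPS[best]:
            best = name
    return best

# Example: throughput drops mid-stream -> player downshifts at the next segment.
print(choose_rendition(5000))   # -> 720p
print(choose_rendition(1200))   # -> 360p
```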
7-8. System Architecture
9. Deep Dive
CDN Architecture for Video
```text
3-tier CDN:
Tier 1: Edge PoPs (200+ locations, closest to users)
- Cache popular segments (last 24h trending videos)
- Cache hit rate: ~90%
Tier 2: Regional caches (30+ locations)
- Cache longer-tail content
- Serves edge misses
- Cache hit rate: ~95% cumulative
Tier 3: Origin shield (2-3 locations)
- Protects origin storage from thundering herd
- Single point for cache fill
- Cache hit rate: ~99% cumulative
Origin: S3 / GCS (stores ALL video segments)
- Only ~1% of requests reach origin
```
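A toy model of the tiered, pull-through lookup described above (dicts stand in for cache fleets; the function and key names are illustrative):

```python
# Tiered CDN lookup sketch: edge -> regional -> origin shield -> origin.
def fetch_segment(key: str, edge: dict, regional: dict, shield: dict,
                  origin_fetch) -> bytes:
    """Return a segment, filling each cache on the way back up (pull-through)."""
    for tier in (edge, regional, shield):
        if key in tier:
            return tier[key]
    data = origin_fetch(key)                 # only ~1% of requests should get here
    # Fill caches on the way back so the next viewer hits a closer tier.
    shield[key] = regional[key] = edge[key] = data
    return data

# Usage: the first viewer warms all tiers, later viewers hit the edge.
edge, regional, shield = {}, {}, {}
blob = fetch_segment("vid123/720p/seg_001.ts", edge, regional, shield,
                     origin_fetch=lambda k: b"...segment bytes...")
```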
View Count at Scale (YouTube Problem)
```text
YouTube view count: billions of views/day
Same as Distributed Counter (Problem #4)!
But YouTube adds anti-fraud:
- Same user watching same video 10 times → count as 1 view
- Bot detection: CAPTCHAs, behavioral analysis
- View count freezes at 301 to verify authenticity (legacy behavior)
Architecture:
Real-time: Sharded Redis counter (approximate, shown to user)
Batch: MapReduce job reconciles real count daily (official count)
```
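A sketch of the real-time path described above, using Redis (key names, shard count, and the dedup window are illustrative; the daily batch job remains the source of the official count):

```python
# Real-time approximate view counter: dedup per (user, video), sharded INCR.
import random
import redis

r = redis.Redis()
NUM_SHARDS = 64
DEDUP_WINDOW_S = 24 * 3600   # same user + same video counts once per day

def record_view(video_id: str, user_id: str) -> None:
    # SET NX acts as the dedup gate; only the first view in the window passes.
    if r.set(f"seen:{video_id}:{user_id}", 1, nx=True, ex=DEDUP_WINDOW_S):
        shard = random.randrange(NUM_SHARDS)
        r.incr(f"views:{video_id}:{shard}")   # spreads the hot key across shards

def approximate_views(video_id: str) -> int:
    keys = [f"views:{video_id}:{s}" for s in range(NUM_SHARDS)]
    return sum(int(v) for v in r.mget(keys) if v)
```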
Netflix Recommendation Engine
```text
Netflix's recommendations drive roughly 80% of the content watched.
Architecture:
Offline: Matrix factorization on viewing history → user/content embeddings
Near-line: Update user profile based on recent watches (Kafka + Flink)
Online: At request time, rank candidates using user embedding + content features
Key signals:
- Watch history (most important)
- Watch duration (completed vs abandoned)
- Time of day, day of week
- Device (TV vs phone → different content preferences)
- Trending in region
- Similar users' preferences (collaborative filtering)
```
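A minimal sketch of the online ranking step described above: candidates are scored by the dot product of the user embedding (learned offline via matrix factorization) and each candidate's embedding, plus a contextual boost. The blending, weights, and names are illustrative assumptions.

```python
# Online ranking sketch: user embedding dotted with candidate embeddings,
# blended with an illustrative contextual boost (trending, device, time of day).
import numpy as np

def rank_candidates(user_vec: np.ndarray,
                    candidates: dict[str, np.ndarray],
                    context_boost: dict[str, float],
                    top_k: int = 10) -> list[str]:
    scored = []
    for title, item_vec in candidates.items():
        score = float(user_vec @ item_vec)        # collaborative-filtering signal
        score += context_boost.get(title, 0.0)    # e.g. trending-in-region bonus
        scored.append((score, title))
    return [title for _, title in sorted(scored, reverse=True)[:top_k]]
```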
10-14. Trade-offs, Failures, Scaling
Key trade-offs:
| Decision | Trade-off |
|---|---|
| More transcoding variants | Better ABR experience, but more storage + compute |
| Longer segments (10s) vs shorter (2s) | Longer: fewer requests, better compression. Shorter: faster quality switching |
| CDN cache duration | Longer: better hit rate. Shorter: faster content updates |
| Live vs VOD | Live: no pre-transcoding possible, latency-critical. VOD: pre-transcode at leisure |
Failure scenarios:
- CDN edge failure → fallback to regional cache → origin shield → origin
- Transcoding worker crash → Kafka redelivers the job → at-least-once processing (sketched after this list)
- Popular video launch (Super Bowl) → pre-warm CDN edges, over-provision
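The worker-crash bullet relies on at-least-once delivery. A minimal sketch with the kafka-python client (topic, group, and broker address are placeholders) shows why a crash leads to redelivery: the offset is committed only after the work finishes, and re-running a transcode simply overwrites the same output.

```python
# At-least-once transcode consumer (kafka-python client; names are placeholders).
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "transcode-jobs",
    group_id="transcode-workers",
    bootstrap_servers="kafka:9092",
    enable_auto_commit=False,       # manual commit = at-least-once semantics
)

for msg in consumer:
    # Assumes the message value is the raw file path; reuses the transcode sketch above.
    transcode_to_hls(msg.value.decode(), out_dir="/tmp/out")
    consumer.commit()               # only now is the job considered done
```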
Scaling:
- Video storage: Exabyte-scale blob storage (S3, GCS)
- Transcoding: GPU clusters, auto-scaled by queue depth (see the sketch after this list)
- CDN: 200+ PoPs, ISP-embedded caches (Netflix Open Connect)
- Metadata: Sharded PostgreSQL, Elasticsearch for search
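As a concrete reading of "auto-scaled by queue depth", a scaler might derive the transcoding fleet size from the backlog roughly like this (all constants are illustrative):

```python
# Illustrative scaling rule: size the transcoding fleet from the job backlog.
import math

def desired_workers(queue_depth: int, jobs_per_worker_per_hour: float = 12,
                    target_drain_hours: float = 1.0,
                    min_workers: int = 10, max_workers: int = 5000) -> int:
    """Enough workers to drain the current backlog within the target window."""
    needed = math.ceil(queue_depth / (jobs_per_worker_per_hour * target_drain_hours))
    return max(min_workers, min(max_workers, needed))
```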
15-20. Interview Strategy
Key Points
- Transcoding pipeline: raw → multiple resolutions → segmented → manifest
- Adaptive bitrate: Player measures bandwidth, switches quality at segment boundaries
- CDN architecture: 3-tier (edge → regional → origin shield → origin)
- Segment-based streaming: 4-10s chunks enable seeking + ABR
- View counting: Sharded counters + anti-fraud verification
Common Mistakes
- Not explaining WHY video is segmented (ABR, seeking, CDN caching)
- Forgetting about transcoding (it's the most compute-intensive part)
- Proposing a single quality level (must support ABR)
- Not mentioning CDN (can't stream exabytes from origin)
- Ignoring the upload pipeline (YouTube processes 500 hours/minute)
Practice Mode
5 Questions
- "Why are videos split into segments?" → Enables adaptive bitrate switching at segment boundaries, efficient CDN caching, and seeking without downloading the full file.
- "How does adaptive bitrate streaming work?" → Player measures bandwidth, selects the highest quality that doesn't cause buffering, switches at segment boundaries.
- "How does YouTube handle 500 hours of video uploads per minute?" → Distributed transcoding on GPU clusters. Jobs queued in Kafka. Each video transcoded to 6+ quality levels. Horizontal scaling of workers.
- "Why does Netflix use a 3-tier CDN?" → Edge (closest, ~90% hit), regional (~95% cumulative), origin shield (~99%). Protects origin from thundering herd. Minimizes latency globally.
- "How do you count views accurately at YouTube scale?" → Sharded counters for real-time approximate count. Batch MapReduce for official count with fraud filtering. Dedup by user+video combination.
1 "100x Scale" Challenge
From 50M peak concurrent viewers to 5B (global sports event). What changes? Hint: ISP-embedded caches (Netflix Open Connect), multicast-like P2P delivery, pre-position content at edge 24h before event, dynamic segment size (2s for live), dedicated live streaming edge infrastructure.