Problem #15: Design YouTube / Netflix — Full Deep Dive
1. Problem Statement
Design a video streaming platform where users upload, transcode, store, and stream video content at global scale with adaptive bitrate streaming.
YouTube focus: User-generated content, search/discovery, live streaming
Netflix focus: Licensed/original content, recommendation engine, offline download
Core business goals:
- Upload and process video content (YouTube: millions of uploads/day)
- Stream video to billions of viewers with minimal buffering
- Adaptive bitrate: adjust quality based on user's bandwidth
- Global delivery via CDN
- Recommendation engine to maximize engagement
2-3. Requirements
Core
| ID | Requirement |
|---|---|
| FR-1 | Video upload (up to 12 hours for YouTube, varies for Netflix) |
| FR-2 | Video transcoding to multiple resolutions and codecs |
| FR-3 | Adaptive bitrate streaming (HLS/DASH) |
| FR-4 | Video search and discovery |
| FR-5 | Video player with seeking, pause, resume |
| FR-6 | View count tracking |
| FR-7 | Like/dislike, comments |
| FR-8 | Recommendation engine |
| FR-9 | Subscription/channel management |
Non-Functional
| Requirement | Target |
|---|---|
| DAU | 2B (YouTube), 200M (Netflix) |
| Watch time/day | 1B hours (YouTube) |
| Upload rate | 500 hours of video uploaded per minute (YouTube) |
| Start latency | < 2 seconds |
| Buffering ratio | < 1% of playback time |
| Storage | Exabytes |
| Availability | 99.99% |
4. Capacity Estimation
```text
YouTube scale:
Video uploads: 500 hours/minute = 720,000 hours/day
Average video: 10 minutes, ~1 GB raw (≈6 GB/hour); each transcoded variant is a few hundred MB
Upload storage/day: 720,000 hours * 6 GB/hour = 4.3 PB/day raw
After transcoding (10+ variants): 4.3 * 5 = ~21 PB/day
Annual: 7.7 EB/year
Video streams: 1B hours/day
Average bitrate: 5 Mbps
Peak concurrent viewers: ~50M
Peak bandwidth: 50M * 5 Mbps = 250 Tbps (served from CDN)
```
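A quick back-of-envelope script to sanity-check the numbers above, using the same assumptions:

```python
# Back-of-envelope check of the YouTube-scale estimates above.
UPLOAD_HOURS_PER_MIN = 500
RAW_GB_PER_HOUR = 6                 # ~1 GB per 10-minute video
TRANSCODE_MULTIPLIER = 5            # 10+ variants, each smaller than the raw file
AVG_BITRATE_MBPS = 5
PEAK_CONCURRENT_VIEWERS = 50e6

upload_hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24            # 720,000 hours/day
raw_pb_per_day = upload_hours_per_day * RAW_GB_PER_HOUR / 1e6    # ~4.3 PB/day raw
stored_pb_per_day = raw_pb_per_day * TRANSCODE_MULTIPLIER        # ~21 PB/day stored
stored_eb_per_year = stored_pb_per_day * 365 / 1000              # ~7.7-7.9 EB/year
peak_tbps = PEAK_CONCURRENT_VIEWERS * AVG_BITRATE_MBPS / 1e6     # 250 Tbps

print(f"{upload_hours_per_day:,.0f} h/day uploaded, {raw_pb_per_day:.1f} PB raw/day, "
      f"{stored_pb_per_day:.0f} PB stored/day, {stored_eb_per_year:.1f} EB/year, "
      f"{peak_tbps:.0f} Tbps peak")
```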
5-6. Key Design: Video Pipeline
Video Upload & Transcoding Pipeline
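A simplified sketch of the transcoding step: one worker consumes a job and produces the HLS renditions listed in the ladder below. The ffmpeg flags are real, but the ladder values, paths, and orchestration are illustrative, not YouTube's actual pipeline.

```python
# Sketch of a transcoding worker: one raw upload -> one HLS variant ladder.
# Assumes ffmpeg is installed; queue/storage clients are omitted for brevity.
import subprocess
from pathlib import Path

# Illustrative ladder matching the Transcoding Details below (height, video bitrate).
VARIANTS = [(240, "400k"), (360, "800k"), (480, "1500k"),
            (720, "3000k"), (1080, "6000k")]   # 4K omitted (premium only)

def transcode_to_hls(raw_file: str, out_dir: str, segment_seconds: int = 6) -> None:
    """Produce one HLS rendition per variant, split into fixed-length segments."""
    for height, bitrate in VARIANTS:
        rendition = Path(out_dir) / f"{height}p"
        rendition.mkdir(parents=True, exist_ok=True)
        subprocess.run([
            "ffmpeg", "-i", raw_file,
            "-vf", f"scale=-2:{height}",           # keep aspect ratio, even width
            "-c:v", "libx264", "-b:v", bitrate,
            "-c:a", "aac", "-b:a", "128k",
            "-hls_time", str(segment_seconds),     # segment length drives ABR granularity
            "-hls_playlist_type", "vod",
            "-hls_segment_filename", str(rendition / "seg_%03d.ts"),
            str(rendition / "playlist.m3u8"),
        ], check=True)
    # A real pipeline would now write the master manifest and upload to blob storage.
```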
Transcoding Details
```text
Input: 1 raw video file (e.g., 1080p H.264, 10 minutes, 1 GB)
Output: Multiple variants for adaptive streaming
240p → 400 Kbps (mobile, poor connection)
360p → 800 Kbps (mobile, average)
480p → 1.5 Mbps (standard)
720p → 3 Mbps (HD)
1080p → 6 Mbps (Full HD)
4K → 15 Mbps (Ultra HD, only for premium)
Each variant is split into 4-10 second segments:
video_segment_001.ts, video_segment_002.ts, ...
Why segments?
- Adaptive bitrate switching happens at segment boundaries
- User's bandwidth drops → player switches to lower quality at next segment
- Enables seeking without downloading entire file
- CDN caches individual segments (popular segments stay hot)
Manifest file (HLS .m3u8):
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
```
Adaptive Bitrate Streaming (ABR)
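A minimal sketch of the decision the player makes at each segment boundary, using the variant ladder above (the throughput-safety margin is an illustrative assumption):

```python
# Player-side ABR sketch: pick the highest rendition that fits measured throughput.
# Bitrates mirror the variant ladder above; the 0.8 safety margin is illustrative.
LADDER_KBPS = {"240p": 400, "360p": 800, "480p": 1500, "720p": 3000, "1080p": 6000}

def choose_rendition(measured_kbps: float, safety: float = 0.8) -> str:
    """Called at every segment boundary with a smoothed throughput estimate."""
    budget = measured_kbps * safety           # leave headroom to avoid rebuffering
    best = "240p"                             # floor: keep playing even on poor links
    for name, kbps in LADDER_KBPS.items():
        if kbps <= budget and kbps > LADDER_KBPS[best]:
            best = name
    return best

# Example: throughput drops mid-stream -> player downshifts at the next segment.
print(choose_rendition(5000))   # -> 720p
print(choose_rendition(1200))   # -> 360p
```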
7-8. System Architecture
9. Deep Dive
CDN Architecture for Video
```text
3-tier CDN:
Tier 1: Edge PoPs (200+ locations, closest to users)
- Cache popular segments (last 24h trending videos)
- Cache hit rate: ~90%
Tier 2: Regional caches (30+ locations)
- Cache longer-tail content
- Serves edge misses
- Cache hit rate: ~95% cumulative
Tier 3: Origin shield (2-3 locations)
- Protects origin storage from thundering herd
- Single point for cache fill
- Cache hit rate: ~99% cumulative
Origin: S3 / GCS (stores ALL video segments)
- Only ~1% of requests reach origin
```
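A toy model of the tiered, pull-through lookup described above (dicts stand in for cache fleets; the function and key names are illustrative):

```python
# Tiered CDN lookup sketch: edge -> regional -> origin shield -> origin.
def fetch_segment(key: str, edge: dict, regional: dict, shield: dict,
                  origin_fetch) -> bytes:
    """Return a segment, filling each cache on the way back up (pull-through)."""
    for tier in (edge, regional, shield):
        if key in tier:
            return tier[key]
    data = origin_fetch(key)                 # only ~1% of requests should get here
    # Fill caches on the way back so the next viewer hits a closer tier.
    shield[key] = regional[key] = edge[key] = data
    return data

# Usage: the first viewer warms all tiers, later viewers hit the edge.
edge, regional, shield = {}, {}, {}
blob = fetch_segment("vid123/720p/seg_001.ts", edge, regional, shield,
                     origin_fetch=lambda k: b"...segment bytes...")
```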
View Count at Scale (YouTube Problem)
```text
YouTube view count: billions of views/day
Same as Distributed Counter (Problem #4)!
But YouTube adds anti-fraud:
- Same user watching same video 10 times → count as 1 view
- Bot detection: CAPTCHAs, behavioral analysis
- View count freezes at 301 to verify authenticity (legacy behavior)
Architecture:
Real-time: Sharded Redis counter (approximate, shown to user)
Batch: MapReduce job reconciles real count daily (official count)
```
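A sketch of the real-time path described above, using Redis (key names, shard count, and the dedup window are illustrative; the daily batch job remains the source of the official count):

```python
# Real-time approximate view counter: dedup per (user, video), sharded INCR.
import random
import redis

r = redis.Redis()
NUM_SHARDS = 64
DEDUP_WINDOW_S = 24 * 3600   # same user + same video counts once per day

def record_view(video_id: str, user_id: str) -> None:
    # SET NX acts as the dedup gate; only the first view in the window passes.
    if r.set(f"seen:{video_id}:{user_id}", 1, nx=True, ex=DEDUP_WINDOW_S):
        shard = random.randrange(NUM_SHARDS)
        r.incr(f"views:{video_id}:{shard}")   # spreads the hot key across shards

def approximate_views(video_id: str) -> int:
    keys = [f"views:{video_id}:{s}" for s in range(NUM_SHARDS)]
    return sum(int(v) for v in r.mget(keys) if v)
```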
Netflix Recommendation Engine
```text
Netflix's recommendations drive roughly 80% of the content watched.
Architecture:
Offline: Matrix factorization on viewing history → user/content embeddings
Near-line: Update user profile based on recent watches (Kafka + Flink)
Online: At request time, rank candidates using user embedding + content features
Key signals:
- Watch history (most important)
- Watch duration (completed vs abandoned)
- Time of day, day of week
- Device (TV vs phone → different content preferences)
- Trending in region
- Similar users' preferences (collaborative filtering)
```
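A minimal sketch of the online ranking step described above: candidates are scored by the dot product of the user embedding (learned offline via matrix factorization) and each candidate's embedding, plus a contextual boost. The blending, weights, and names are illustrative assumptions.

```python
# Online ranking sketch: user embedding dotted with candidate embeddings,
# blended with an illustrative contextual boost (trending, device, time of day).
import numpy as np

def rank_candidates(user_vec: np.ndarray,
                    candidates: dict[str, np.ndarray],
                    context_boost: dict[str, float],
                    top_k: int = 10) -> list[str]:
    scored = []
    for title, item_vec in candidates.items():
        score = float(user_vec @ item_vec)        # collaborative-filtering signal
        score += context_boost.get(title, 0.0)    # e.g. trending-in-region bonus
        scored.append((score, title))
    return [title for _, title in sorted(scored, reverse=True)[:top_k]]
```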
10-14. Trade-offs, Failures, Scaling
Key trade-offs:
| Decision | Trade-off |
|---|---|
| More transcoding variants | Better ABR experience, but more storage + compute |
| Longer segments (10s) vs shorter (2s) | Longer: fewer requests, better compression. Shorter: faster quality switching |
| CDN cache duration | Longer: better hit rate. Shorter: faster content updates |
| Live vs VOD | Live: no pre-transcoding possible, latency-critical. VOD: pre-transcode at leisure |
Failure scenarios:
- CDN edge failure → fallback to regional cache → origin shield → origin
- Transcoding worker crash → Kafka redelivers the job → at-least-once processing (sketched after this list)
- Popular video launch (Super Bowl) → pre-warm CDN edges, over-provision
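The worker-crash bullet relies on at-least-once delivery. A minimal sketch with the kafka-python client (topic, group, and broker address are placeholders) shows why a crash leads to redelivery: the offset is committed only after the work finishes, and re-running a transcode simply overwrites the same output.

```python
# At-least-once transcode consumer (kafka-python client; names are placeholders).
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "transcode-jobs",
    group_id="transcode-workers",
    bootstrap_servers="kafka:9092",
    enable_auto_commit=False,       # manual commit = at-least-once semantics
)

for msg in consumer:
    # Assumes the message value is the raw file path; reuses the transcode sketch above.
    transcode_to_hls(msg.value.decode(), out_dir="/tmp/out")
    consumer.commit()               # only now is the job considered done
```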
Scaling:
- Video storage: Exabyte-scale blob storage (S3, GCS)
- Transcoding: GPU clusters, auto-scaled by queue depth (see the sketch after this list)
- CDN: 200+ PoPs, ISP-embedded caches (Netflix Open Connect)
- Metadata: Sharded PostgreSQL, Elasticsearch for search
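As a concrete reading of "auto-scaled by queue depth", a scaler might derive the transcoding fleet size from the backlog roughly like this (all constants are illustrative):

```python
# Illustrative scaling rule: size the transcoding fleet from the job backlog.
import math

def desired_workers(queue_depth: int, jobs_per_worker_per_hour: float = 12,
                    target_drain_hours: float = 1.0,
                    min_workers: int = 10, max_workers: int = 5000) -> int:
    """Enough workers to drain the current backlog within the target window."""
    needed = math.ceil(queue_depth / (jobs_per_worker_per_hour * target_drain_hours))
    return max(min_workers, min(max_workers, needed))
```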
15-20. Interview Strategy
Key Points
- Transcoding pipeline: raw → multiple resolutions → segmented → manifest
- Adaptive bitrate: Player measures bandwidth, switches quality at segment boundaries
- CDN architecture: 3-tier (edge → regional → origin shield → origin)
- Segment-based streaming: 4-10s chunks enable seeking + ABR
- View counting: Sharded counters + anti-fraud verification
Common Mistakes
- Not explaining WHY video is segmented (ABR, seeking, CDN caching)
- Forgetting about transcoding (it's the most compute-intensive part)
- Proposing a single quality level (must support ABR)
- Not mentioning CDN (can't stream exabytes from origin)
- Ignoring the upload pipeline (YouTube processes 500 hours/minute)
Practice Mode
5 Questions
- "Why are videos split into segments?" → Enables adaptive bitrate switching at segment boundaries, efficient CDN caching, and seeking without downloading the full file.
- "How does adaptive bitrate streaming work?" → Player measures bandwidth, selects the highest quality that doesn't cause buffering, switches at segment boundaries.
- "How does YouTube handle 500 hours of video uploads per minute?" → Distributed transcoding on GPU clusters. Jobs queued in Kafka. Each video transcoded to 6+ quality levels. Horizontal scaling of workers.
- "Why does Netflix use a 3-tier CDN?" → Edge (closest, ~90% hit), regional (~95% cumulative), origin shield (~99%). Protects origin from thundering herd. Minimizes latency globally.
- "How do you count views accurately at YouTube scale?" → Sharded counters for real-time approximate count. Batch MapReduce for official count with fraud filtering. Dedup by user+video combination.
1 "100x Scale" Challenge
From 50M peak concurrent viewers to 5B (global sports event). What changes? Hint: ISP-embedded caches (Netflix Open Connect), multicast-like P2P delivery, pre-position content at edge 24h before event, dynamic segment size (2s for live), dedicated live streaming edge infrastructure.