
Problem #15: Design YouTube / Netflix — Full Deep Dive


1. Problem Statement

Design a video streaming platform where users upload, transcode, store, and stream video content at global scale with adaptive bitrate streaming.

YouTube focus: User-generated content, search/discovery, live streaming
Netflix focus: Licensed/original content, recommendation engine, offline download

Core business goals:

  • Upload and process video content (YouTube: millions of uploads/day)
  • Stream video to billions of viewers with minimal buffering
  • Adaptive bitrate: adjust quality based on user's bandwidth
  • Global delivery via CDN
  • Recommendation engine to maximize engagement

2-3. Requirements

Core

| ID | Requirement |
|------|-------------|
| FR-1 | Video upload (up to 12 hours for YouTube, varies for Netflix) |
| FR-2 | Video transcoding to multiple resolutions and codecs |
| FR-3 | Adaptive bitrate streaming (HLS/DASH) |
| FR-4 | Video search and discovery |
| FR-5 | Video player with seeking, pause, resume |
| FR-6 | View count tracking |
| FR-7 | Like/dislike, comments |
| FR-8 | Recommendation engine |
| FR-9 | Subscription/channel management |

Non-Functional

| Requirement | Target |
|-------------|--------|
| DAU | 2B (YouTube), 200M (Netflix) |
| Hours watched/day | 1B hours (YouTube) |
| Upload rate | 500 hours of video per minute (YouTube) |
| Start latency | < 2 seconds |
| Buffering ratio | < 1% of playback time |
| Storage | Exabytes |
| Availability | 99.99% |

4. Capacity Estimation

text
YouTube scale:
  Video uploads: 500 hours/minute = 720,000 hours/day
  Average video: 10 minutes, ~1 GB raw (≈ 6 GB/hour of source footage)
  Upload storage/day: 720,000 hours * 6 GB/hour ≈ 4.3 PB/day raw
  After transcoding (10+ variants, ~5x the raw footprint): 4.3 * 5 ≈ 21 PB/day
  Annual: ~7.7 EB/year

  Video streams: 1B hours/day
  Average bitrate: 5 Mbps
  Peak concurrent viewers: ~50M
  Peak bandwidth: 50M * 5 Mbps = 250 Tbps (served from CDN)
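
These numbers are easy to sanity-check in a few lines. The inputs below are the assumptions stated above (6 GB/hour raw, ~5x transcoding multiplier, 5 Mbps average bitrate), not measured figures.

python
# Back-of-the-envelope check of the estimates above; all inputs are the
# assumptions from this section.
UPLOAD_HOURS_PER_MIN = 500
RAW_GB_PER_HOUR = 6
TRANSCODE_MULTIPLIER = 5                 # 10+ variants ≈ 5x the raw footprint
AVG_BITRATE_MBPS = 5
PEAK_CONCURRENT_VIEWERS = 50_000_000

upload_hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24            # 720,000
raw_pb_per_day = upload_hours_per_day * RAW_GB_PER_HOUR / 1e6    # ~4.3 PB
total_pb_per_day = raw_pb_per_day * TRANSCODE_MULTIPLIER         # ~21 PB
annual_eb = total_pb_per_day * 365 / 1000                        # ~7.7-7.9 EB
peak_tbps = PEAK_CONCURRENT_VIEWERS * AVG_BITRATE_MBPS / 1e6     # 250 Tbps

print(f"{raw_pb_per_day:.1f} PB/day raw, {total_pb_per_day:.0f} PB/day stored, "
      f"{annual_eb:.1f} EB/year, {peak_tbps:.0f} Tbps peak egress")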

5-6. Key Design: Video Pipeline

Video Upload & Transcoding Pipeline

Transcoding Details

text
Input: 1 raw video file (e.g., 1080p H.264, 10 minutes, 1 GB)

Output: Multiple variants for adaptive streaming
  240p  → 400 Kbps  (mobile, poor connection)
  360p  → 800 Kbps  (mobile, average)
  480p  → 1.5 Mbps  (standard)
  720p  → 3 Mbps    (HD)
  1080p → 6 Mbps    (Full HD)
  4K    → 15 Mbps   (Ultra HD, only for premium)

Each variant is split into 4-10 second segments:
  video_segment_001.ts, video_segment_002.ts, ...

Why segments?
  - Adaptive bitrate switching happens at segment boundaries
  - User's bandwidth drops → player switches to lower quality at next segment
  - Enables seeking without downloading entire file
  - CDN caches individual segments (popular segments stay hot)

Manifest file (HLS .m3u8):
  #EXTM3U
  #EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
  360p/playlist.m3u8
  #EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
  720p/playlist.m3u8
  #EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
  1080p/playlist.m3u8
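
A minimal sketch of the fan-out step, assuming one queued transcoding job per ladder rung; the job fields and ladder values mirror the table above and are illustrative, not any real platform's schema.

python
from dataclasses import dataclass

# Illustrative bitrate ladder (resolution -> target Kbps), matching the table above.
LADDER = {"240p": 400, "360p": 800, "480p": 1500,
          "720p": 3000, "1080p": 6000, "2160p": 15000}

@dataclass
class TranscodeJob:
    video_id: str
    raw_uri: str          # uploaded master in blob storage
    resolution: str
    bitrate_kbps: int
    segment_seconds: int  # 4-10 s for VOD, shorter for live

def fan_out(video_id: str, raw_uri: str, segment_seconds: int = 6) -> list[TranscodeJob]:
    """One upload becomes one job per ladder rung; each job is published to a
    queue (e.g. Kafka) and consumed by an auto-scaled transcoding worker."""
    return [TranscodeJob(video_id, raw_uri, res, kbps, segment_seconds)
            for res, kbps in LADDER.items()]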

Adaptive Bitrate Streaming (ABR)
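
The player drives ABR: it measures the throughput of recently downloaded segments, picks the highest variant that fits within that budget, and applies the change only at the next segment boundary. Below is a minimal sketch of that selection logic, with an illustrative ladder and a hypothetical safety factor; production players (hls.js, dash.js, ExoPlayer) also weigh buffer occupancy.

python
# Player-side quality selection, simplified.
LADDER_KBPS = {"240p": 400, "360p": 800, "480p": 1500,
               "720p": 3000, "1080p": 6000}

def choose_variant(recent_throughputs_kbps: list[float],
                   safety_factor: float = 0.8) -> str:
    """Pick the highest rung whose bitrate fits within a safety margin of the
    measured throughput; fall back to the lowest rung otherwise."""
    if not recent_throughputs_kbps:
        return "240p"
    budget = safety_factor * sum(recent_throughputs_kbps) / len(recent_throughputs_kbps)
    fitting = [name for name, kbps in LADDER_KBPS.items() if kbps <= budget]
    return fitting[-1] if fitting else "240p"

# The switch takes effect at the next segment boundary, e.g.:
# next_url = f"{choose_variant(history)}/video_segment_{n:03d}.ts"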


7-8. System Architecture


9. Deep Dive

CDN Architecture for Video

text
3-tier CDN:
  Tier 1: Edge PoPs (200+ locations, closest to users)
    - Cache popular segments (last 24h trending videos)
    - Cache hit rate: ~90%

  Tier 2: Regional caches (30+ locations)
    - Cache longer-tail content
    - Serves edge misses
    - Cache hit rate: ~95% cumulative

  Tier 3: Origin shield (2-3 locations)
    - Protects origin storage from thundering herd
    - Single point for cache fill
    - Cache hit rate: ~99% cumulative

Origin: S3 / GCS (stores ALL video segments)
  - Only ~1% of requests reach origin
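
A sketch of the tiered lookup described above, assuming each tier is a simple key-value cache: a miss falls through to the next tier, only the ~1% that misses everywhere reaches origin, and tiers are back-filled on the way out.

python
from typing import Callable

def fetch_segment(key: str, edge: dict, regional: dict, shield: dict,
                  origin_get: Callable[[str], bytes]) -> bytes:
    """Tiered lookup: edge -> regional -> origin shield -> origin (S3/GCS)."""
    for tier in (edge, regional, shield):
        if key in tier:
            return tier[key]
    data = origin_get(key)                  # the ~1% that reaches origin
    for tier in (shield, regional, edge):
        tier[key] = data                    # back-fill so the next request hits
    return data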

View Count at Scale (YouTube Problem)

text
YouTube view count: billions of views/day
Same as Distributed Counter (Problem #4)!

But YouTube adds anti-fraud:
  - Same user watching same video 10 times → count as 1 view
  - Bot detection: CAPTCHAs, behavioral analysis
  - View count freezes at 301 to verify authenticity (legacy behavior)

Architecture:
  Real-time: Sharded Redis counter (approximate, shown to user)
  Batch: MapReduce job reconciles real count daily (official count)
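
A sketch of the real-time path, assuming a Redis cluster and an illustrative key layout: dedup each (video, user) pair for a day, then increment one of N sharded keys so a viral video never hammers a single counter. The batch job with fraud filtering remains the official count.

python
import random
import redis  # assumption: a sharded Redis deployment backs the hot path

r = redis.Redis()
NUM_SHARDS = 64

def record_view(video_id: str, user_id: str) -> None:
    # Dedup: the same user re-watching the same video counts once per day.
    if not r.set(f"seen:{video_id}:{user_id}", 1, nx=True, ex=86_400):
        return
    shard = random.randrange(NUM_SHARDS)
    r.incr(f"views:{video_id}:{shard}")      # spread writes across 64 keys

def approximate_views(video_id: str) -> int:
    # Near-real-time number shown to users; the batch job is authoritative.
    keys = [f"views:{video_id}:{i}" for i in range(NUM_SHARDS)]
    return sum(int(v) for v in r.mget(keys) if v is not None)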

Netflix Recommendation Engine

text
Netflix's recommendation engine drives roughly 80% of the content watched on the platform.

Architecture:
  Offline: Matrix factorization on viewing history → user/content embeddings
  Near-line: Update user profile based on recent watches (Kafka + Flink)
  Online: At request time, rank candidates using user embedding + content features

Key signals:
  - Watch history (most important)
  - Watch duration (completed vs abandoned)
  - Time of day, day of week
  - Device (TV vs phone → different content preferences)
  - Trending in region
  - Similar users' preferences (collaborative filtering)
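
A sketch of the online step, assuming the offline matrix-factorization job has already produced dense user and item embeddings; the trending boost stands in for the other context signals listed above and its weight is purely illustrative.

python
import numpy as np

def rank_candidates(user_vec: np.ndarray,
                    candidates: dict[str, np.ndarray],
                    region_trending: set[str],
                    trending_boost: float = 0.1) -> list[str]:
    """Online ranking: relevance = user/item embedding similarity, plus a small
    boost for titles currently trending in the user's region."""
    scores = {}
    for title, item_vec in candidates.items():
        score = float(user_vec @ item_vec)
        if title in region_trending:
            score += trending_boost
        scores[title] = score
    return sorted(scores, key=scores.get, reverse=True)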

10-14. Trade-offs, Failures, Scaling

Key trade-offs:

| Decision | Trade-off |
|----------|-----------|
| More transcoding variants | Better ABR experience, but more storage + compute |
| Longer segments (10s) vs shorter (2s) | Longer: fewer requests, better compression. Shorter: faster quality switching |
| CDN cache duration | Longer: better hit rate. Shorter: faster content updates |
| Live vs VOD | Live: no pre-transcoding possible, latency-critical. VOD: pre-transcode at leisure |

Failure scenarios:

  • CDN edge failure → fallback to regional cache → origin shield → origin
  • Transcoding worker crash → Kafka redelivers job → at-least-once processing (see the idempotent worker sketch after this list)
  • Popular video launch (Super Bowl) → pre-warm CDN edges, over-provision
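
Redelivery means a crashed job can run twice, so transcoding workers must be idempotent. A sketch under the assumption that output keys are deterministic and a completion marker is written last:

python
def handle_transcode_job(job, blob_store, transcode_fn) -> None:
    """At-least-once consumer: if a worker dies mid-job, the queue redelivers
    the job. Deterministic keys plus a completion marker make the retry a
    harmless overwrite instead of a duplicate."""
    prefix = f"{job.video_id}/{job.resolution}/"
    if blob_store.exists(prefix + "_COMPLETE"):
        return                                   # a previous attempt finished
    for name, data in transcode_fn(job):         # yields (segment_name, bytes)
        blob_store.put(prefix + name, data)      # overwrite-safe
    blob_store.put(prefix + "_COMPLETE", b"")    # commit point, written last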

Scaling:

  • Video storage: Exabyte-scale blob storage (S3, GCS)
  • Transcoding: GPU clusters, auto-scaled by queue depth
  • CDN: 200+ PoPs, ISP-embedded caches (Netflix Open Connect)
  • Metadata: Sharded PostgreSQL, Elasticsearch for search

15-20. Interview Strategy

Key Points

  1. Transcoding pipeline: raw → multiple resolutions → segmented → manifest
  2. Adaptive bitrate: Player measures bandwidth, switches quality at segment boundaries
  3. CDN architecture: 3-tier (edge → regional → origin shield → origin)
  4. Segment-based streaming: 4-10s chunks enable seeking + ABR
  5. View counting: Sharded counters + anti-fraud verification

Common Mistakes

  1. Not explaining WHY video is segmented (ABR, seeking, CDN caching)
  2. Forgetting about transcoding (it's the most compute-intensive part)
  3. Proposing a single quality level (must support ABR)
  4. Not mentioning CDN (can't stream exabytes from origin)
  5. Ignoring the upload pipeline (YouTube processes 500 hours/minute)

Practice Mode

5 Questions

  1. "Why are videos split into segments?" → Enables adaptive bitrate switching at segment boundaries, efficient CDN caching, and seeking without downloading the full file.
  2. "How does adaptive bitrate streaming work?" → Player measures bandwidth, selects the highest quality that doesn't cause buffering, switches at segment boundaries.
  3. "How does YouTube handle 500 hours of video uploads per minute?" → Distributed transcoding on GPU clusters. Jobs queued in Kafka. Each video transcoded to 6+ quality levels. Horizontal scaling of workers.
  4. "Why does Netflix use a 3-tier CDN?" → Edge (closest, ~90% hit), regional (~95% cumulative), origin shield (~99%). Protects origin from thundering herd. Minimizes latency globally.
  5. "How do you count views accurately at YouTube scale?" → Sharded counters for real-time approximate count. Batch MapReduce for official count with fraud filtering. Dedup by user+video combination.

1 "100x Scale" Challenge

From 50M peak concurrent viewers to 5B (global sports event). What changes? Hint: ISP-embedded caches (Netflix Open Connect), multicast-like P2P delivery, pre-position content at edge 24h before event, dynamic segment size (2s for live), dedicated live streaming edge infrastructure.