Problem #36: Design Uber/Lyft (Ride-Sharing Platform) — Full Deep Dive
1. Problem Statement
Design a ride-sharing platform that matches riders with nearby drivers in real-time, handles dynamic pricing, provides ETA estimation, tracks rides in progress, and processes payments — all at city-scale with millions of concurrent users.
2-3. Requirements
Core
| ID | Requirement |
|---|---|
| FR-1 | Rider requests a ride (pickup + destination) |
| FR-2 | Match rider with the best nearby available driver |
| FR-3 | Real-time driver location tracking (every 3-5 seconds) |
| FR-4 | ETA estimation (pickup ETA + trip ETA) |
| FR-5 | Dynamic pricing (surge during high demand) |
| FR-6 | Ride lifecycle (request → match → pickup → in-progress → complete) |
| FR-7 | Payment processing at trip end |
| FR-8 | Driver/rider ratings after trip |
Non-Functional
| Requirement | Target |
|---|---|
| Matching latency | < 10 seconds |
| Location update frequency | Every 3-5 seconds per active driver |
| Location freshness | < 5 seconds stale |
| Concurrent rides | 1M+ globally |
| Active drivers | 5M+ sending location updates |
| Availability | 99.99% |
4. Capacity Estimation
Active drivers: 5M (sending GPS every 4 seconds)
Location updates/sec: 5M / 4 = 1.25M updates/sec
Location update size: ~100 bytes (driver_id, lat, lng, timestamp, heading, speed)
Location bandwidth: 1.25M * 100 = 125 MB/sec
Ride requests: 10M rides/day = ~115 rides/sec
Matching: each request searches nearby drivers → geospatial query
ETA calculations: per ride request + ongoing navigation updates
Storage:
Trip history: 10M * 2 KB = 20 GB/day
Location traces: 5M drivers * 86400/4 updates * 100 bytes = 10.8 TB/day (raw)
→ Downsample to 30-sec intervals for history: 5M * (86400/30) * 100 bytes ≈ 1.4 TB/day (less after compression)
5-8. Architecture
Location Tracking: The GeoHash Approach
Challenge: 5M drivers sending GPS every 4s = 1.25M updates/sec.
For each ride request: "Find available drivers within 3 km of rider."
Scanning 5M drivers per request is too slow.
Solution: GeoHash-based spatial indexing
GeoHash: Encodes lat/lng into a string where shared prefix = nearby.
(37.7749, -122.4194) → "9q8yyk" (San Francisco)
(37.7751, -122.4190) → "9q8yym" (1 block away — shares "9q8yy" prefix!)
(40.7128, -74.0060) → "dr5reg" (New York — completely different prefix)
GeoHash precision:
4 chars → ~40 km × 20 km grid cell
5 chars → ~5 km × 5 km grid cell ← good for initial search
6 chars → ~1.2 km × 600m grid cell ← good for nearby matching
7 chars → ~150m × 150m grid cell
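To make the shared-prefix property concrete, here is a minimal sketch of GeoHash encoding (longitude/latitude bits interleaved and mapped to base32). A real service would use an off-the-shelf library; this hand-rolled version only illustrates why the San Francisco and New York coordinates above produce such different prefixes.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    lat_range, lng_range = [-90.0, 90.0], [-180.0, 180.0]
    even = True          # even bit positions encode longitude, odd encode latitude
    bit_count, ch = 0, 0
    result = []
    while len(result) < precision:
        if even:
            mid = (lng_range[0] + lng_range[1]) / 2
            if lng > mid:
                ch = (ch << 1) | 1
                lng_range[0] = mid
            else:
                ch = ch << 1
                lng_range[1] = mid
        else:
            mid = (lat_range[0] + lat_range[1]) / 2
            if lat > mid:
                ch = (ch << 1) | 1
                lat_range[0] = mid
            else:
                ch = ch << 1
                lat_range[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:          # every 5 bits -> one base32 character
            result.append(BASE32[ch])
            bit_count, ch = 0, 0
    return "".join(result)

print(geohash_encode(37.7749, -122.4194))  # "9q8yyk" (San Francisco)
print(geohash_encode(40.7128, -74.0060))   # "dr5reg" (New York)
```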
Redis implementation:
GEOADD drivers:{city} lng lat driver_id // Add/update driver position
GEORADIUS drivers:{city} lng lat 3 km // Find drivers within 3 km
→ O(log(N) + M) where M = results. Fast for millions of entries.
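A hedged sketch of the same two operations with the redis-py client. The drivers:{city} key naming follows the commands above; the client version (redis-py ≥ 4), connection details, and driver IDs are assumptions.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def update_driver_location(city: str, driver_id: str, lat: float, lng: float) -> None:
    # GEOADD with a flat (lng, lat, member) triple (redis-py >= 4 call style).
    r.geoadd(f"drivers:{city}", (lng, lat, driver_id))

def find_nearby_drivers(city: str, lat: float, lng: float, radius_km: float = 3.0):
    # GEORADIUS returns members within the radius, closest first.
    # On Redis >= 6.2, GEOSEARCH is the preferred replacement.
    return r.georadius(f"drivers:{city}", lng, lat, radius_km,
                       unit="km", withdist=True, sort="ASC")

update_driver_location("sf", "driver_42", 37.7749, -122.4194)
print(find_nearby_drivers("sf", 37.7751, -122.4190))  # e.g. [[b'driver_42', 0.0454]]
```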
Alternative: QuadTree or S2 Geometry (Google's approach).
S2 uses hierarchical cell decomposition of the sphere.
Better for planet-scale (handles poles and the antimeridian correctly).
Matching Algorithm
Matching Scoring Function
Score = w1 * (1 / ETA_minutes)   +   // Closer is better
        w2 * driver_rating       +   // Higher rated preferred
        w3 * acceptance_rate     +   // Reliable drivers preferred
        w4 * (1 / detour_factor)     // Minimize driver's detour
Weights tuned per market.
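A minimal sketch of this scoring function. The weight values and the DriverCandidate fields are illustrative assumptions; in practice the weights come from per-market tuning.

```python
from dataclasses import dataclass

@dataclass
class DriverCandidate:
    driver_id: str
    eta_minutes: float       # estimated time to reach the pickup point
    rating: float            # e.g. 1.0 - 5.0
    acceptance_rate: float   # 0.0 - 1.0
    detour_factor: float     # >= 1.0, extra distance vs. the driver's current route

# Assumed weights for illustration only.
W_ETA, W_RATING, W_ACCEPT, W_DETOUR = 5.0, 0.5, 2.0, 1.0

def score(c: DriverCandidate) -> float:
    return (W_ETA * (1.0 / max(c.eta_minutes, 0.5)) +   # closer is better
            W_RATING * c.rating +                       # higher rated preferred
            W_ACCEPT * c.acceptance_rate +              # reliable drivers preferred
            W_DETOUR * (1.0 / c.detour_factor))         # minimize driver's detour

def rank_candidates(candidates: list[DriverCandidate], top_n: int = 5) -> list[DriverCandidate]:
    # Offer the ride to the top-scored candidates first.
    return sorted(candidates, key=score, reverse=True)[:top_n]
```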
Fallback cascade:
Round 1: Search 3 km radius, top 5 candidates
Round 2 (no accept in 15s): Expand to 5 km, next 5 candidates
Round 3 (still no accept): Expand to 8 km, offer to more drivers
Round 4: "No drivers available. Try again in a few minutes."
9. Deep Dive: Dynamic Pricing (Surge)
Why surge pricing?
Supply: 1000 available drivers in downtown
Demand: 3000 ride requests in downtown
Without surge: 2000 riders get "no drivers available" → terrible UX
With surge: price goes up → some riders wait → some drivers come to area → equilibrium
Surge calculation:
For each geo-cell (GeoHash level 5, ~5km × 5km):
demand = ride_requests_last_5_min / historical_average
supply = available_drivers / historical_average
surge_multiplier = demand / supply
if surge < 1.0 → no surge (supply exceeds demand)
if surge 1.0-1.5 → mild surge
if surge 1.5-3.0 → moderate surge
if surge > 3.0 → cap at 3.0x (or market-specific cap)
Smoothing:
Don't spike instantly (one burst of requests → temporary spike)
Use exponentially weighted moving average (EWMA) over 5-minute windows
Ramp up gradually, ramp down more slowly (hysteresis)
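A small sketch combining the per-cell demand/supply ratio, the cap, and the asymmetric EWMA smoothing just described. The window length, alpha values, and cap are illustrative assumptions.

```python
SURGE_CAP = 3.0
ALPHA_UP, ALPHA_DOWN = 0.5, 0.2   # ramp up faster than down (hysteresis)

def raw_surge(requests_last_5_min: int, avg_requests: float,
              available_drivers: int, avg_drivers: float) -> float:
    demand = requests_last_5_min / max(avg_requests, 1.0)
    supply = available_drivers / max(avg_drivers, 1.0)
    ratio = demand / max(supply, 0.01)
    return min(max(ratio, 1.0), SURGE_CAP)   # never below 1.0x, capped at 3.0x

def smoothed_surge(previous: float, target: float) -> float:
    # EWMA with asymmetric alpha: rising surge tracks demand quickly,
    # falling surge decays slowly.
    alpha = ALPHA_UP if target > previous else ALPHA_DOWN
    return previous + alpha * (target - previous)

# Example: a demand burst in one geo-cell (requests per 5-min window)
prev = 1.0
for reqs in (120, 300, 300, 150, 80):
    prev = smoothed_surge(prev, raw_surge(reqs, 100, 40, 50))
    print(round(prev, 2))
```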
Geo-smoothing:
Adjacent cells should have similar surge (no sharp boundaries)
Average with neighboring cells for a smooth pricing gradient.
Ride Lifecycle State Machine
        ┌──────────────┐
        │  REQUESTED   │
        └──────┬───────┘
               │ Driver matched
        ┌──────▼───────┐
  ┌─────│   MATCHED    │─────┐
  │     └──────┬───────┘     │
  │            │ Driver      │ Driver cancels
  │            │ arrives     │ or timeout
  │     ┌──────▼───────┐     │
  │     │   ARRIVED    │     │
  │     └──────┬───────┘     │
  │            │ Rider       │
  │            │ picked up   │
  │     ┌──────▼───────┐     │
  │     │ IN_PROGRESS  │     ├──► CANCELLED
  │     └──────┬───────┘     │
  │            │ Arrived     │
  │            │ at dest     │
  │     ┌──────▼───────┐     │
  │     │  COMPLETED   │     │
  │     └──────┬───────┘     │
  │            │             │
  │     ┌──────▼───────┐     │
  └────►│   PAYMENT    │◄────┘
        └──────┬───────┘
               │
        ┌──────▼───────┐
        │    RATED     │
        └──────────────┘
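Before listing what each transition triggers, here is a minimal sketch of the lifecycle as an explicit transition table. The persistence and Kafka hooks are stubs, and all function names are assumptions rather than a specific Uber/Lyft API.

```python
from enum import Enum

class RideState(Enum):
    REQUESTED = "requested"
    MATCHED = "matched"
    ARRIVED = "arrived"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    PAYMENT = "payment"
    RATED = "rated"

# Legal transitions, mirroring the diagram above.
ALLOWED_TRANSITIONS = {
    RideState.REQUESTED:   {RideState.MATCHED},
    RideState.MATCHED:     {RideState.ARRIVED, RideState.CANCELLED},
    RideState.ARRIVED:     {RideState.IN_PROGRESS, RideState.CANCELLED},
    RideState.IN_PROGRESS: {RideState.COMPLETED, RideState.CANCELLED},
    RideState.COMPLETED:   {RideState.PAYMENT},
    RideState.CANCELLED:   {RideState.PAYMENT},   # e.g. a cancellation fee
    RideState.PAYMENT:     {RideState.RATED},
}

def persist_state(ride_id: str, state: RideState) -> None:
    ...  # placeholder: strongly consistent write to the rides table

def publish_event(ride_id: str, old: RideState, new: RideState) -> None:
    ...  # placeholder: produce to a Kafka topic for analytics/notifications

def transition(ride_id: str, current: RideState, new: RideState) -> RideState:
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {new.value} for {ride_id}")
    persist_state(ride_id, new)
    publish_event(ride_id, current, new)
    return new

state = transition("ride_123", RideState.REQUESTED, RideState.MATCHED)
```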
Each state transition:
→ Persisted to DB (strong consistency)
→ Published to Kafka (for analytics, notifications, tracking)
→ Push notification to rider and/or driver
10-14. Trade-offs, Failures, Interview
Key trade-offs:
| Decision | Trade-off |
|---|---|
| GeoHash vs QuadTree vs S2 | GeoHash: simple, Redis-native. QuadTree: better density-aware. S2: best for planet-scale |
| Matching: nearest vs scored | Nearest: fastest pickup. Scored: considers rating, acceptance, fairness. Scored is better UX |
| Surge: reactive vs predictive | Reactive: responds to current demand. Predictive (ML): anticipates events/weather |
| Location in Redis vs dedicated DB | Redis: fast, in-memory. Dedicated (PostGIS): richer queries, more features. Redis for real-time, PostGIS for analytics |
Failure scenarios:
- Driver app crashes mid-ride → rider still tracked via last known location. Ride auto-completes after timeout. Safety alert if no update for 5 min.
- Matching service down → ride requests queue in Kafka → resume when service recovers. Rider sees "Finding driver..." longer.
- Payment fails → trip still completes. Retry payment async. If persistent failure → flag account.
Interview tips — the 5 pillars:
- Location tracking at scale (1.25M updates/sec → Redis GeoHash)
- Matching algorithm (geo search → score → offer → fallback cascade)
- Dynamic pricing (supply/demand ratio per geo-cell, EWMA smoothing)
- Ride state machine (each transition persisted + event published)
- ETA estimation (graph-based routing + ML for traffic prediction)
Practice Mode
5 Questions
- "How do you track millions of driver locations in real-time?" → Redis GEOADD/GEORADIUS with GeoHash indexing. 1.25M updates/sec. Each city is a separate Redis key. GEORADIUS finds drivers within X km in O(log N + M).
- "How does the matching algorithm work?" → Search nearby drivers (3km) → rank by score (ETA, rating, acceptance rate) → offer to top driver (15s timeout) → fallback cascade expanding radius if no accept.
- "How does surge pricing work?" → Per geo-cell: demand/supply ratio over 5-min EWMA. Surge multiplier capped (e.g., 3x). Geo-smoothed across adjacent cells. Ramp up gradually, ramp down slowly.
- "What happens if a driver's phone dies mid-ride?" → Last known location cached. Ride stays in IN_PROGRESS. Safety alert if no GPS update for 5 min. Rider can report. Ride auto-completes at expected time + buffer.
- "How do you estimate ETA?" → Road network graph (OSM data). Dijkstra/A* for shortest path. Historical traffic data per road segment per time of day. ML model adjusts for live traffic, weather, events.