Problem #21: Design an E-Commerce Platform — Part 1 (Sections 1-10)
1. Problem Statement
Design a large-scale e-commerce platform (Amazon-like) supporting product catalog, search, cart, checkout, inventory management, order processing, and payment — handling millions of concurrent users, flash sales, and complex order pipelines.
Core business goals:
- Browse and search millions of products
- Add to cart, checkout, and pay seamlessly
- Track orders from placement to delivery
- Handle flash sales / peak traffic (Black Friday: 100x normal)
- Zero overselling (inventory accuracy is critical)
2. Functional Requirements
Core
| ID | Requirement |
|---|---|
| FR-1 | Product catalog (browse, filter, sort by category/price/rating) |
| FR-2 | Product search (full-text, faceted filtering) |
| FR-3 | Shopping cart (add/remove/update quantity) |
| FR-4 | Checkout flow (address, shipping, payment) |
| FR-5 | Inventory management (accurate stock counts, prevent overselling) |
| FR-6 | Order processing pipeline (placed → paid → shipped → delivered) |
| FR-7 | Payment integration (credit card, wallet, COD) |
| FR-8 | Order history and tracking |
| FR-9 | Product reviews and ratings |
Optional
| FR-10 | Recommendations ("customers also bought") | | FR-11 | Wishlist | | FR-12 | Seller dashboard (multi-vendor marketplace) | | FR-13 | Promotions / coupons | | FR-14 | Return / refund management |
3. Non-Functional Requirements
| Requirement | Target |
|---|---|
| Product page latency | < 200ms (p99) |
| Search latency | < 500ms |
| Checkout latency | < 2s |
| Inventory accuracy | 100% (zero overselling) |
| Availability | 99.99% |
| DAU | 50M |
| Peak concurrency | 10M (flash sales) |
| Orders/day | 10M |
| Products in catalog | 500M |
4. Capacity Estimation
text
Product views/day: 50M DAU * 20 views = 1B views
Search queries/day: 50M * 5 searches = 250M
Cart operations/day: 50M * 3 = 150M
Orders/day: 10M
Payments/day: 10M
Product page QPS: 1B / 86,400 ≈ 11,600/sec → Peak: 58,000/sec
Search QPS: 250M / 86,400 ≈ 2,900/sec → Peak: 14,500/sec
Order QPS: 10M / 86,400 ≈ 116/sec → Peak: 1,160/sec
Flash sale spike: 100x normal for 1 hour = 5.8M product views/sec
→ CDN + multi-level caching critical
Product catalog storage:
500M products * 5 KB avg = 2.5 TB
+ images: 500M * 10 images * 500 KB = 2.5 PB → S3 + CDN5. API Design
text
# Product catalog
GET /api/v1/products/{id}
GET /api/v1/products?category=electronics&sort=price_asc&page=1
GET /api/v1/search?q=laptop&brand=Dell&price_min=500&price_max=1500
# Cart
GET /api/v1/cart
POST /api/v1/cart/items { "product_id": "P123", "quantity": 2 }
PUT /api/v1/cart/items/{id} { "quantity": 3 }
DELETE /api/v1/cart/items/{id}
# Checkout (idempotent — uses idempotency_key)
POST /api/v1/orders
{
"idempotency_key": "order-uuid-123",
"cart_id": "cart-456",
"shipping_address_id": "addr-789",
"payment_method_id": "pm-101",
"coupon_code": "SAVE20"
}
→ { "order_id": "ORD-001", "status": "pending_payment", "total": 999.99 }
# Order tracking
GET /api/v1/orders/{order_id}
GET /api/v1/orders?status=shipped6. Data Model
sql
CREATE TABLE products (
product_id BIGINT PRIMARY KEY,
seller_id BIGINT NOT NULL,
name VARCHAR(500) NOT NULL,
description TEXT,
category_id BIGINT,
price DECIMAL(12,2) NOT NULL,
currency VARCHAR(3) DEFAULT 'USD',
rating_avg DECIMAL(3,2),
rating_count INT DEFAULT 0,
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE inventory (
product_id BIGINT PRIMARY KEY REFERENCES products(product_id),
available_qty INT NOT NULL DEFAULT 0,
reserved_qty INT NOT NULL DEFAULT 0, -- reserved during checkout
warehouse_id BIGINT,
version INT DEFAULT 0, -- optimistic locking
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE orders (
order_id BIGINT PRIMARY KEY,
user_id BIGINT NOT NULL,
status VARCHAR(20) NOT NULL,
subtotal DECIMAL(12,2),
tax DECIMAL(12,2),
shipping_fee DECIMAL(12,2),
total DECIMAL(12,2),
shipping_address JSONB,
payment_id BIGINT,
idempotency_key VARCHAR(100) UNIQUE, -- prevent duplicate orders
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE order_items (
order_item_id BIGINT PRIMARY KEY,
order_id BIGINT REFERENCES orders(order_id),
product_id BIGINT,
quantity INT NOT NULL,
unit_price DECIMAL(12,2) NOT NULL,
subtotal DECIMAL(12,2) NOT NULL
);7. High-Level Design
Microservice Architecture
| Service | Responsibility |
|---|---|
| Product Service | Catalog CRUD, categories, pricing |
| Search Service | Full-text search with faceted filtering (Elasticsearch) |
| Cart Service | Cart management (Redis-backed for speed) |
| Order Service | Order lifecycle management (state machine) |
| Inventory Service | Stock management, reservation, deduction |
| Payment Service | Payment processing, refunds |
| Notification Service | Order confirmations, shipping updates |
| User Service | Authentication, profiles, addresses |
Checkout Flow (The Critical Path)
The checkout is the most important flow. It must be:
- Atomic: Either the entire order succeeds or nothing happens
- Idempotent: Duplicate submissions don't create duplicate orders
- Fast: < 2 seconds total
But it spans MULTIPLE services (cart, inventory, payment, order) → distributed transaction problem.
Saga Pattern for Checkout
text
Step 1: Validate cart + compute total
Step 2: Reserve inventory (optimistic locking)
Step 3: Process payment
Step 4: Create order record
Step 5: Confirm inventory deduction
Step 6: Send confirmation notification
If Step 3 fails (payment declined):
→ Compensate Step 2: Release inventory reservation
If Step 4 fails (order DB error):
→ Compensate Step 3: Refund payment
→ Compensate Step 2: Release inventory
This is the SAGA PATTERN — a sequence of local transactions
with compensating actions for rollback.8. Mermaid Diagrams
8.1 System Architecture
8.2 Checkout Saga Flow
8.3 Saga Compensation (Payment Fails)
8.4 Order State Machine
8.5 Inventory Reservation Pattern
9. Deep Dive
9.1 Inventory Management (Preventing Overselling)
text
THE most critical correctness requirement in e-commerce.
Problem: 2 users checkout the last item simultaneously → both succeed → oversold!
Solution: Optimistic locking with version check
SQL:
UPDATE inventory
SET available_qty = available_qty - @quantity,
reserved_qty = reserved_qty + @quantity,
version = version + 1
WHERE product_id = @product_id
AND available_qty >= @quantity
AND version = @expected_version;
IF @@ROWCOUNT = 0 → insufficient stock or version conflict → RETRY or FAIL
Why optimistic locking (not pessimistic)?
- SELECT FOR UPDATE (pessimistic) locks the row → serializes ALL checkouts for that product
- At 1000 concurrent checkouts for a flash sale item, this creates a bottleneck
- Optimistic: most checkouts succeed on first try (no contention)
- Only retries on actual conflict (~1-5% during normal traffic)
Flash sale alternative: Pre-allocate inventory tokens
1. Before sale: Create 1000 tokens in Redis (SET inventory:ITEM:tokens → 1000)
2. At checkout: DECR inventory:ITEM:tokens → if >= 0, proceed; else, sold out
3. Much faster than DB update (Redis DECR is atomic, sub-ms)
4. Reconcile with DB async9.2 Cart Design
text
Cart stored in Redis (not DB):
Key: cart:{user_id}
Value: Hash {
"P123": {"quantity": 2, "price": 49.99, "added_at": epoch},
"P456": {"quantity": 1, "price": 299.99, "added_at": epoch}
}
TTL: 30 days (abandoned cart cleanup)
Why Redis for cart?
- Fast reads/writes (cart is accessed frequently)
- TTL for auto-cleanup
- No need for ACID (cart is soft state)
- If Redis dies, user re-adds items (acceptable)
Why NOT a database?
- 50M users * 3 cart ops/day = 150M DB writes/day for carts alone
- Carts are ephemeral — most are abandoned
- DB cost for temporary data is wasteful9.3 Search Architecture
text
Elasticsearch for product search:
500M products indexed
Faceted filtering: brand, category, price range, rating, seller
Full-text: product name + description
Autocomplete: edge-n-gram tokenizer
Query example: "laptop under $1000 with SSD"
→ ES bool query:
must: match("laptop SSD")
filter: range(price, lte: 1000)
sort: [_score, rating_avg DESC]
Ranking: BM25 text relevance * popularity boost * recency boost10. Bottlenecks and Trade-offs
| Decision | Trade-off |
|---|---|
| Saga vs 2PC | Saga: eventual consistency, compensable. 2PC: strong consistency, slower, coordinator SPOF |
| Optimistic vs pessimistic locking | Optimistic: better throughput, retries on conflict. Pessimistic: serializes, guaranteed |
| Cart in Redis vs DB | Redis: fast but volatile. DB: durable but slower for ephemeral data |
| Microservices vs monolith | Micro: independent scaling, team autonomy. Mono: simpler transactions, less latency |
| Pre-computed prices vs dynamic | Pre: fast reads, stale risk. Dynamic: always correct, slower |