docs/System Design/21-ecommerce-part1
Edit on GitHub

Problem #21: Design an E-Commerce Platform — Part 1 (Sections 1-10)


1. Problem Statement

Design a large-scale e-commerce platform (Amazon-like) supporting product catalog, search, cart, checkout, inventory management, order processing, and payment — handling millions of concurrent users, flash sales, and complex order pipelines.

Core business goals:

  • Browse and search millions of products
  • Add to cart, checkout, and pay seamlessly
  • Track orders from placement to delivery
  • Handle flash sales / peak traffic (Black Friday: 100x normal)
  • Zero overselling (inventory accuracy is critical)

2. Functional Requirements

Core

IDRequirement
FR-1Product catalog (browse, filter, sort by category/price/rating)
FR-2Product search (full-text, faceted filtering)
FR-3Shopping cart (add/remove/update quantity)
FR-4Checkout flow (address, shipping, payment)
FR-5Inventory management (accurate stock counts, prevent overselling)
FR-6Order processing pipeline (placed → paid → shipped → delivered)
FR-7Payment integration (credit card, wallet, COD)
FR-8Order history and tracking
FR-9Product reviews and ratings

Optional

| FR-10 | Recommendations ("customers also bought") | | FR-11 | Wishlist | | FR-12 | Seller dashboard (multi-vendor marketplace) | | FR-13 | Promotions / coupons | | FR-14 | Return / refund management |


3. Non-Functional Requirements

RequirementTarget
Product page latency< 200ms (p99)
Search latency< 500ms
Checkout latency< 2s
Inventory accuracy100% (zero overselling)
Availability99.99%
DAU50M
Peak concurrency10M (flash sales)
Orders/day10M
Products in catalog500M

4. Capacity Estimation

text
Product views/day: 50M DAU * 20 views = 1B views
Search queries/day: 50M * 5 searches = 250M
Cart operations/day: 50M * 3 = 150M
Orders/day: 10M
Payments/day: 10M

Product page QPS: 1B / 86,400 ≈ 11,600/sec → Peak: 58,000/sec
Search QPS: 250M / 86,400 ≈ 2,900/sec → Peak: 14,500/sec
Order QPS: 10M / 86,400 ≈ 116/sec → Peak: 1,160/sec

Flash sale spike: 100x normal for 1 hour = 5.8M product views/sec
  → CDN + multi-level caching critical

Product catalog storage:
  500M products * 5 KB avg = 2.5 TB
  + images: 500M * 10 images * 500 KB = 2.5 PB → S3 + CDN

5. API Design

text
# Product catalog
GET  /api/v1/products/{id}
GET  /api/v1/products?category=electronics&sort=price_asc&page=1
GET  /api/v1/search?q=laptop&brand=Dell&price_min=500&price_max=1500

# Cart
GET    /api/v1/cart
POST   /api/v1/cart/items      { "product_id": "P123", "quantity": 2 }
PUT    /api/v1/cart/items/{id} { "quantity": 3 }
DELETE /api/v1/cart/items/{id}

# Checkout (idempotent — uses idempotency_key)
POST /api/v1/orders
{
  "idempotency_key": "order-uuid-123",
  "cart_id": "cart-456",
  "shipping_address_id": "addr-789",
  "payment_method_id": "pm-101",
  "coupon_code": "SAVE20"
}
→ { "order_id": "ORD-001", "status": "pending_payment", "total": 999.99 }

# Order tracking
GET /api/v1/orders/{order_id}
GET /api/v1/orders?status=shipped

6. Data Model

sql
CREATE TABLE products (
    product_id   BIGINT PRIMARY KEY,
    seller_id    BIGINT NOT NULL,
    name         VARCHAR(500) NOT NULL,
    description  TEXT,
    category_id  BIGINT,
    price        DECIMAL(12,2) NOT NULL,
    currency     VARCHAR(3) DEFAULT 'USD',
    rating_avg   DECIMAL(3,2),
    rating_count INT DEFAULT 0,
    status       VARCHAR(20) DEFAULT 'active',
    created_at   TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE inventory (
    product_id       BIGINT PRIMARY KEY REFERENCES products(product_id),
    available_qty    INT NOT NULL DEFAULT 0,
    reserved_qty     INT NOT NULL DEFAULT 0,  -- reserved during checkout
    warehouse_id     BIGINT,
    version          INT DEFAULT 0,           -- optimistic locking
    updated_at       TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE orders (
    order_id         BIGINT PRIMARY KEY,
    user_id          BIGINT NOT NULL,
    status           VARCHAR(20) NOT NULL,
    subtotal         DECIMAL(12,2),
    tax              DECIMAL(12,2),
    shipping_fee     DECIMAL(12,2),
    total            DECIMAL(12,2),
    shipping_address JSONB,
    payment_id       BIGINT,
    idempotency_key  VARCHAR(100) UNIQUE,     -- prevent duplicate orders
    created_at       TIMESTAMPTZ DEFAULT NOW(),
    updated_at       TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE order_items (
    order_item_id    BIGINT PRIMARY KEY,
    order_id         BIGINT REFERENCES orders(order_id),
    product_id       BIGINT,
    quantity         INT NOT NULL,
    unit_price       DECIMAL(12,2) NOT NULL,
    subtotal         DECIMAL(12,2) NOT NULL
);

7. High-Level Design

Microservice Architecture

ServiceResponsibility
Product ServiceCatalog CRUD, categories, pricing
Search ServiceFull-text search with faceted filtering (Elasticsearch)
Cart ServiceCart management (Redis-backed for speed)
Order ServiceOrder lifecycle management (state machine)
Inventory ServiceStock management, reservation, deduction
Payment ServicePayment processing, refunds
Notification ServiceOrder confirmations, shipping updates
User ServiceAuthentication, profiles, addresses

Checkout Flow (The Critical Path)

The checkout is the most important flow. It must be:

  • Atomic: Either the entire order succeeds or nothing happens
  • Idempotent: Duplicate submissions don't create duplicate orders
  • Fast: < 2 seconds total

But it spans MULTIPLE services (cart, inventory, payment, order) → distributed transaction problem.

Saga Pattern for Checkout

text
Step 1: Validate cart + compute total
Step 2: Reserve inventory (optimistic locking)
Step 3: Process payment
Step 4: Create order record
Step 5: Confirm inventory deduction
Step 6: Send confirmation notification

If Step 3 fails (payment declined):
  → Compensate Step 2: Release inventory reservation

If Step 4 fails (order DB error):
  → Compensate Step 3: Refund payment
  → Compensate Step 2: Release inventory

This is the SAGA PATTERN — a sequence of local transactions
with compensating actions for rollback.

8. Mermaid Diagrams

8.1 System Architecture

8.2 Checkout Saga Flow

8.3 Saga Compensation (Payment Fails)

8.4 Order State Machine

8.5 Inventory Reservation Pattern


9. Deep Dive

9.1 Inventory Management (Preventing Overselling)

text
THE most critical correctness requirement in e-commerce.

Problem: 2 users checkout the last item simultaneously → both succeed → oversold!

Solution: Optimistic locking with version check

SQL:
  UPDATE inventory
  SET available_qty = available_qty - @quantity,
      reserved_qty = reserved_qty + @quantity,
      version = version + 1
  WHERE product_id = @product_id
    AND available_qty >= @quantity
    AND version = @expected_version;

  IF @@ROWCOUNT = 0 → insufficient stock or version conflict → RETRY or FAIL

Why optimistic locking (not pessimistic)?
  - SELECT FOR UPDATE (pessimistic) locks the row → serializes ALL checkouts for that product
  - At 1000 concurrent checkouts for a flash sale item, this creates a bottleneck
  - Optimistic: most checkouts succeed on first try (no contention)
  - Only retries on actual conflict (~1-5% during normal traffic)

Flash sale alternative: Pre-allocate inventory tokens
  1. Before sale: Create 1000 tokens in Redis (SET inventory:ITEM:tokens → 1000)
  2. At checkout: DECR inventory:ITEM:tokens → if >= 0, proceed; else, sold out
  3. Much faster than DB update (Redis DECR is atomic, sub-ms)
  4. Reconcile with DB async

9.2 Cart Design

text
Cart stored in Redis (not DB):
  Key: cart:{user_id}
  Value: Hash {
    "P123": {"quantity": 2, "price": 49.99, "added_at": epoch},
    "P456": {"quantity": 1, "price": 299.99, "added_at": epoch}
  }
  TTL: 30 days (abandoned cart cleanup)

Why Redis for cart?
  - Fast reads/writes (cart is accessed frequently)
  - TTL for auto-cleanup
  - No need for ACID (cart is soft state)
  - If Redis dies, user re-adds items (acceptable)

Why NOT a database?
  - 50M users * 3 cart ops/day = 150M DB writes/day for carts alone
  - Carts are ephemeral — most are abandoned
  - DB cost for temporary data is wasteful

9.3 Search Architecture

text
Elasticsearch for product search:
  500M products indexed
  Faceted filtering: brand, category, price range, rating, seller
  Full-text: product name + description
  Autocomplete: edge-n-gram tokenizer

Query example: "laptop under $1000 with SSD"
  → ES bool query:
    must: match("laptop SSD")
    filter: range(price, lte: 1000)
    sort: [_score, rating_avg DESC]

Ranking: BM25 text relevance * popularity boost * recency boost

10. Bottlenecks and Trade-offs

DecisionTrade-off
Saga vs 2PCSaga: eventual consistency, compensable. 2PC: strong consistency, slower, coordinator SPOF
Optimistic vs pessimistic lockingOptimistic: better throughput, retries on conflict. Pessimistic: serializes, guaranteed
Cart in Redis vs DBRedis: fast but volatile. DB: durable but slower for ephemeral data
Microservices vs monolithMicro: independent scaling, team autonomy. Mono: simpler transactions, less latency
Pre-computed prices vs dynamicPre: fast reads, stale risk. Dynamic: always correct, slower