
Range Partitioning

Range partitioning divides data into partitions based on contiguous ranges of the partition key, so that each partition owns all records whose keys fall within a defined boundary. This enables efficient range scans, time-based queries, and ordered data access, but it requires careful boundary selection to avoid skew and hot partitions.


Table of Contents

  1. How Range Partitioning Works
  2. Boundary Selection
  3. Range Assignment Diagrams
  4. Advantages for Time-Series and Range Queries
  5. Hot Spot Risks with Monotonically Increasing Keys
  6. Real Systems Using Range Partitioning
  7. Pros and Cons
  8. Common Mistakes
  9. Interview Framing
  10. Top 5 Use Cases
  11. Top 5 Warning Signs
  12. Tradeoff Table
  13. Revision Summary
  14. Similar Concepts to Review Next

How Range Partitioning Works

Each partition is assigned a contiguous range of the key space defined by a lower bound (inclusive) and an upper bound (exclusive). Given a key, the system finds the partition whose range contains that key.

text
Partition 0: [A, F)
Partition 1: [F, K)
Partition 2: [K, P)
Partition 3: [P, U)
Partition 4: [U, Z]

A record with key "Garcia" falls in Partition 1 because "F" <= "Garcia" < "K".

The Lookup Process

Unlike hash partitioning, where the owning partition is computed directly from the key, range partitioning requires knowing the boundaries. The system maintains a sorted list of split points and performs a binary search to find the correct partition.
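
In SQL terms, the routing metadata can be modeled as a small table of split points. The sketch below is illustrative only; the partition_map table, its columns, and the lookup query are assumptions rather than any particular system's API.

sql
-- Hypothetical routing metadata: one row per partition, keyed by its
-- inclusive lower bound (using the alphabetical ranges from above).
CREATE TABLE partition_map (
    lower_bound  TEXT PRIMARY KEY,
    partition_id INT  NOT NULL
);

INSERT INTO partition_map VALUES
    ('A', 0), ('F', 1), ('K', 2), ('P', 3), ('U', 4);

-- Route a key: the highest lower bound that is <= the key wins.
-- With the index on lower_bound this is effectively a binary search.
SELECT partition_id
FROM partition_map
WHERE lower_bound <= 'Garcia'
ORDER BY lower_bound DESC
LIMIT 1;   -- returns 1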


Boundary Selection

Choosing partition boundaries is one of the most critical decisions. Poor boundaries lead to skewed partitions.

Static Boundaries

Defined at table creation time and fixed. Simple but inflexible.

sql
-- PostgreSQL range partitioning
CREATE TABLE events (
    event_id    BIGSERIAL,
    event_date  DATE,
    payload     JSONB
) PARTITION BY RANGE (event_date);

CREATE TABLE events_2025_q1 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE events_2025_q2 PARTITION OF events
    FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
CREATE TABLE events_2025_q3 PARTITION OF events
    FOR VALUES FROM ('2025-07-01') TO ('2025-10-01');
CREATE TABLE events_2025_q4 PARTITION OF events
    FOR VALUES FROM ('2025-10-01') TO ('2026-01-01');

Dynamic Boundaries (Split and Merge)

The system automatically adjusts boundaries based on partition size or load.
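
PostgreSQL has no built-in auto-split, but as a rough sketch of the size-based approach, a maintenance job could detect oversized partitions of the events table with a catalog query like this (the 10 GB threshold is an arbitrary assumption):

sql
-- List child partitions of "events" larger than ~10 GB on disk;
-- a maintenance job could then create narrower partitions and move rows.
SELECT child.relname AS partition_name,
       pg_size_pretty(pg_total_relation_size(child.oid)) AS size
FROM pg_inherits
JOIN pg_class parent ON parent.oid = pg_inherits.inhparent
JOIN pg_class child  ON child.oid  = pg_inherits.inhrelid
WHERE parent.relname = 'events'
  AND pg_total_relation_size(child.oid) > pg_size_bytes('10 GB');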

Boundary Selection Strategies

| Strategy | How It Works | Pros | Cons |
| --- | --- | --- | --- |
| Uniform intervals | Equal-width ranges (e.g., monthly) | Simple, predictable | Skew if data is not uniformly distributed |
| Quantile-based | Analyze the data distribution and choose boundaries so each partition has equal record counts | Even partition sizes | Requires sampling; boundaries may shift |
| Size-based auto-split | Split when a partition exceeds a size threshold | Adapts to data growth | Split operations cause brief pauses |
| Load-based auto-split | Split when a partition's QPS exceeds a threshold | Adapts to access patterns | May split cold data unnecessarily |
| Manual | DBA sets boundaries based on domain knowledge | Full control | Requires ongoing tuning |
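
As a sketch of the quantile-based strategy above, PostgreSQL's ordered-set aggregates can derive split points that divide the existing events data into roughly equal-sized partitions (four partitions assumed here):

sql
-- Compute three split points so that four partitions each hold
-- roughly 25% of the current rows.
SELECT percentile_disc(ARRAY[0.25, 0.5, 0.75])
       WITHIN GROUP (ORDER BY event_date) AS split_points
FROM events;

The caveat from the table applies: these boundaries reflect today's distribution and may need to be recomputed as the data grows.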

Range Assignment Diagrams

Number Line Visualization
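
Using the alphabetical boundaries from the lookup example above:

text
A          F          K          P          U          Z
|----P0----|----P1----|----P2----|----P3----|----P4----|
   [A, F)     [F, K)     [K, P)     [P, U)     [U, Z]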

Time-Based Partitioning Layout
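
For the quarterly events partitions defined earlier:

text
events_2025_q1  [2025-01-01, 2025-04-01)   <- oldest, next to be dropped
events_2025_q2  [2025-04-01, 2025-07-01)
events_2025_q3  [2025-07-01, 2025-10-01)
events_2025_q4  [2025-10-01, 2026-01-01)   <- current quarter, receives all new writes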

Multi-Level Range Partitioning

Some systems support composite range partitioning (range on one key, then sub-range on another).
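
PostgreSQL, for example, lets a range partition itself be declared as range-partitioned. The sketch below is illustrative (table and column names are assumptions): it ranges on a region code at the top level and sub-ranges on date underneath.

sql
-- Top level: range on region_code
CREATE TABLE sales (
    region_code TEXT,
    sale_date   DATE,
    amount      NUMERIC
) PARTITION BY RANGE (region_code);

-- Second level: this region range is itself range-partitioned by date
CREATE TABLE sales_a_m PARTITION OF sales
    FOR VALUES FROM ('A') TO ('N')
    PARTITION BY RANGE (sale_date);

CREATE TABLE sales_a_m_2025_q1 PARTITION OF sales_a_m
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');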


Advantages for Time-Series and Range Queries

Range partitioning excels when queries naturally align with the partition boundaries.

Partition Pruning

The query optimizer eliminates entire partitions from the scan based on WHERE clause predicates.

sql
-- Only scans the Q1 2025 partition, skipping all others
SELECT * FROM events
WHERE event_date BETWEEN '2025-02-01' AND '2025-02-28';
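
Pruning can be confirmed with EXPLAIN: partitions eliminated at plan time simply do not appear in the plan.

sql
-- Only events_2025_q1 should show up in the plan output
EXPLAIN (COSTS OFF)
SELECT * FROM events
WHERE event_date BETWEEN '2025-02-01' AND '2025-02-28';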

TTL and Data Lifecycle

Old partitions can be dropped instantly rather than running expensive DELETE queries.

sql
-- Drop an entire quarter of data in milliseconds
DROP TABLE events_2023_q1;
-- vs. DELETE FROM events WHERE event_date < '2023-04-01';  -- hours of I/O

Efficient Ordered Scans

Within a range partition, data can be stored in sorted order, enabling efficient sequential reads.

| Query Pattern | Hash Partitioning | Range Partitioning |
| --- | --- | --- |
| WHERE date = '2025-02-15' | Scatter to all partitions | Prune to one partition |
| WHERE date BETWEEN X AND Y | Scatter to all partitions | Prune to 1-3 partitions |
| ORDER BY date LIMIT 100 | Scatter-gather + merge sort | Read from one partition |
| WHERE user_id = 42 | Route to one partition | Scatter to all partitions |

Hot Spot Risks with Monotonically Increasing Keys

The most dangerous pitfall of range partitioning is a monotonically increasing partition key (timestamps, auto-increment IDs): all new writes go to the last partition, which becomes a permanent hot spot.

Mitigation Strategies

| Strategy | How It Works | Tradeoff |
| --- | --- | --- |
| Prefix with hash | Key = hash(user_id) + timestamp | Loses pure time-range pruning |
| Bucket by time window | Pre-create future partitions (daily/hourly) | Hot partition rotates, but the "current" partition is always hot |
| Composite key | Partition by (region, timestamp) | Spreads writes across regions; requires knowing the region at query time |
| Reverse timestamp | Key = MAX_TS - timestamp | Spreads across old partitions; confusing semantics |
| Random prefix salting | Key = random(0..9) + timestamp | Requires 10x fan-out for reads |

The Correct Pattern for Time-Series
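
One common arrangement, sketched here as PostgreSQL hash-then-range sub-partitioning with illustrative table and column names (an assumption, not a prescribed schema), spreads current writes across several buckets while keeping time-range pruning within each bucket:

sql
-- Top level: hash on the series identifier spreads writes across buckets
CREATE TABLE readings (
    sensor_id  BIGINT,
    reading_ts TIMESTAMPTZ,
    value      DOUBLE PRECISION
) PARTITION BY HASH (sensor_id);

-- Each hash bucket is itself range-partitioned by time
CREATE TABLE readings_h0 PARTITION OF readings
    FOR VALUES WITH (MODULUS 4, REMAINDER 0)
    PARTITION BY RANGE (reading_ts);

CREATE TABLE readings_h0_2025_07 PARTITION OF readings_h0
    FOR VALUES FROM ('2025-07-01') TO ('2025-08-01');
-- ...repeat for remainders 1-3 and for each month

Current writes now land on four "current" partitions instead of one, and a time-bounded query still prunes down to the newest range sub-partition inside each bucket.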


Real Systems Using Range Partitioning

HBase Regions

  • HBase stores data sorted by row key.
  • A region is a contiguous range of row keys.
  • Regions auto-split when they exceed a configurable size (default 10 GB).
  • Region-to-RegionServer assignment is managed by the HBase Master.
  • Best practice: Pre-split tables based on expected key distribution to avoid initial hot-spotting.

PostgreSQL Range Partitioning

  • Native declarative partitioning since PostgreSQL 10.
  • Supports range, list, and hash partitioning.
  • Partition pruning is handled by the query optimizer.
  • No automatic splitting; DBA must create new partitions.

CockroachDB Ranges

  • Data is divided into 512 MB ranges (default).
  • Ranges split automatically when they exceed the threshold.
  • Ranges are replicated via Raft consensus.
  • The system continuously rebalances ranges across nodes.

Google Bigtable / Cloud Spanner

  • Bigtable splits tablets by row key range.
  • Spanner uses range-based splits with automatic resharding.
  • Spanner additionally supports hierarchical interleaving for parent-child data locality.

Pros and Cons

Pros

  • Efficient range queries: Scans over contiguous key ranges touch only relevant partitions.
  • Partition pruning: The optimizer skips irrelevant partitions, reducing I/O dramatically.
  • Data locality: Related records (same time period, same prefix) are stored together.
  • Simple lifecycle management: Drop old partitions for TTL; attach new partitions for incoming data.
  • Ordered access: Within a partition, data maintains sort order.

Cons

  • Hot partition risk: Monotonically increasing keys concentrate all writes on one partition.
  • Skew potential: Non-uniform key distributions create unbalanced partitions.
  • Boundary management: Static boundaries require manual maintenance; dynamic splits add complexity.
  • Point lookups may scatter: If the query key does not align with the partition key, all partitions must be checked.
  • Split overhead: Auto-splitting a hot partition causes a brief pause in writes.

Common Mistakes

  1. Partitioning by auto-increment primary key: Creates a permanent hot spot on the last partition.
  2. Too few partitions for time-series data: Monthly partitions with high write throughput overload a single partition for 30 days.
  3. Not pre-creating future partitions: Inserts into a non-existent partition fail. Automate partition creation with cron jobs or pg_partman (see the sketch after this list).
  4. Choosing boundaries based on current data: Data distributions change over time. What is balanced today may be skewed in six months.
  5. Ignoring partition pruning in query design: If queries do not include the partition key in the WHERE clause, the optimizer cannot prune.
  6. Mixing hot and cold data in the same partition: Place recent (hot) data on fast storage and old (cold) data on cheap storage by using separate tablespaces.
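
A sketch of the automation mentioned in mistake 3, assuming the quarterly events layout from earlier; run it periodically (cron, pg_cron, or similar) so the next quarter's partition always exists before data arrives:

sql
-- Create the next quarterly partition of "events" if it does not exist yet.
DO $$
DECLARE
    next_q date := (date_trunc('quarter', now()) + interval '3 months')::date;
    part   text := 'events_' || to_char(next_q, 'YYYY_"q"Q');
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF events
             FOR VALUES FROM (%L) TO (%L)',
        part, next_q, next_q + interval '3 months');
END $$;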

Interview Framing

When an interviewer asks: "How would you store and query time-series sensor data at scale?"

  1. Identify the access pattern: "Queries are almost always time-bounded: 'give me sensor readings from the last hour/day/week.'"
  2. Choose range partitioning: "I would range-partition by timestamp, creating daily partitions."
  3. Address the hot partition: "Today's partition receives all current writes. To spread the load, I would use a composite key of (sensor_id, timestamp) so writes for the same time window are spread across sensor buckets."
  4. Discuss lifecycle: "Partitions older than 90 days are moved to cold storage (S3), and partitions older than 1 year are dropped."
  5. Mention partition pruning: "Queries like 'last 24 hours of sensor X' only scan 1-2 partitions instead of the full dataset."
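
Against the hash-then-range readings sketch shown earlier (an assumed schema, not part of the original answer), the "last 24 hours of sensor X" query looks like this; the sensor_id predicate routes to one hash bucket and the time predicate prunes away old range sub-partitions:

sql
SELECT reading_ts, value
FROM readings
WHERE sensor_id  = 42
  AND reading_ts >= now() - interval '24 hours'
ORDER BY reading_ts;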

Top 5 Use Cases

  1. Time-series databases: IoT, monitoring, logging -- partition by day/hour for efficient time-range queries and TTL.
  2. Financial transaction history: Partition by transaction date for regulatory reporting and archival.
  3. Data warehouses: Fact tables partitioned by date for partition pruning in analytical queries (BigQuery, Redshift).
  4. Geographic data: Partition by region code for data locality and compliance.
  5. Versioned data stores: Partition by version range for efficient rollback and point-in-time queries.

Top 5 Warning Signs

  1. One partition is receiving all writes: You are using a monotonically increasing partition key.
  2. Full table scans despite having partitions: Queries are not including the partition key in WHERE clauses.
  3. Partition sizes are wildly uneven: Your boundaries do not match the actual data distribution.
  4. Insert failures on new data: You have not created partitions for future date ranges.
  5. Purging old data is slow: You are deleting rows with DELETE instead of dropping entire partitions.

Tradeoff Table

| Tradeoff | Favoring Narrow Ranges | Favoring Wide Ranges |
| --- | --- | --- |
| Partition count | Many small partitions | Few large partitions |
| Pruning granularity | More precise, less data scanned | Coarser, may scan extra data |
| Metadata overhead | More partition metadata | Less metadata |
| Lifecycle management | More partitions to manage (cron, monitoring) | Fewer but larger drops |
| Write concentration | Writes spread across more partitions | Writes concentrated in fewer partitions |
| Auto-split frequency | Less frequent (partitions are small) | More frequent (partitions grow faster) |

Revision Summary

  • Range partitioning assigns contiguous key ranges to partitions, enabling efficient range scans and partition pruning.
  • Boundaries can be static (DBA-defined) or dynamic (auto-split on size/load thresholds).
  • The biggest risk is hot partitions from monotonically increasing keys; mitigate with composite keys or time-bucketed pre-creation.
  • Time-series is the canonical use case: partition by day/hour, prune on time predicates, drop old partitions for TTL.
  • Real systems like HBase, CockroachDB, and Spanner use range partitioning with automatic splitting.

Similar Concepts to Review Next