Choosing the right database is one of the most consequential architectural decisions in system design. This guide provides a comprehensive comparison of seven database categories -- SQL (relational), Document, Graph, Vector, Time-Series, Key-Value, and Wide-Column -- across critical dimensions including consistency guarantees, scalability characteristics, query complexity support, latency profiles, data model flexibility, operational burden, and cost. Use the decision flowchart and comparison tables below to quickly narrow candidates based on your workload requirements, then dive into the individual guides for deeper analysis.
The Seven Database Categories
Category
Representative Databases
Core Data Model
Primary Strength
SQL (Relational)
PostgreSQL, MySQL, CockroachDB, Aurora
Tables with rows and columns, enforced schema
ACID transactions, complex queries, data integrity
Document
MongoDB, CouchDB, Firestore, DynamoDB
JSON/BSON documents in collections
Schema flexibility, developer productivity
Graph
Neo4j, Neptune, TigerGraph, ArangoDB
Nodes and edges with properties
Relationship traversals, pattern matching
Vector
Pinecone, Milvus, Weaviate, pgvector
High-dimensional embedding vectors
Similarity search, AI/ML applications
Time-Series
InfluxDB, TimescaleDB, QuestDB, Prometheus
Timestamped data points in series
Time-ordered ingestion, downsampling, retention
Key-Value
Redis, DynamoDB, etcd, Memcached
Simple key-to-value mappings
Sub-millisecond reads, caching, sessions
Wide-Column
Cassandra, ScyllaDB, HBase, Bigtable
Column families with flexible columns per row
Massive write throughput, linear scalability
Comprehensive Comparison Table
Consistency and Availability
Dimension
SQL
Document
Graph
Vector
Time-Series
Key-Value
Wide-Column
Default consistency
Strong (ACID)
Eventual / Tunable
Strong (single-node)
Eventual
Strong (single-node)
Strong or Eventual
Tunable
Transaction support
Full ACID, multi-row
Single-doc atomic; limited multi-doc
Varies by engine
None typically
Limited
Single-key atomic
Lightweight (LWT)
CAP classification
CP (most)
AP or CP (varies)
CP (most)
AP
CP (most)
CP or AP (varies)
AP (Cassandra) or CP (HBase)
Isolation levels
Serializable, SI, RC, RU
Read concern levels
Varies
N/A
Varies
Single-op atomic
Per-query tunable
Scalability
Dimension
SQL
Document
Graph
Vector
Time-Series
Key-Value
Wide-Column
Horizontal scaling
Hard (sharding complex)
Native (auto-sharding)
Limited
Native
Moderate
Native
Native (linear)
Max practical scale
~10 TB (single), ~PB (NewSQL)
PB scale
~100s GB typical
TB scale
PB scale (with tiering)
TB scale (in-memory)
PB scale
Write scalability
Moderate
High
Moderate
High (batch)
Very high
Very high
Very high
Read scalability
High (with replicas)
High
High (local queries)
High
High
Very high
High
Geo-distribution
Complex (CockroachDB, Spanner)
Built-in (Atlas, Cosmos)
Limited
Managed options
Limited
Redis Enterprise
Built-in (Cassandra)
Query Capabilities
Dimension
SQL
Document
Graph
Vector
Time-Series
Key-Value
Wide-Column
Query language
SQL (standard)
MongoDB Query, SQL-like
Cypher, Gremlin, SPARQL
Vector APIs, filters
SQL-like (InfluxQL, Flux)
GET/SET/SCAN
CQL
Join support
Full (inner, outer, cross)
$lookup (limited)
Traversals (native)
None
Limited
None
None
Aggregation
Full (GROUP BY, HAVING, window)
Aggregation pipeline
Limited
None
Built-in (downsampling)
None
Limited
Ad-hoc queries
Excellent
Good
Good (for graph patterns)
Limited (similarity only)
Good (time-based)
Poor (key-only)
Poor (partition-key required)
Secondary indexes
Unlimited
Supported
Property indexes
Metadata filtering
Tag-based
Limited
Limited (use with caution)
Full-text search
Basic (or with extensions)
Atlas Search / built-in
Limited
Hybrid (some engines)
No
No
No
Latency and Performance
Dimension
SQL
Document
Graph
Vector
Time-Series
Key-Value
Wide-Column
Read latency (p50)
1-10 ms
1-10 ms
1-50 ms
10-100 ms
1-10 ms
< 1 ms
1-10 ms
Write latency (p50)
1-10 ms
1-10 ms
5-50 ms
10-100 ms
< 1 ms
< 1 ms
1-5 ms
Bulk ingestion
Moderate
High
Slow
High (batch)
Very high
High
Very high
Hot-path optimization
B-tree indexes
Document indexes
Graph traversal cache
HNSW/IVF indexes
Time-partitioned storage
In-memory hash
LSM-tree + Bloom filters
Operational Characteristics
Dimension
SQL
Document
Graph
Vector
Time-Series
Key-Value
Wide-Column
Schema management
Strict (migrations required)
Flexible (schema-on-read)
Flexible
Dimension-fixed
Semi-structured
Schemaless
Semi-structured (CQL)
Learning curve
Low (SQL is universal)
Low-moderate
Moderate-high
Moderate
Low-moderate
Very low
Moderate-high
Ops complexity
Low-moderate
Low (managed) to moderate
Moderate-high
Low (managed)
Low-moderate
Low
High
Backup/restore
Mature tooling
Built-in (managed)
Varies
Managed
Built-in
RDB/AOF snapshots
Snapshot + repair
Monitoring
Excellent ecosystem
Good
Limited
Basic
Good
Good
Good (but complex)
Cost Considerations
Dimension
SQL
Document
Graph
Vector
Time-Series
Key-Value
Wide-Column
Infrastructure cost
Moderate
Moderate
High (memory-intensive)
High (GPU/memory)
Low-moderate
High (RAM-bound)
Moderate (disk-based)
Managed service cost
$$
$$
$$$
$$$
$$
$ - $$
$$
Developer productivity
High (familiar SQL)
Very high (JSON-native)
Moderate
Moderate
High
Very high
Moderate
Vendor lock-in risk
Low (SQL standard)
Moderate
High
High
Moderate
Low
Low-moderate
Decision Flowchart
Use this flowchart to narrow your database selection based on primary workload characteristics:
Secondary Decision: Scale and Consistency Requirements
After identifying the database category from the primary flowchart, use this matrix to select the specific technology:
Use Case Quick Reference
By Industry / Domain
Use Case
Primary DB
Secondary DB
Why
E-commerce (orders, inventory)
PostgreSQL
Redis (cache)
ACID for orders, fast cache for catalog
Social network (feeds, connections)
Neo4j
Cassandra (feed storage)
Graph for connections, wide-column for feeds
IoT platform (sensor data)
Cassandra / ScyllaDB
TimescaleDB (analytics)
Massive writes with wide-column, analytics with TSDB
Content management system
MongoDB
Elasticsearch (search)
Flexible documents, full-text search
Real-time analytics dashboard
InfluxDB / TimescaleDB
Redis (cache)
Time-series ingestion, fast dashboard reads
AI/ML recommendation engine
Pinecone / Milvus
PostgreSQL (metadata)
Vector similarity search, relational metadata
Chat / messaging platform
ScyllaDB
Redis (presence)
Write-heavy message storage, in-memory presence
Financial trading platform
CockroachDB
Redis (order book)
Strong consistency, low-latency lookups
User session management
Redis
DynamoDB (persistence)
Sub-ms reads, durable backup
Fraud detection
Neo4j
PostgreSQL (case mgmt)
Graph pattern matching, relational case data
Log aggregation
Cassandra
Elasticsearch (search)
High write throughput, full-text search
Product catalog
MongoDB
Redis (cache)
Flexible schema per category, fast reads
By Access Pattern
Access Pattern
Best Fit
Rationale
Point lookups by key
Key-Value (Redis, DynamoDB)
O(1) hash-based retrieval
Range scans over time
Time-Series or Wide-Column
Contiguous disk reads, time-partitioned
Complex joins across entities
SQL (PostgreSQL)
Native JOIN optimization, query planner
Traversal of relationships
Graph (Neo4j)
Constant-time per-hop traversal
Nearest-neighbor search
Vector (Pinecone, Milvus)
ANN indexes (HNSW, IVF)
Flexible document retrieval
Document (MongoDB)
Rich query on nested structures
High-throughput writes
Wide-Column (Cassandra)
LSM-tree, append-only, linear scale
Aggregation over time windows
Time-Series (TimescaleDB)
Continuous aggregates, downsampling
Full-text search
Elasticsearch (not in matrix)
Inverted indexes, BM25 ranking
Multi-model (several patterns)
PostgreSQL + extensions
pgvector, TimescaleDB, PostGIS, JSON
Multi-Database Architecture Patterns
Most production systems use polyglot persistence -- multiple databases, each handling what it does best.
Scoring Matrix: Rate Your Workload
Use this scoring approach to systematically evaluate database options. Rate each dimension 1-5 based on your requirements, then match to the database category with the highest alignment.
Requirement Scoring Template
Dimension
Weight (1-5)
Your Score
SQL
Doc
Graph
Vector
TSDB
KV
Wide-Col
Strong consistency
?
?
5
3
4
2
4
3
3
Horizontal scalability
?
?
2
4
2
4
3
5
5
Query complexity
?
?
5
3
4
1
3
1
2
Read latency
?
?
4
4
3
3
4
5
4
Write throughput
?
?
3
4
2
3
5
5
5
Schema flexibility
?
?
1
5
4
2
3
5
3
Operational simplicity
?
?
4
4
2
3
4
4
2
Low cost at scale
?
?
3
3
2
2
4
3
4
Relationship queries
?
?
3
2
5
1
1
1
1
Similarity search
?
?
2
1
1
5
1
1
1
How to use: Multiply your weight by the database score for each dimension, then sum across all dimensions. The highest total indicates the best fit.
Interview-Ready Talking Points
"How do you choose a database for a new system?"
"I start with the access patterns -- what queries will the application run most frequently? Then I evaluate along four axes: consistency requirements, scalability needs, query complexity, and operational budget. For example, if we need ACID transactions with complex joins, SQL is the default. If we need massive write throughput with known query patterns, a wide-column store like Cassandra is appropriate. I also consider polyglot persistence -- using multiple databases where each handles what it does best, connected through CDC or event streaming."
"When would you use multiple databases?"
"Almost every production system at scale uses polyglot persistence. A typical e-commerce platform might use PostgreSQL for orders and inventory (ACID), Redis for session management and caching (sub-ms reads), Elasticsearch for product search (full-text), and Cassandra for event logging (write throughput). The key is to have clear data ownership boundaries and use CDC or event-driven architecture to synchronize when needed."
"What is the biggest mistake teams make when choosing a database?"
"Choosing a database based on hype rather than access patterns. I have seen teams adopt MongoDB because 'it scales' when their workload was 90% complex joins that would have been trivial in PostgreSQL. Another common mistake is premature optimization -- reaching for distributed databases when a single PostgreSQL instance with read replicas would handle the load for years. Start simple, measure, and scale when you hit real bottlenecks."
"How do you handle the transition when you outgrow your database?"
"First, I distinguish between vertical scaling headroom (bigger machine, more RAM, better SSDs) and actual architectural limits. PostgreSQL on a modern server can handle more than most people think. When you truly need horizontal scaling, the strategy depends on the workload: read-heavy systems benefit from replicas, write-heavy systems may need sharding or migration to a distributed database. The key is designing your application with repository patterns and clean data access layers so the database can be swapped without rewriting business logic."
Common Pitfalls by Database Type
Database Type
Common Pitfall
Consequence
Prevention
SQL
No indexes on frequently queried columns
Full table scans, slow queries
Analyze query plans with EXPLAIN
Document
Deeply nested documents with unbounded arrays
Large document sizes, slow updates
Cap nesting depth, reference large sub-collections
Graph
Using for tabular/aggregation workloads
Poor performance, over-engineering
Reserve for genuine relationship traversals
Vector
Wrong distance metric or embedding model
Poor recall, irrelevant results
Benchmark with your actual data
Time-Series
No retention policy or downsampling
Storage costs explode
Define retention + continuous aggregates
Key-Value
Using as primary database instead of cache
Data modeling nightmare
Use as cache layer in front of primary DB
Wide-Column
Poor partition key design (hot partitions)
Single-node bottleneck
Analyze cardinality, use composite keys
Technology Radar: Maturity and Momentum
Database
Maturity
Community
Managed Options
Trend
PostgreSQL
Very High
Massive
RDS, Aurora, Supabase, Neon
Rising (becoming "default database")
MySQL
Very High
Large
RDS, Aurora, PlanetScale
Stable
MongoDB
High
Large
Atlas
Stable
Redis
High
Large
ElastiCache, Redis Cloud
Stable
Cassandra
High
Medium
Astra, Instaclustr
Stable
Neo4j
Medium
Medium
Aura
Growing
InfluxDB
Medium
Medium
InfluxDB Cloud
Stable
TimescaleDB
Medium
Growing
Timescale Cloud
Growing
ScyllaDB
Medium
Growing
ScyllaDB Cloud
Growing
Pinecone
Low-Medium
Growing
Fully managed
Growing fast
Milvus
Low-Medium
Growing
Zilliz
Growing fast
CockroachDB
Medium
Medium
Cockroach Cloud
Growing
Quick Decision Cheat Sheet
text
Need ACID + joins? --> PostgreSQL
Need flexible JSON documents? --> MongoDB
Need relationship traversals? --> Neo4j
Need similarity search on embeddings? --> Pinecone / Milvus
Need time-series with downsampling? --> TimescaleDB / InfluxDB
Need sub-ms key-value lookups? --> Redis
Need massive write throughput? --> Cassandra / ScyllaDB
Need it all on one engine? --> PostgreSQL + extensions
Not sure? --> Start with PostgreSQL