
For decades, the database world was split between two camps: **relational databases** (PostgreSQL, MySQL) with strong consistency but limited scalability, and **NoSQL databases** (MongoDB, Cassandra) with horizontal scalability but weaker consistency and limited query capabilities.
For decades, the database world was split between two camps: relational databases (PostgreSQL, MySQL) with strong consistency but limited scalability, and NoSQL databases (MongoDB, Cassandra) with horizontal scalability but weaker consistency and limited query capabilities.
Distributed SQL (also called NewSQL) bridges this gap. It provides the full power of SQL — ACID transactions, joins, foreign keys, and familiar tooling — on a horizontally scalable, distributed architecture. The result: the consistency and query power of a relational database with the scalability of NoSQL.
Strong Consistency
▲
│ ┌───────────────────────┐
│ │ Distributed SQL │
│ │ (CockroachDB, │
│ │ YugabyteDB, │
│ │ Google Spanner) │
│ └───────────────────────┘
│
│ ┌──────────────┐ ┌──────────────────┐
│ │ Traditional │ │ Traditional │
│ │ SQL │ │ NewSQL │
│ │ (PostgreSQL,│ │ (Vitess, TiDB) │
│ │ MySQL) │ │ │
│ └──────────────┘ └──────────────────┘
│
│ ┌──────────────────┐
│ │ NoSQL │
│ │ (Cassandra, │
│ │ DynamoDB, │
│ │ CosmosDB) │
│ └──────────────────┘
└────────────────────────────────────────────────────
Horizontal Scalability ──────►
| Feature | Description |
|---|---|
| SQL interface | Full SQL support: JOINs, subqueries, aggregations, window functions |
| ACID transactions | Serializable or Snapshot Isolation across nodes |
| Horizontal scaling | Add nodes without downtime, data rebalances automatically |
| Auto-sharding | Data automatically distributed across nodes |
| Replication | Synchronous replication for durability (RPO=0) |
| High availability | Automatic failover, no manual intervention |
| Geo-distribution | Data can span multiple regions with latency < 50ms |
┌──────────────────────────────────────────┐
│ Spanner Global Architecture │
├──────────────────────────────────────────┤
│ Zone 1 (us-east1) Zone 2 (europe-west)│
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Leader Node │ │ Follower Node│ │
│ │ ┌────────┐ │ │ │ │
│ │ │ Spanner│ │ │ │ │
│ │ │ Server │ │ │ │ │
│ │ └────────┘ │ │ │ │
│ └──────────────┘ └──────────────┘ │
├──────────────────────────────────────────┤
│ TrueTime API (GPS + Atomic) │
│ Clock synchronization < 7ms │
└──────────────────────────────────────────┘
Key innovation: TrueTime API — GPS and atomic clock synchronization providing bounded clock uncertainty. This enables external consistency (linearizability) across globally distributed nodes.
┌─────────────────────────┐
│ SQL Gateway │
│ (Any node can serve) │
└──────────┬──────────────┘
│
┌──────────────────────┼──────────────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ Node 1 │ │ Node 2 │ │ Node 3 │
│ ┌───────┐ │ │ ┌───────┐ │ │ ┌───────┐ │
│ │ Range A│ │ │ │ Range A│ │ │ │ Range A│ │
│ │ (LEADER)│ │ │(FOLLOWER│ │ │(FOLLOWER│ │
│ └───────┘ │ │ └───────┘ │ │ └───────┘ │
│ ┌───────┐ │ │ ┌───────┐ │ │ ┌───────┐ │
│ │ Range B│ │ │ │ Range B│ │ │ │ Range B│ │
│ │(FOLLOWER│ │ │ (LEADER)│ │ │(FOLLOWER│ │
│ └───────┘ │ │ └───────┘ │ │ └───────┘ │
└───────────┘ └───────────┘ └───────────┘
How CockroachDB works:
YugabyteDB is designed for PostgreSQL compatibility at the wire protocol level:
-- This is standard PostgreSQL syntax — it just works
CREATE TABLE orders (
id UUID DEFAULT gen_random_uuid(),
customer_id UUID NOT NULL,
total NUMERIC(10,2) NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (id)
);
-- Distributed ACID transaction across nodes
BEGIN;
INSERT INTO orders (customer_id, total, status)
VALUES ($1, $2, 'pending');
UPDATE inventory SET quantity = quantity - 1
WHERE product_id = $3 AND quantity > 0;
COMMIT;
-- Full PostgreSQL ecosystem compatibility
-- Supports: pgAdmin, Prisma, DBeaver, all standard drivers
| Feature | CockroachDB | YugabyteDB | TiDB | Google Spanner |
|---|---|---|---|---|
| Compatibility | PostgreSQL (85%) | PostgreSQL (95%+) | MySQL | Proprietary |
| Consistency | Serializable | Serializable | Snapshot isolation | External consistency |
| Replication | Raft | Raft | Raft | Paxos |
| Geo-distribution | Yes | Yes | Yes | Yes |
| Auto-sharding | Yes (by range) | Yes (by hash/range) | Yes (by range) | Yes (by split) |
| Secondary indexes | Global (distributed) | Global + Local | Global + Local | Global |
| Foreign keys | Yes | Yes | Yes | Yes |
| CDC | Yes (Kafka sink) | Yes (Kafka, Elastic) | Yes (TiCDC) | Yes (Pub/Sub) |
| Licensing | BSL (source-available) | Apache 2.0 | Apache 2.0 | Proprietary |
| Cloud managed | CockroachDB Cloud | YugabyteDB Managed | TiDB Cloud | Cloud Spanner |
| Self-host | Yes | Yes | Yes | No |
| Use Case | Why Distributed SQL | Example |
|---|---|---|
| Global SaaS applications | Users worldwide need low latency, strong consistency | Multi-region user accounts |
| Financial systems | ACID transactions across shards, no data loss | Payment processing, ledger |
| E-commerce platforms | Inventory, orders, carts — all need consistency | Black Friday traffic spikes |
| Gaming | Player state, leaderboards, transactions | Cross-region multiplayer |
| IoT time series + metadata | SQL for analytics on distributed data | Fleet telemetry with joins |
| Scenario | Why | Better Alternative |
|---|---|---|
| Simple key-value lookups | Overhead of distributed transactions not justified | Redis, DynamoDB |
| Single-region, small data | Complexity not needed | PostgreSQL |
| Full-text search | Not designed for text indexing | Elasticsearch |
| Graph traversals | JOINs on deep relationships are slow | Neo4j |
| Very high write throughput (>100K writes/sec/node) | Consensus overhead may be too high | Cassandra, ScyllaDB |
-- Data is split by primary key ranges
Table: orders
Range 1: PK '00000000' to '3fffffff' → Node 1 (LEADER) + Node 2, Node 3
Range 2: PK '40000000' to '7fffffff' → Node 2 (LEADER) + Node 1, Node 3
Range 3: PK '80000000' to 'bfffffff' → Node 3 (LEADER) + Node 1, Node 2
Range 4: PK 'c0000000' to 'ffffffff' → Node 1 (LEADER) + Node 2, Node 3
Pros: Efficient range scans, good for sequential keys. Cons: Write hot spots if inserting sequential keys (solved by hash-sharded indexes).
-- Hash-sharded primary key
CREATE TABLE orders (
id UUID DEFAULT gen_random_uuid(),
customer_id UUID,
total NUMERIC,
PRIMARY KEY (id HASH)
);
-- Data distribution by hash
Hash('00000000-...') = 0.23 → Tablet 1 → Node 1
Hash('11111111-...') = 0.67 → Tablet 2 → Node 2
Hash('22222222-...') = 0.12 → Tablet 3 → Node 3
Pros: Even data distribution, no hot spots. Cons: Range scans across the entire table are inefficient.
-- This transaction spans multiple nodes
BEGIN;
UPDATE account_a SET balance = balance - 100 WHERE id = 'A';
UPDATE account_b SET balance = balance + 100 WHERE id = 'B'; -- Different node
COMMIT;
-- Under the hood:
-- 1. Transaction coordinator (any node) acquires locks
-- 2. Prepares both participants
-- 3. Raft commits the transaction record
-- 4. COMMIT acknowledged to client
-- All-or-nothing — guaranteed ACID
Indexes are themselves distributed across nodes:
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- Query that uses the distributed index:
SELECT * FROM orders WHERE customer_id = $1;
-- 1. Routes to any node
-- 2. Node looks up customer_id in distributed index
-- 3. Index returns primary keys → fetch from correct ranges
-- 4. Returns results
Distributed SQL supports schema changes without downtime:
-- Add a column — non-blocking
ALTER TABLE orders ADD COLUMN discount NUMERIC(5,2) DEFAULT 0.00;
-- Create index — non-blocking (background build)
CREATE INDEX CONCURRENTLY idx_orders_date ON orders (created_at);
Stream database changes to downstream systems:
-- CockroachDB CDC
CREATE CHANGEFEED FOR TABLE orders, payments
INTO 'kafka://kafka-cluster:9092'
WITH updated, resolved, min_checkpoint_frequency = '5s';
-- YugabyteDB CDC connector
# Deploy to Kafka Connect
{
"name": "yb-cdc-connector",
"config": {
"connector.class": "io.yugabyte.cdc.YBChangeDataCaptureConnector",
"database.hostname": "yb-1.example.com",
"database.port": "5433",
"table.include.list": "public.orders"
}
}
Single region (3 nodes):
replication:
factor: 3
constraints:
- "+region=us-east-1: 3"
# All data in one region, 3 copies, survives 2 node failures
Multi-region (survive region failure):
replication:
factor: 3
constraints:
- "+region=us-east-1: 1"
- "+region=us-west-1: 1"
- "+region=eu-west-1: 1"
# One copy in each region, survives any single region failure
# Write latency: round-trip to the farthest region
Multi-region with close quorum (for low-latency writes):
replication:
factor: 5
constraints:
- "+region=us-east-1: 3" # 3 copies here
- "+region=us-west-1: 1" # 1 copy for DR
- "+region=eu-west-1: 1" # 1 copy for DR
# Write quorum = 3, achievable within us-east-1
# Write latency = intra-region
| Topology | Write Latency (p50) | Read Latency (p50) |
|---|---|---|
| Single node | 1-5ms | 0.5-2ms |
| 3 nodes, same region | 5-15ms | 1-5ms |
| 3 nodes, 3 regions (US, EU, APAC) | 100-300ms | 5-50ms (closest) |
| 5 nodes, 2 regions (close quorum) | 5-15ms | 1-5ms |
Phase 1: Evaluate
- Identify tables that need horizontal scaling
- Assess which queries cross shards
- Plan sharding key
Phase 2: Migrate schema
- Convert CREATE TABLE statements
- Choose primary key strategy (hash vs. range)
- Handle sequences (use UUIDs instead of SERIAL)
Phase 3: Migrate data
- Use CDC or batch export/import
- Verify consistency
- Set up replication from old DB
Phase 4: Cut over
- Point application to new DB
- Monitor performance
- Keep old DB as rollback target for 1 week
-- PostgreSQL → CockroachDB
-- ❌ Sequences (not distributed)
SERIAL PRIMARY KEY → UUID DEFAULT gen_random_uuid()
-- ❌ Array functions not supported
array_agg(DISTINCT x) → json_agg(DISTINCT x) -- workaround
-- ❌ Long-running queries
-- Solution: add index or query hint
-- ❌ Cross-database queries (CockroachDB)
-- Each cluster = one database, no cross-DB queries
Distributed SQL represents the convergence of two decades of database evolution — combining the reliability and expressiveness of SQL with the elasticity and resilience of distributed systems.
Q: Do you need horizontal write scaling?
├─ Yes
│ Q: Do you need ACID across all data?
│ ├─ Yes → Distributed SQL (CockroachDB, Spanner)
│ └─ No → NoSQL (Cassandra, DynamoDB)
└─ No
Q: Do you need SQL features (joins, aggregations)?
├─ Yes → PostgreSQL
└─ No → Key-value or document store
Distributed SQL is the database architecture for the global, cloud-native era — applications that must work everywhere, with strong consistency, without compromise.
No approved comments are visible yet. New community replies may wait for moderation.