Key Concepts

What is Distributed SQL?

A revolutionary category of databases for building mission-critical, cloud native applications

Distributed SQL is a single logical relational database deployed on a cluster of servers. A distributed SQL database automatically replicates and distributes data across multiple servers. These databases are strongly consistent and support consistency across availability and geographic zones in the cloud.

At a minimum, a distributed SQL database has the following characteristics:

  • A SQL API for accessing and manipulating data and objects
  • Automatic distribution of data across nodes in a cluster
  • Automatic replication of data in a strongly consistent manner
  • Support for distributed query execution so clients do not need to know about the underlying distribution of data
  • Support for distributed ACID transactions

Why Distributed SQL?

Business innovation is putting pressure on traditional systems of record. This is forcing companies to deliver high-value applications and services more quickly while lowering IT costs and reducing risk through compliance.

But these applications—in the form of microservices, born-in-the-cloud applications, and edge and IoT workloads—require a new class of database that is:

  • Resilient to failures and continuously available — Critical services remain available during node, zone, region, and data center failures as well as system maintenance with fast failover.
  • Horizontally scalable — Operations teams can effortlessly scale out even under heavy load without downtime by simply adding nodes to a cluster, and scale back in when the load reduces.
  • Geographically distributed — Operators can make use of synchronous and asynchronous data replication and geo-partitioning to deploy databases in geo-distributed configurations.
  • SQL and RDBMS feature compatible — Developers no longer need to choose between the horizontal scalability of cloud native systems and the ACID guarantees and strong consistency of traditional RDBMSs.
  • Hybrid and Multi-cloud ready — Organizations can deploy and run data infrastructure anywhere—and avoid locking into any specific cloud provider.

Distributed SQL Database Architecture

A distributed SQL database provides the best of traditional RDBMSs with cloud native databases. It has a two-layer architecture as part of a single logical SQL database:

Distributed SQL database architecture.

1. Query Processing Layer
A distributed SQL database has a SQL API for applications to model relational data and also perform queries involving those relations. Queries are automatically distributed across multiple nodes of the database cluster.

2. Distributed Data Storage Layer
Data including indexes in a distributed SQL database are automatically distributed—or sharded—across multiple nodes of the cluster so that no single node becomes a bottleneck to high performance and availability.

Supporting a powerful SQL API layer requires building the underlying storage layer on strongly consistent replication across all nodes of the cluster. This means writes to the database synchronously commit at multiple nodes in order to guarantee availability during failures.

And finally, the database storage layer supports distributed ACID transactions where multiple rows located on multiple nodes requires transaction coordination.

Getting Started with Distributed SQL

Whether you’re a cloud native application developer, architect, or database operator, you need a distributed SQL database. These databases combines enterprise-grade, relational database capabilities with the horizontal scalability and resilience of cloud native architectures.

YugabyteDB reimagines PostgreSQL for a cloud native world. Learn more about this powerful, open source database today through our free, self-paced courses: Introduction to Distributed SQL and Introduction to YugabyteDB.

And finally, join our YugabyteDB Community Slack channel to answer any and all of your questions.