Key Concepts

What is Distributed SQL?

Introducing a new category of cloud native databases powering modern, mission-critical transactional applications

Distributed SQL is a single logical relational database deployed on a cluster of network servers. It  delivers core features traditionally found in either relational (SQL) or non-relational (NoSQL) databases.

The database automatically replicates and distributes data among the servers, often called nodes. It is  also strongly consistent and natively provides ACID transactional support across availability and geographic zones in the cloud.

At a minimum, a distributed SQL database has the following characteristics:

  • A robust SQL API for accessing and manipulating data and objects
  • Automatic data distribution across nodes in a cluster
  • Automatic data replication in a strongly consistent manner
  • Distributed query execution so clients need not know about the underlying distribution of data
  • Support for distributed ACID transactions

The Evolution of Distributed SQL Databases

For decades the relational database has by default served as the system of record for mission-critical applications. But non-distributed, monolithic SQL databases like  Oracle, Microsoft SQL Server, EDB PostgreSQL, and IBM DB2, run on a single server. This means that they can only scale up to the maximum amount of CPU, memory, disk, and network bandwidth that the one server can offer. 

With an increasing percentage of transactions coming from an exponentially growing set of cloud-based, connected applications, workloads increase dramatically.

As a result, stand-alone, traditional SQL database quickly run out of resources. At this point, the best-case scenario is that latency skyrockets and these new application connections and transactions wait in a long queue. The worst-case is that the database crashes, bringing business transactions to a halt.

Monolithic databases constitute a significant risk for cloud native applications which require an always-available database that can scale as transactions grow or contract as demand shrinks.

Unlike a monolithic SQL database, a distributed SQL database:

  • Is resilient to failures, protecting critical data and applications
  • Scales horizontally to easily support workload increases and decreases, and support business growth
  • Supports a geo-distributed cluster topology spanning multiple regions and cloud providers to deliver an always-on, consistent experience to users anywhere in the world
  • Provides a high level of SQL compatibility, with standard relational database management features and functionality
  • Matches your database architecture with container and Kubernetes developer environments that improve business agility
  • Improves data visibility and real-time data analysis to reduce security risk

Distributed SQL Database Architecture

A distributed SQL database architecture is generally divided into two primary layers: the query layer and the storage layer.
Distributed SQL Database Architecture

1. API/Query Processing Layer

The API/query processing layer is responsible for language-specific query/command compilation, execution, and optimization. Applications interact directly with the query layer using client drivers.

There is an SQL API in this layer for applications to model relational data and to perform queries involving those relations. Those queries are then automatically distributed across multiple nodes of the database cluster by the distributed data storage layer.

2. Distributed Data Storage Layer

Data (including indexes) in a distributed SQL database are automatically distributed—or sharded—across multiple nodes of the cluster so that no single node becomes a bottleneck, ensuring high performance and availability.

The underlying data storage layer supports the robust SQL API layer by automatically replicating data synchronously—across multiple nodes—to guarantee availability during failures, ensure transactional integrity, and maintain data consistency. No operator intervention is needed; it is done using the Raft distributed consensus protocol.

Additionally, the database storage layer supports distributed ACID transactions, modifying  multiple rows distributed across shards on multiple nodes for absolute data integrity and safety.

Learn more about the two-layer architecture of a distributed SQL database

Why Distributed SQL is the Ideal Modern Database

The need to innovate and grow is forcing companies to deliver born-in-the-cloud applications (in the form of microservices) and embrace edge computing frameworks and streaming workloads. The requirement to support diverse workloads, along with the drive to lower IT costs and reduce risk through better compliance, puts pressure on traditional, monolithic systems of record databases.

To fully benefit from the advantages of modern, cloud native applications, frameworks, and workloads, your IT stack must incorporate a new class of database. This should bring a core set of capabilities that align to your application and infrastructure transformation initiatives:

  • Resilient to failures and continuously available. Fast failover capabilities mean that critical services remain available during node, zone, region, and data center failures, as well as during regular system maintenance.
  • Horizontally scalable. To handle more transactions per second or a higher number of concurrent client connections or larger datasets, a distributed SQL database can be effortlessly scaled out—even under heavy load—without downtime simply by adding nodes to a running cluster. It can be scaled back just as easily when the load decreases.
  • Geographically distributed. To distribute data globally and control precisely where the data lives, distributed SQL uses synchronous and asynchronous data replication and geo-partitioning to deploy the database in various geo-distributed configurations. Distributed SQL databases support the building of geo-distributed applications, helping to overcome region failures automatically and lower latency for end users by bringing data closer to their local region.
  • SQL and RDBMS feature compatibility. Your developers no longer need to choose between the horizontal scalability of cloud native systems and the ACID guarantees and strong consistency of a traditional RDBMS. A distributed SQL database supports ACID transactions without trading the resiliency (to failures) and scalability that mission-critical, cloud native systems demand.
  • ACID transaction support. Distributed SQL databases are designed for systems of record. They supply transactional integrity and strong consistency from the ground up with coordinated writes, locked records, and other methods including multi-version concurrency control.
  • Hybrid and multi-cloud ready. Organizations can deploy and run their data infrastructure anywhere to reduce costs, avoid vendor lock-in, and simplify processes.

The Benefits of Distributed SQL Databases 

Distributed SQL combines the best features of traditional RDBMSs with the innovation of cloud native databases to help you:

  • Accelerate developer productivity by building new features and services quickly
  • Unlock the potential of your apps by scaling with them up, or down, as needed
  • Improve your customer experience by exceeding uptime SLAs
  • Reduce operational costs with lower TCO by only paying for what you need
  • Protect your critical data in production environments with built-in security controls
  • Seamlessly upgrade from your legacy database with advanced migration tools

How to Get Started with Distributed SQL

Whether you’re a cloud native application developer, architect, or database operator, you would benefit from a database that can handle your most demanding workloads. Distributed SQL combines enterprise-grade, relational database capabilities with the horizontal scalability and resilience of cloud native architectures.

YugabyteDB – The Only Distributed SQL Database You Need

YugabyteDB is the cloud native distributed SQL database for mission-critical applications. Companies using YugabyteDB for their scale-out RDBMS and internet-scale OLTP workloads benefit from a distributed database that is: 

1. PostgreSQL compatible. YugabyteDB reimagines PostgreSQL for a cloud native world. It reuses the PostgreSQL query layer and supports all advanced features, including triggers, stored procedures, user-defined functions, and expression indexes. This familiarity and compatibility reduces developer onboarding time and costs, allowing them to get productive quicker.

2. Horizontally scalable. YugabyteDB is proven in production environments to scale beyond 3000K TPS, over 100 TB of data, and thousands of concurrent connections, allowing you to scale out and in with zero impact.

3. Resilient to failures. Ensure continuous availability during infrastructure failures and maintenance tasks.

4. Geo-distributed. Use replication and data geo-partitioning capabilities to achieve the latency, resiliency, and compliance your applications need.

5. Secure. With key security features built into the data layer, including authentication, authorization, and encryption, you can ensure you build a safe business.

6. Enables hybrid and multi-cloud deployments. The freedom to deploy in public, private, and hybrid cloud environments on VMs, containers, or bare metal.

Discover about this powerful, open-source distributed SQL database today via our free, self-paced course: Introduction to YugabyteDB  Or join the YugabyteDB Community Slack channel and chat live with experts and other database users to get answers to all your YugabyteDB and distributed SQL questions.

Additional Distributed SQL Resources: