ACID transactions were a big deal when they were first formally introduced in the 1980s in monolithic SQL databases such as Oracle and IBM DB2. Popular distributed NoSQL databases of the past decade, including Amazon DynamoDB and Apache Cassandra, initially focused on “big data” use cases that did not require such guarantees and hence avoided implementing them altogether. However, ACID transactions have made a strong comeback in the last several years with the launch of next-generation distributed databases that have built-in support for them.
This post serves as a primer on ACID transactions for app developers building distributed apps in the cloud. It highlights:
Why ACID transactions remain a fundamental need for cloud apps
What is necessary to implement them in databases
Why monolithic databases have them but first-generation distributed databases don’t
Defining a Transaction
A transaction represents a unit of work performed within a database, and it is often composed of multiple operations.
For example, a bank transaction might consist of four operations that together transfer $100 from Alice’s account at one branch to Bob’s account at another branch.
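A minimal, runnable sketch of such a transfer, using Python’s standard-library sqlite3 module so it needs no server. The four operations (debit Alice’s account, debit her branch, credit Bob’s account, credit his branch), the accounts/branches schema, and all balances are assumptions loosely modeled on the accounts/branches example in the PostgreSQL tutorial, not details taken from this post. The connection’s context manager commits only if every statement succeeds and rolls back all of them otherwise.

```python
import sqlite3

# Hypothetical schema and sample data for illustration only.
SCHEMA = """
CREATE TABLE branches (name TEXT PRIMARY KEY, balance REAL NOT NULL);
CREATE TABLE accounts (
    name        TEXT PRIMARY KEY,
    branch_name TEXT NOT NULL,
    balance     REAL NOT NULL CHECK (balance >= 0)  -- integrity constraint
);
INSERT INTO branches VALUES ('Downtown', 1000.0), ('Uptown', 1000.0);
INSERT INTO accounts VALUES ('Alice', 'Downtown', 500.0), ('Bob', 'Uptown', 200.0);
"""

# The four operations of the transfer (assumed), executed as one transaction.
TRANSFER_STEPS = [
    "UPDATE accounts SET balance = balance - :amt WHERE name = :src",
    "UPDATE branches SET balance = balance - :amt "
    "WHERE name = (SELECT branch_name FROM accounts WHERE name = :src)",
    "UPDATE accounts SET balance = balance + :amt WHERE name = :dst",
    "UPDATE branches SET balance = balance + :amt "
    "WHERE name = (SELECT branch_name FROM accounts WHERE name = :dst)",
]


def transfer(conn: sqlite3.Connection, src: str, dst: str, amt: float) -> None:
    params = {"src": src, "dst": dst, "amt": amt}
    # "with conn" commits if the block finishes normally and rolls back
    # every step already executed if any statement raises.
    with conn:
        for sql in TRANSFER_STEPS:
            if conn.execute(sql, params).rowcount != 1:
                raise ValueError(f"step touched no row: {sql!r}")


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    transfer(conn, "Alice", "Bob", 100.0)
    print(balances(conn))  # {'Alice': 400.0, 'Bob': 300.0}
    try:
        transfer(conn, "Alice", "Carol", 100.0)  # no such account: step 3 fails
    except ValueError as err:
        print("rolled back:", err)
    print(balances(conn))  # unchanged: {'Alice': 400.0, 'Bob': 300.0}
```

Transferring to a nonexistent account makes the third step touch no row, and the resulting exception undoes the debit already applied to Alice. An overdraft is rejected the same way, by the CHECK constraint on the first step. SQLite is used here only because it ships with Python; the all-or-nothing pattern carries over to other transactional SQL databases.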
Defining ACID
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions intended to guarantee validity even in the event of system crashes, power failures, and other errors.
Atomicity: Guarantees that all operations in a transaction are treated as a single “unit”, which either succeeds completely or fails completely.
Consistency: Ensures that a transaction can only bring the database from one valid state to another, preventing data corruption.
Isolation: Determines how and when changes made by one transaction become visible to other transactions. Serializable and Snapshot Isolation are the two strictest isolation levels in common use.
Durability: Ensures that the results of a committed transaction are permanently stored in the system. The modifications must persist even in case of power loss or system failures.
ACID’s Consistency vs. CAP’s Consistency
The CAP Theorem, first proposed in 2000, made it easier for engineers to reason about distributed systems. It states that a distributed system must choose between Consistency and 100% Availability in the presence of network Partitions.
Unfortunately, the use of “Consistency” in both ACID and CAP has confused developers. Consistency in CAP is a more fundamental concept: it refers to the guarantee that all members of a distributed system have a shared understanding of the value of a single data element from a read standpoint. This guarantee is also referred to as strong consistency or linearizability. In fact, CAP’s consistency is better compared against ACID’s isolation levels, since both deal with the values that read operations are allowed to see.
On the other hand, ACID’s consistency refers to data integrity guarantees that ensure the transition of the entire database from one valid state to another. Such a transition involves strict enforcement of integrity constraints such as data type adherence, null checks, relationships, and more. Given that a single ACID transaction can touch multiple data elements, whereas CAP’s consistency refers to a single data element, ACID transactions provide a stronger guarantee than CAP’s consistency.
The Benefits of ACID Transactions
1. Absolute Data Integrity and Safety
Avoiding lost updates, dirty reads, and stale reads, and enforcing app-specific integrity constraints, are critical concerns for app developers. This is especially true when building user-facing applications in verticals such as financial services, retail, and SaaS. Solving these concerns directly at the database layer with ACID transactions is a much simpler approach than re-implementing them in application code.
2. Simplified Concurrency Control
Concurrent access to shared resources such as retail inventory, bank balances, and gaming leaderboards is unavoidable. Isolation in ACID transactions comes to the rescue of app developers. For example, when a database guarantees transactions with serializable isolation, developers can treat each transaction as if it were executed sequentially, even though it may actually be executed concurrently. The onerous task of reasoning about potential conflicts between operations from separate transactions is eliminated altogether.
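The sketch below illustrates how a database can take conflict detection off the developer’s plate. It is a toy, single-threaded multi-version store implementing snapshot isolation with first-committer-wins write-write conflict detection. Snapshot isolation prevents lost updates like the inventory case in the demo but, unlike serializable isolation, still permits anomalies such as write skew. All class and method names are hypothetical, and the global counter is an assumed stand-in for whatever timestamp mechanism a real database uses.

```python
import itertools


class WriteConflict(Exception):
    """Another transaction committed a write to the same key first."""


class MVCCStore:
    """Toy multi-version store: each key maps to (commit_ts, value) versions."""

    def __init__(self):
        self._versions = {}               # key -> [(commit_ts, value), ...] ascending
        self._clock = itertools.count(1)  # hypothetical global timestamp source

    def next_ts(self):
        return next(self._clock)

    def read_at(self, key, ts):
        # Newest version committed at or before the snapshot timestamp.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None

    def last_commit_ts(self, key):
        versions = self._versions.get(key)
        return versions[-1][0] if versions else 0

    def install(self, key, value, commit_ts):
        self._versions.setdefault(key, []).append((commit_ts, value))


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.next_ts()  # snapshot taken at begin
        self.writes = {}                 # buffered until commit

    def get(self, key):
        # Read your own writes first, then the snapshot.
        if key in self.writes:
            return self.writes[key]
        return self.store.read_at(key, self.start_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First-committer-wins: abort if any key we wrote received a newer
        # committed version after our snapshot. (Single-threaded toy: this
        # check-then-install is not protected against real concurrency.)
        for key in self.writes:
            if self.store.last_commit_ts(key) > self.start_ts:
                raise WriteConflict(key)
        commit_ts = self.store.next_ts()
        for key, value in self.writes.items():
            self.store.install(key, value, commit_ts)
        return commit_ts


if __name__ == "__main__":
    store = MVCCStore()
    seed = Transaction(store)
    seed.put("widgets", 10)
    seed.commit()

    # Two concurrent buyers both read 10 and try to write 9.
    t1, t2 = Transaction(store), Transaction(store)
    t1.put("widgets", t1.get("widgets") - 1)
    t2.put("widgets", t2.get("widgets") - 1)
    t1.commit()
    try:
        t2.commit()
    except WriteConflict:
        # Re-run t2's logic in a fresh transaction instead of losing t1's update.
        retry = Transaction(store)
        retry.put("widgets", retry.get("widgets") - 1)
        retry.commit()
    print(Transaction(store).get("widgets"))  # 8
```

Without the conflict check, both buyers would commit 9 and one sale would silently disappear; with it, the second committer is told to retry, and the application never has to reason about the interleaving itself.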
3. Intuitive Data Access Logic
ACID-compliant databases usually allow complex schema modeling and natively support multi-step data manipulation operations such as maintaining consistent secondary indexes. Business logic can now be represented more directly in the application code.
4. Future-Proofing Database Needs
Durability is rarely up for debate in databases, where stable persistence is a must-have. Hence our view that “in-memory”-only systems should not even be considered databases! However, developers often feel an urge to trade off Atomicity, Consistency, Isolation, or some combination of them in return for higher performance in distributed databases. While these tradeoffs are sometimes easier to justify in the short run, the resulting loss of flexibility comes with a heavy cost in the long run. Competitive advantage in business comes from the ability to enhance apps quickly. For example, an internal, dashboards-only, non-transactional app can be transformed into a customer-facing transactional app in minimal time if and only if the original database was future-proof for such a change.
What’s Needed For Implementing ACID?
For any monolithic or distributed database to implement ACID transactions, there are four foundational aspects that need to be designed and developed.
Provisional Updates (Atomicity)
Transactions often involve multiple operations across multiple rows. Given the need to treat all these operations as a single unit, some form of provisional update in a temporary space is needed first, followed by a commit.
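A minimal sketch of provisional updates, assuming a single-process, in-memory store: writes go to a separate intents map (the “temporary space”) rather than to the committed rows, and flipping a single transaction-status record acts as the commit point. This is loosely modeled on the write-intent approach some distributed databases use; the class, its methods, and the lazy cleanup policy are hypothetical simplifications, not any product’s API.

```python
import uuid

PENDING, COMMITTED, ABORTED = "PENDING", "COMMITTED", "ABORTED"


class IntentStore:
    """Committed rows plus a separate space for provisional writes."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})  # key -> committed value
        self.intents = {}             # key -> (txn_id, provisional value)
        self.txn_status = {}          # txn_id -> PENDING | COMMITTED | ABORTED

    def begin(self):
        txn_id = uuid.uuid4().hex
        self.txn_status[txn_id] = PENDING
        return txn_id

    def _resolve(self, key):
        # Fold the intent of a finished transaction into the committed rows.
        holder = self.intents.get(key)
        if holder is None:
            return
        owner, value = holder
        if self.txn_status[owner] == COMMITTED:
            self.rows[key] = value
            del self.intents[key]
        elif self.txn_status[owner] == ABORTED:
            del self.intents[key]

    def write(self, txn_id, key, value):
        self._resolve(key)
        holder = self.intents.get(key)
        if holder is not None and holder[0] != txn_id:
            raise RuntimeError(f"{key!r} has a pending write from another transaction")
        self.intents[key] = (txn_id, value)  # provisional: invisible to readers

    def read(self, key):
        # Returns committed data only; pending intents are ignored.
        self._resolve(key)
        return self.rows.get(key)

    def _finish(self, txn_id, status):
        # Flipping one status record is the commit (or abort) point.
        self.txn_status[txn_id] = status
        # Eager cleanup; if this loop is interrupted, _resolve() completes it lazily.
        for key in [k for k, (owner, _) in self.intents.items() if owner == txn_id]:
            self._resolve(key)

    def commit(self, txn_id):
        self._finish(txn_id, COMMITTED)

    def abort(self, txn_id):
        self._finish(txn_id, ABORTED)


if __name__ == "__main__":
    db = IntentStore({"alice": 500, "bob": 200})
    tx = db.begin()
    db.write(tx, "alice", 400)
    db.write(tx, "bob", 300)
    print(db.read("alice"), db.read("bob"))  # 500 200 (still provisional)
    db.commit(tx)
    print(db.read("alice"), db.read("bob"))  # 400 300
```

Because readers consult the status record, a crash after the status flip but before cleanup still leaves every write of the transaction visible, and a crash before the flip leaves none of them visible.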
Strongly Consistent Core (Consistency)
A strongly consistent core is the basis for achieving ACID guarantees in a transaction involving only a single operation on a single row. The additional data integrity constraints needed to achieve full ACID (with multiple operations across multiple rows) are built on top of this core.
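To illustrate the layering, the sketch below assumes a single process and uses a lock-protected register as a stand-in for the strongly consistent single-row core (a real distributed database would instead replicate each row through a consensus protocol such as Raft or Paxos). An integrity constraint, a non-negative balance, is then enforced on top of that core with a compare-and-set retry loop. Register and withdraw are hypothetical names.

```python
import threading


class Register:
    """One row with linearizable read and compare-and-set.

    The lock is an assumed stand-in for the replication/consensus layer a
    distributed database would use to make single-row operations linearizable.
    """

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value, self._version

    def compare_and_set(self, expected_version, new_value):
        with self._lock:
            if self._version != expected_version:
                return False  # someone else wrote in between
            self._value = new_value
            self._version += 1
            return True


def withdraw(account, amount):
    """Integrity constraint (balance >= 0) layered on the single-row core."""
    while True:
        balance, version = account.read()
        if balance - amount < 0:
            return False  # reject: would violate the constraint
        if account.compare_and_set(version, balance - amount):
            return True   # otherwise retry against the newer version


if __name__ == "__main__":
    account = Register(100)
    outcomes = []
    threads = [threading.Thread(target=lambda: outcomes.append(withdraw(account, 10)))
               for _ in range(20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(outcomes.count(True), account.read()[0])  # 10 0
```

With 20 concurrent withdrawals of 10 from a balance of 100, exactly 10 succeed and the balance ends at 0: no update is lost and the constraint is never violated. Multi-row guarantees need the additional machinery described in the other three sections.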
Transaction Ordering (Isolation)
For a database to support the strictest, serializable isolation level, a mechanism such as globally ordered timestamps is needed to arrange all the transactions sequentially. On the other hand, the snapshot isolation level relies on a partial ordering, in which sub-operations of one transaction may interleave with those of other transactions. The benefit is lower latency and higher throughput than the serializable level, while still detecting write-write conflicts.
Persistent Storage (Durability)
Compared to the other three properties, durability is the easiest to achieve: simply use a storage engine that stores data on an underlying persistent device such as an HDD or SSD. Our post A Busy Developer’s Guide to Database Storage Engines explains how various types of storage engines work and the specific workload patterns they are usually optimized for.
Why ACID Became Optional
ACID compliance was taken for granted in monolithic databases of the past. This is because the monolithic database server can make provisional updates, is strongly consistent by default, runs on persistent disks, and most importantly can act as a single source of truth for the ordering of concurrent transactions.
As databases became distributed and NoSQL starting in the late 2000s, the first order of business was to decide which side of the CAP Theorem tradeoff to favor. The default choice was Availability over Consistency, given the focus on big data workloads that don’t require absolute correctness. The net result was that the foundation necessary for Consistency in ACID was compromised, and it became easier to give up on Atomicity and Isolation thereafter. Each node came with its own database server that controlled its own subset of the data in the overall cluster. Making provisional updates across multiple such nodes, and doing so with some ordering, was deemed complex and unnecessary. At the same time, transactional data volumes were low enough to be handled by vertically scaling monolithic ACID-compliant databases. The lack of ACID in distributed databases did not hurt traditional enterprises, until recently.
ACID transactions are a fundamental feature of operational databases. They help enterprises simultaneously gain customer data integrity and app development agility. Implementing ACID transactions in a database requires significant systems engineering effort, especially when the database is distributed across multiple nodes. In this follow-up post on distributed ACID transactions with high performance, we dive deeper into the challenges involved and see how next-generation distributed databases such as YugabyteDB are solving them. Meanwhile, you can see YugabyteDB’s distributed transactions in action using a local cluster.