Three NoSQL Challenges That Can Be Solved with Distributed SQL
The following text is an excerpt from the new white paper, Migrating From Monolithic to Cloud Native Operational Databases.
In this section, we walk through the three challenges of using current generation NoSQL databases: operational complexity, frustrating application development, and inconsistent customer experiences.
1. NoSQL operational complexity
As noted in the previous section, databases have evolved to become cloud hosted but are far from cloud native. Their operational complexity makes it difficult for organizations to fully exploit the elasticity and geo-redundancy of modern cloud infrastructure. On top of this, the eventually consistent core of NoSQL adds hidden costs, including performance-killing background repairs and unpredictable, memory-intensive compaction storms. Most enterprises also run an independent caching layer (such as Redis or Memcached) alongside their persistent database, which makes every operational challenge twice as hard.
In the context of the original need to move the data layer through the same phases and at the same velocity as the application layer, these operational challenges make such moves next to impossible. If operations teams force the move anyway, the result is business loss in the form of unpredictable downtime and manual, error-prone war rooms.
2. Frustrating application development
Application developers want the simplicity of ACID transactions so they can easily reason about the read/write behavior of their database client code. Relational databases support multi-row ACID transactions, in which multiple related rows are updated in an all-or-nothing, consistent manner. However, most NoSQL databases do not support even single-row ACID transactions, because eventual consistency compromises the “C” at the remote replicas.
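To make the all-or-nothing behavior concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is a single-node database and stands in here only to illustrate multi-row transaction semantics. The accounts table and the transfer, make_db, and balances helpers are hypothetical names chosen for this illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


def transfer(conn, src, dst, amount):
    """Update two related rows atomically: both changes apply, or neither does."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if the block raises an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            (remaining,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if remaining < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False
```

A transfer that would overdraw the source account returns False and leaves both rows exactly as they were, so the application never has to clean up a half-applied update.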
Many NoSQL databases are starting to recognize the advantages of strong consistency. As a result, they allow their eventually consistent systems to be tuned to quorum-based consistency settings. However, it is well documented that this form of tunable consistency is not truly strong in many situations, including dirty reads after failed writes and unpredictable reads under last-writer-wins conflict resolution. Developers using this approach spend even more time testing their applications to guarantee predictable behavior.
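The dirty read after a failed write can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not a model of any particular database: three in-memory replicas, a write quorum of two, a read quorum of two, and newest-timestamp-wins on read. The Replica and QuorumStore classes and the demo function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)


class QuorumStore:
    """Toy model of N replicas with write quorum W and read quorum R."""

    def __init__(self, n=3, w=2, r=2):
        self.replicas = [Replica() for _ in range(n)]
        self.w = w
        self.r = r

    def write(self, key, value, ts):
        """Apply the write to every reachable replica; report quorum success."""
        acks = 0
        for replica in self.replicas:
            if replica.up:
                # The write is applied immediately and is never rolled back,
                # even if the quorum is not reached.
                replica.data[key] = (ts, value)
                acks += 1
        return acks >= self.w  # False means the client sees a failed write

    def read(self, key, replica_ids):
        """Read from the chosen replicas and return the newest version."""
        if len(replica_ids) < self.r:
            raise ValueError("not enough replicas for a read quorum")
        versions = [
            self.replicas[i].data[key]
            for i in replica_ids
            if key in self.replicas[i].data
        ]
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]


def demo():
    store = QuorumStore(n=3, w=2, r=2)
    first_ok = store.write("price", "10", ts=1)  # reaches all three replicas
    store.replicas[1].up = False
    store.replicas[2].up = False
    second_ok = store.write("price", "99", ts=2)  # reaches only replica 0
    store.replicas[1].up = True
    store.replicas[2].up = True
    # Two valid read quorums now disagree about the current value.
    return first_ok, second_ok, store.read("price", [0, 1]), store.read("price", [1, 2])
```

In this sketch the second write is reported to the client as failed, yet a read quorum that includes replica 0 returns its value, while a quorum of the other two replicas returns the old one. The read quorum and write quorum overlap here (R + W > N), which shows that overlap alone does not prevent the anomaly.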
There is also the challenge of keeping the independent in-memory cache layer consistent with the underlying persistent database layer. The application must handle cache invalidation and cache population carefully to avoid both stale reads and poor performance. Finally, since most databases are good only for a specific application’s needs, developers bear the burden of evaluating new databases.
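The cache bookkeeping that falls on the application can be sketched with the common cache-aside pattern. Two plain dictionaries stand in for the cache and the persistent database, and the CacheAside class and its method names are hypothetical, chosen only for this illustration.

```python
class CacheAside:
    """Toy cache-aside wrapper: the application owns population and invalidation."""

    def __init__(self):
        self.db = {}      # stand-in for the persistent database
        self.cache = {}   # stand-in for an in-memory cache such as Redis
        self.db_reads = 0

    def get(self, key):
        """Serve from the cache when possible; populate it on a miss."""
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        """Write to the database, then invalidate the cached copy."""
        self.db[key] = value
        self.cache.pop(key, None)

    def put_without_invalidation(self, key, value):
        """A buggy write path that forgets the cache, shown for contrast."""
        self.db[key] = value
```

Any write path that skips the invalidation step, like put_without_invalidation here, leaves readers with the old value until the cached entry is evicted or overwritten. Every code path that touches the database has to get this right.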
3. Inconsistent customer experiences
Error conditions are unavoidable, even with all the tuning effort developers invest to build strongly consistent OLTP/HTAP applications on eventually consistent NoSQL databases. During these error conditions, inconsistency reveals itself to end customers.
For example, suppose a retailer removes a few items from its product catalog because they are no longer available. Those deletes may not be honored when the catalog is served to a customer, because the node serving the read has not yet applied them.
Another example involves missing some of the time-series metric data used to calculate aggregates for alerting in monitoring and Internet of Things (IoT) use cases. It never pays to wake up team members in the early morning hours based on incorrect data. Similarly, if a user’s privacy preferences are not immediately honored, there is a possibility her actions will appear to other users in the same account.
Solving NoSQL challenges with Distributed SQL
A distributed SQL database is a single logical relational database deployed on a cluster of servers. The database automatically replicates and distributes data across multiple servers. These databases are strongly consistent and maintain that consistency across availability zones and geographic regions in the cloud.
At a minimum, a distributed SQL database has the following characteristics:
- A SQL API for accessing and manipulating data and objects
- Automatic distribution of data across nodes in a cluster
- Automatic replication of data in a strongly consistent manner
- Support for distributed query execution so clients do not need to know about the underlying distribution of data
- Support for distributed ACID transactions
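As a rough illustration of the automatic distribution and replication characteristics listed above, the sketch below maps a row key to a shard by hashing and then places that shard on several nodes. Hash-based placement is only one possible approach (range-based sharding is another), and the shard_for and replicas_for functions, the node names, and the replication factor of three are assumptions made for this example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(shard: int, nodes: list, replication_factor: int = 3) -> list:
    """Pick the nodes that hold copies of a shard, walking the node list in order."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return [nodes[(shard + i) % len(nodes)] for i in range(replication_factor)]
```

For example, replicas_for(shard_for("order-1001", 16), ["node-a", "node-b", "node-c", "node-d"]) returns three distinct node names. Because the database performs this mapping internally, the client simply issues SQL and never needs to know where a row lives.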
Why Distributed SQL?
Business innovation is putting pressure on traditional systems of record. This is forcing companies to deliver high-value applications and services more quickly while lowering IT costs and reducing risk through compliance.
But these applications—in the form of microservices, born-in-the-cloud applications, and edge and IoT workloads—require a new class of database that is:
- Resilient to failures and continuously available: Critical services remain available during node, zone, region, and data center failures as well as system maintenance with fast failover
- Horizontally scalable: Operations teams can scale out without downtime, even under heavy load, by simply adding nodes to a cluster, and can scale back in by removing nodes when the load drops
- Geographically distributed: Operators can make use of synchronous and asynchronous data replication and geo-partitioning to deploy databases in geo-distributed configurations
- SQL and RDBMS feature compatible: Developers no longer need to choose between the horizontal scalability of cloud native systems and the ACID guarantees and strong consistency of traditional RDBMSs
- Hybrid and multi-cloud ready: Organizations can deploy and run data infrastructure anywhere—and avoid lock-in to any specific cloud provider.
Explore the evolution of operational databases for a cloud native world in our latest white paper, Migrating From Monolithic to Cloud Native Operational Databases. Download your copy today!