Three NoSQL Challenges That Can Be Solved with Distributed SQL
Companies are increasingly migrating their mission-critical applications from legacy, monolithic architectures running inside on-premise data centers to modern microservices-based architectures running on public, private, or hybrid cloud infrastructure.
Additionally, application development teams are embracing multiple technologies like Docker and Kubernetes to help make these migrations possible. However, there’s a major roadblock—the inability to move the stateful data layer (made up of a persistent database and an in-memory cache) at the same speed as the stateless application layer.
It is well known that legacy SQL-based relational databases are unsuitable for a dynamic cloud infrastructure. On the other hand, NoSQL and NewSQL databases only look cloud ready from a distance. They also present significant operational challenges when run at scale in production.
In this excerpt from the recent white paper, Migrating From Monolithic to Cloud Native Operational Databases, we look at three of the challenges faced by companies using the current generation of NoSQL databases:
- Operational complexity
- Frustrating application development
- Inconsistent customer experiences
Databases have evolved to become cloud hosted, but they are far from cloud native. This makes it difficult for them to exploit the elasticity and geo-redundancy of modern cloud infrastructure to its maximum potential. On top of this, the eventually-consistent core of NoSQL adds hidden costs. These include performance-killing background repairs and unpredictable, memory-intensive compaction storms. The fact that most enterprises also run an independent caching layer (such as Redis or Memcache) alongside their persistent database simply makes all operational challenges twice as hard.
When you add in the need to modernize the data layer to the same extent and at the same velocity as the application layer, those operational challenges make everything near impossible. If operations teams force such a move, the result is business loss manifested as unpredictable downtimes and manual, error-prone war rooms.
Application developers like the simplicity of ACID transactions, so they can easily reason about the read/write behavior of their database client code. Relational databases support multi-row ACID where multiple related rows update or flow in an all-or-nothing and consistent manner. However, most NoSQL databases do not support even single-row ACID transactions. This is because eventual consistency leads to the “C” being compromised at the remote replicas.
Many NoSQL database vendors are starting to realize the advantages of strong consistency. As a result, they allow their eventually consistent systems to tune to quorum-based strong consistency settings. However, this form of tunable consistency is not truly “consistent” in many situations. This manifests as dirty reads after failed writes, and unpredictable reads after the last write wins. Developers using this approach must spend even more time testing their applications to guarantee predictable behaviors.
There is also the challenge of keeping the independent in-memory cache layer consistent with the underlying persistent database layer. The application handles cache invalidation and cache population carefully to avoid poor performance. Since most databases are only good for a specific application’s needs, developers have to take on the burden of evaluating new databases.
Error conditions are unavoidable. And this is with all the tuning efforts from developers to build strongly consistent OLTP/HTAP applications on eventually consistent NoSQL databases. During these error conditions, inconsistency reveals itself to end customers.
For example, a retailer’s product catalog removed a few items since they were no longer available. However, those items were still viewed by the customer because the node hadn’t yet applied the deletes.
Another example involves ignoring some of the time-series metric data. This data calculates aggregates in alerting for time-series monitoring and Internet of Things (IoT) use cases. It never pays to wake team members in the early hours based on incorrect data. Similarly, if a user’s privacy preferences are not immediately honored, it’s possible his actions will appear to other users in the same account.
A distributed SQL database, like YugabyteDB, is a single logical relational database deployed on a cluster of servers. The database automatically replicates and distributes data across multiple servers. These databases are strongly consistent and support consistency across availability and geographic zones in the cloud.
At a minimum, a distributed SQL database has the following characteristics:
- A SQL API for accessing and manipulating data and objects
- Automatic distribution of data across nodes in a cluster
- Automatic replication of data in a strongly consistent manner
- Support for distributed query execution so clients do not need to know about the underlying distribution of data
- Support for distributed ACID transactions
Business innovation is putting pressure on traditional systems of record. This forces companies to deliver high-value applications and services more quickly, while lowering IT costs and reducing risk through compliance.
But these applications—in the form of microservices, born-in-the-cloud applications, and edge and IoT workloads—require a new class of database that is:
- Resilient to failures and continuously available: Critical services remain available during node, zone, region, and/or data center failures as well as system maintenance with fast failover
- Horizontally scalable: Operations teams can effortlessly scale out even under heavy load without downtime. They can do so by simply adding nodes to a cluster and scaling them back when the load reduces
- Geographically distributed: Operators can make use of synchronous and asynchronous data replication and geo-partitioning to deploy databases in geo-distributed configurations
- SQL and RDBMS feature compatible: Developers no longer need to choose between the horizontal scalability of cloud-native systems and the ACID guarantees and strong consistency of traditional RDBMSs
- Hybrid and multi-cloud ready: Organizations can deploy and run data infrastructure anywhere—and avoid lock-in to any specific cloud provider.