System of Record Transformation: From Legacy to Cloud Native
At Yugabyte, we work with many Fortune 500 companies and have helped many of them modernize their various systems of record. These systems manage and protect vital data, and are no longer hidden away behind locked doors to be only used and queried by a select few. Today, these systems are integral to many of the interactions we conduct daily through web applications. From logging into bank accounts and utilizing payment/money transfer services to accessing insurance or healthcare records, we routinely engage with many different systems of record.
A system of record demands accuracy, consistency, and resiliency. While avoiding downtime is challenging for any cloud-based system, strategies like replication and sharding help maintain high availability.
Being able to recover quickly from any potential disruption is vital. Critical database capabilities like point-in-time recovery (restoring data to its state at a chosen moment, for example the last snapshot) ensure that a system of record, such as a product catalog, can be quickly restored and ready to serve customers with a minimal RTO (recovery time objective).
The requirements for a system of record have long fueled IT giants like Oracle. But the IT landscape that these systems operate in is fluid, swiftly changing from cloud adoption at the infrastructure level to microservices modernization. Now, the data layer—the middle layer—is undergoing rapid transformation.
So, against that backdrop, let’s explore a specific customer example to illustrate the modernization of systems of record. Several years ago we began working with a top five global retail customer (who shall, unfortunately, remain nameless). We quickly identified their priorities:
- Ability to quickly scale up and down
- Flexibility to use various platforms/cloud providers
- Resilience against outages
In short, they needed a robust system that could scale up and down to handle peak times and not be impacted by public cloud outages (which happen much more often than people realize).
Like most organizations, retailers must continually adapt to serve the modern consumer. Three priorities—scale, flexibility, and resilience—are critical to delivering a successful retail experience. But modernizing legacy systems to support those priorities is a complex task.
A decade or so ago, monolithic RDBMSs like Oracle were the standard. They served data efficiently, wherever it was needed. However, the only way to support growth and demand fluctuations was to scale vertically. Problems emerged when vertical scaling reached its limits (or the hardware became too expensive). This led to manual sharding, where data architects kept a metadata state table outside the main system. But this introduced complexity to a live system of record. If a shard became too large, you had to reshard repeatedly, creating a system that was challenging to manage. When an outage occurred, reconciling data and identifying the outage point was cumbersome and time-consuming.
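To see why repeated resharding hurts, here is a minimal sketch that assumes a simple hash-modulo routing scheme. The retailer's actual sharding scheme is not described here, and the `shard_for` and `fraction_moved` helpers and the `sku-` keys are hypothetical names used for illustration only.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Route a key to a shard using a stable hash (hypothetical hash-modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def fraction_moved(keys: list[str], old_count: int, new_count: int) -> float:
    """Share of keys whose shard assignment changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)


# Hypothetical product keys; a real catalog would use its own identifiers.
keys = [f"sku-{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_count=4, new_count=5)
print(f"{moved:.0%} of keys change shards when growing from 4 to 5 shards")
```

With modulo routing, changing the shard count remaps most keys, so each reshard turns into a large data migration on a live system of record.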
NoSQL databases emerged to address these sharding and scaling issues. However, because these databases lack SQL semantics, adopting them requires significant changes to existing applications. Our global retailer used Cassandra, a NoSQL database, but faced downstream issues, mainly due to the non-relational data format. The continuous complexity in data masking and regeneration led them to YugabyteDB.
Let’s explore the construction of a global product catalog, a practice that applies to industries beyond retail, such as finance or healthcare. The focus is on data modeling and the pipeline used to store core global IDs for products in a system-of-record format. For instance, variations in colors and sizes of a Nike t-shirt become global product IDs in this retailer’s catalog.
Now let’s take the example a step further. Imagine you are looking to buy an iPad from this global retailer, which, as you search, is shown alongside 10 other options from different marketplace vendors on the retailer’s ecommerce site.
The global product catalog (i.e. system of record) groups unique products, shows various options, and lets users choose based on quality, price, proximity, shipping, etc. It’s crucial this identification service achieves data consistency and correctness, can scale seamlessly to handle peak demands, and can meet the expected service level agreements (SLAs).
Our retailer’s workload is extremely read-heavy: approximately 90% reads and 10% writes (changes to the catalog) daily. Every transaction must be ACID compliant to ensure consistency, including interactions with end customers and downstream systems.
Speed was essential. The system was optimized for low latency: our P99 for reads is 3 to 5 ms. Because the deployment is multi-region and multi-datacenter (across the US), writes take between 50 and 75 ms, and our P90 for writes is close to 75 ms.
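For readers less familiar with percentile latencies, the sketch below shows one common way to compute them, the nearest-rank method. The `percentile` helper and the sample latencies are made up for illustration and are not measurements from the retailer's deployment.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up read latencies in milliseconds, for illustration only:
# 90 fast reads, 9 slightly slower reads, and a single slow outlier.
read_latencies_ms = [2.0] * 90 + [3.0] * 9 + [40.0]

p90 = percentile(read_latencies_ms, 90)
p99 = percentile(read_latencies_ms, 99)
print(f"P90 = {p90} ms, P99 = {p99} ms")
```

A P99 of 3 to 5 ms means that 99 out of every 100 reads complete within that time, so a rare slow outlier does not change the figure.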
Because of the requirement for a geo-distributed, multi-region deployment, the deployment topology utilizes a strongly consistent “stretch cluster” across multiple data centers. In this specific case, a 36-node cluster has been deployed across the US West, Central, and East regions. Analysis shows that 80-85% of calls are directed to the US Central load balancer.
Bulk loading was streamlined using cloud-native pipelines like Spark, Kafka, and Akka, with nightly updates and approximately 5% daily refreshes. Kafka primarily handles the bulk loading, while a combination of Spark and Kafka is used for data cleansing. The cleansed data is loaded into YugabyteDB to serve as the system of record, and reactive Spring microservices, facilitated by our Spring Data YugabyteDB modules, pull this data for use in the end applications.
Finally, being cloud-native, we support active-active architectures. This allows customers on a digital modernization path to swiftly scale stateless microservices and leverage YugabyteDB’s active, stateless nature, a feature provided natively.
There are notable performance gains with YugabyteDB across several key areas:
- Fast Read Operations: YugabyteDB allows rapid product lookups by ASIN (globally unique IDs), and fast reads are essential for a read-heavy catalog.
- Global ACID Transactions at Scale: Because the previous system could not perform multi-row or multi-shard ACID-compliant transactions, the number of orders it could handle was severely limited. This required frequent manual data reconciliation and added complexity. With YugabyteDB, statements wrapped between BEGIN TRANSACTION and END TRANSACTION execute as a single atomic block. If any statement fails, the whole block rolls back.
- Ease of UI Pagination in Retail: The ability to perform simple queries makes UI pagination straightforward and removes the need for client-side filtering. It is now possible to query a table like “product” for a category like “music,” then apply LIMIT and OFFSET for effective pagination. This has helped the ecommerce site run smoothly.
- Natively Storing JSONB Data with Flexibility: YugabyteDB’s core storage engine, DocDB, enables native storage of the JSONB data type. This means you don’t have to stick to a rigid data model or define all columns upfront. This flexibility in using JSON for schemas, in turn, assists with various tasks, including onboarding different merchants.
- Data Modeling and Legacy Transition: With YugabyteDB, migrating from a legacy architecture doesn’t force changes to the data models since both PostgreSQL-compatible and Cassandra-compatible APIs are available. The retailer transitioned from storing global product catalog data across 50+ tables (which happened due to limitations in the previous system) to efficiently using globally-consistent secondary indexes. This saved a significant amount of developer time.
- Enhancing Developer Efficiency: The retailer has moved from creating their own complex indexing strategies to taking advantage of YugabyteDB’s ability to create indexes on any columns. This move has led to substantial efficiency gains in architecting the application since the database now does the hard work instead of the application.
- Improving Existing Applications: The native storage of JSONB data has minimized changes to existing applications and simplified the architecture. This ability to tap into the document data model and store JSONB natively in the core system provides much-needed flexibility and ease of use.
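The transaction and pagination points above can be sketched in a few lines. This is a minimal illustration, not the retailer's code: it uses Python's built-in SQLite module as a stand-in so that it runs anywhere, and the `product` table, its columns, and the `add_products` and `page` helpers are hypothetical. Against YugabyteDB's PostgreSQL-compatible API, the same SQL pattern would be issued through a PostgreSQL driver instead.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption for this sketch).
# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE product ("
    " id INTEGER PRIMARY KEY,"
    " category TEXT NOT NULL,"
    " name TEXT NOT NULL UNIQUE)"
)
conn.executemany(
    "INSERT INTO product (id, category, name) VALUES (?, ?, ?)",
    [(i, "music", f"album-{i:02d}") for i in range(1, 8)],
)


def add_products(rows):
    """Insert all rows atomically: either every row is stored or none is."""
    conn.execute("BEGIN")
    try:
        conn.executemany(
            "INSERT INTO product (id, category, name) VALUES (?, ?, ?)", rows
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # Any failure undoes the rows already inserted in this block.
        conn.execute("ROLLBACK")
        return False


def page(category, page_number, page_size=3):
    """Return one page of product names, ordered by id so pages are stable."""
    offset = (page_number - 1) * page_size
    cur = conn.execute(
        "SELECT name FROM product WHERE category = ?"
        " ORDER BY id LIMIT ? OFFSET ?",
        (category, page_size, offset),
    )
    return [name for (name,) in cur]


# The second row reuses an existing name, so the whole batch rolls back.
ok = add_products([(8, "music", "album-08"), (9, "music", "album-01")])
total = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
second_page = page("music", 2)
print(ok, total, second_page)
```

The explicit ORDER BY matters here: without a stable sort order, LIMIT and OFFSET can return overlapping or missing rows between pages.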
YugabyteDB offers a streamlined and robust system, uniquely capable of handling global product catalog data and enhancing efficiency for global retailers. At Yugabyte, we focus on creating resilient data stores with high availability, emphasizing accuracy, and ensuring seamless interaction. By using strongly consistent data sharding and replication, aiming for a zero RPO, and working to minimize the RTO window for outages, we uphold our commitment to recovery and consistency.