From Mainframes to Microservices: Leveraging Change Data Capture in Modern Databases
Change data capture (CDC) efficiently identifies and tracks data changes in a database, so that actions can be taken based on these changes. YugabyteDB’s CDC captures data changes in the database and streams them to external processes, applications, or other databases. It tracks and relays changes from the YugabyteDB database to downstream consumers using its Write-Ahead Log (WAL).
YugabyteDB offers three DBaaS deployment models: self-managed, co-managed, and fully managed, in addition to its open-source version.
Our self-managed option (YugabyteDB Anywhere) offers a customer-managed control plane for creating, orchestrating, monitoring, and managing the database. This architecture includes a built-in change data capture (CDC) service. YugabyteDB’s CDC integrates with Debezium, an open-source Kafka connector that ties YugabyteDB into Kafka. This lets you build a data pipeline in which changed data moves from YugabyteDB through Debezium into Kafka, and from Kafka on to other downstream systems.
From Kafka, the changed data can be delivered through sink connectors to targets such as Elasticsearch, Snowflake, or Amazon S3, or processed by anything else in the Kafka ecosystem, including ksqlDB and Kafka Streams.
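As a rough illustration of how such a pipeline is wired up, the sketch below builds the JSON body that Kafka Connect's REST API accepts at `POST /connectors`. The connector class and property names follow the Debezium connector for YugabyteDB as we understand it, but treat them as assumptions and check them against the connector documentation for your version. The host names, stream ID, credentials, and table names are placeholders.

```python
import json


def build_connector_request(name, host, master_addresses, stream_id, tables):
    """Return a request body for Kafka Connect's POST /connectors endpoint.

    Property names are assumed from the Debezium YugabyteDB connector;
    verify them against the documentation before use.
    """
    config = {
        "connector.class": "io.debezium.connector.yugabytedb.YugabyteDBConnector",
        "database.hostname": host,                      # placeholder host
        "database.port": "5433",                        # default YSQL port
        "database.master.addresses": master_addresses,  # placeholder master list
        "database.streamid": stream_id,                 # ID of an existing CDC stream
        "database.user": "yugabyte",                    # placeholder credentials
        "database.password": "yugabyte",
        "database.dbname": "yugabyte",
        "database.server.name": "dbserver1",            # used as the topic prefix
        "table.include.list": ",".join(tables),
        "snapshot.mode": "initial",                     # snapshot first, then stream
    }
    return {"name": name, "config": config}


request_body = build_connector_request(
    name="ybconnector",
    host="127.0.0.1",
    master_addresses="127.0.0.1:7100",
    stream_id="<your-stream-id>",
    tables=["public.orders", "public.customers"],
)
print(json.dumps(request_body, indent=2))
```

The printed JSON can then be sent to a running Kafka Connect worker with any HTTP client. The sketch only builds the payload, so it runs without a cluster.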
YugabyteDB’s CDC offers several key features.
- It operates as a log reader-based capture, working with YugabyteDB’s write-ahead logging (WAL) format.
- It pulls transaction changes aggregated in micro-batches by the Yugabyte database.
- It is timeline consistent (both row- and shard-based).
- It supports JSON and Avro formats.
- It adheres to Kafka’s at-least-once delivery semantics and offers adjustable retention based on time or disk size, so users can customize retention settings on either the YugabyteDB side or the Kafka side.
- It supports initial snapshots, so you can take a current snapshot of your data to use as the starting point for the feed into Kafka and proceed from that point with ongoing changes.
- It facilitates cloud and on-premise change delivery. For example, it can sync data from on-premises environments to cloud-managed services like Snowflake or Redshift in near real-time, as long as there’s network connectivity.
- It offers transactionally consistent CDC. New as of YugabyteDB 2.20, this feature provides an aggregated view of every transaction in the order it occurred, for any downstream system that requires it.
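Because delivery is at-least-once, a downstream consumer may see the same change event more than once and should apply changes idempotently. The following is a minimal sketch of one way to do that, not YugabyteDB's or Debezium's own mechanism. The `ChangeEvent` shape and its `position` field (a per-key, monotonically increasing marker) are hypothetical; the `op` codes mirror Debezium's `c`, `u`, and `d` for create, update, and delete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    key: str               # primary key of the changed row
    position: int          # hypothetical per-key, increasing position
    op: str                # "c" = create, "u" = update, "d" = delete
    value: Optional[dict]  # row image after the change (None for deletes)


class IdempotentSink:
    """Applies change events so that redelivered events have no effect."""

    def __init__(self):
        self.rows = {}          # current state of the target table
        self.last_applied = {}  # highest position applied per key

    def apply(self, event):
        # Skip anything at or before the last position applied for this key.
        if event.position <= self.last_applied.get(event.key, -1):
            return False
        if event.op == "d":
            self.rows.pop(event.key, None)
        else:
            self.rows[event.key] = event.value
        self.last_applied[event.key] = event.position
        return True


sink = IdempotentSink()
events = [
    ChangeEvent("order-1", 1, "c", {"status": "new"}),
    ChangeEvent("order-1", 2, "u", {"status": "paid"}),
    ChangeEvent("order-1", 2, "u", {"status": "paid"}),  # redelivered duplicate
    ChangeEvent("order-2", 1, "c", {"status": "new"}),
    ChangeEvent("order-2", 2, "d", None),
]
applied = [sink.apply(e) for e in events]
print(applied)
print(sink.rows)
```

This approach relies on changes for a given row arriving in order, which is what the row-level timeline consistency described above is meant to provide. Tracking the position per key keeps the check independent of how rows are spread across shards.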
Let’s take a look at two Yugabyte customers who are successfully using change data capture.
- A large brokerage institution (which must, unfortunately, remain anonymous) uses change data capture to stream data into YugabyteDB. Their goal was to lessen their reliance on a costly mainframe system and gradually establish YugabyteDB as their primary system of record. Initially, the Yugabyte database served as the system of reference for the microservices applications, staying in sync with the mainframe through CDC. By using this approach, the brokerage firm was able to step up its adoption of YugabyteDB, reduce its footprint on the mainframe, and cut costs. Notably, they achieved a tenfold increase in performance, getting up to 200K business transactions per second, with latency at 10 ms (or less). This transition marked their shift from the mainframe to YugabyteDB for critical applications.
- A large streaming media company (which also must remain anonymous) was using MySQL for the streaming workloads that supported their customer subscription and program catalog data applications. They needed to migrate from this MySQL-based architecture because it was not resilient or reliable enough, especially during high-traffic periods such as a high-profile sporting event or a popular show. They did not want those systems (and apps) to go down and leave their subscriber base unhappy and disillusioned. Recognizing the need for a more resilient database architecture, they chose YugabyteDB. We collaborated to develop a migration strategy from MySQL to YugabyteDB, incorporating a “fall forward” approach. The media company used a technology from Arcion to transfer data (using CDC) from YugabyteDB to a secondary MySQL database, keeping it in sync during the migration. This strategy significantly reduced the migration risk, because they could fall forward to the in-sync MySQL database if needed. The result? YugabyteDB was smoothly and quickly integrated into their overall infrastructure, and they have seen significant improvement in their service delivery.
Additional resources on YugabyteDB CDC and event streaming:
- Integrate YugabyteDB with Amazon Redshift Using AWS Managed Kafka Stream and CDC Connector
- Stream Data to Amazon S3 Using YugabyteDB CDC
- Change Data Capture (CDC) From YugabyteDB to Elasticsearch
- A Behind-the-Scenes Look at Chaos Testing: A YugabyteDB CDC Case Study
- YugabyteDB CDC Library
- Real-Time Decision-Making: How Event Streaming is Changing the Game
- Unlocking the Power of Event Streaming with YugabyteDB: Real-World Use Cases (Customer Story)
- Abra Controls Effortlessly Ingests Millions of IoT Data Points for Faster, Better Decision-Making (Customer Story)
- Build a Scalable Streaming App with Django, Celery and YugabyteDB (How To)
- Data Streaming Using YugabyteDB CDC, Kafka, and SnowflakeSinkConnector (How It Works)