Making a Meaningful Contribution During My Fall Internship at Yugabyte

Aliqyan Tapia

Hi, I’m Aliqyan Tapia, and for the last four months I’ve been interning on the team behind YugabyteDB Managed, our database-as-a-service offering, working on a Change Data Capture (CDC) project.

Interviewing with Yugabyte

My previous internships were at e-commerce companies. So for this internship, I really wanted to work on a product with a greater technical focus. This is what attracted me to Yugabyte. I mean, how much more technical can you get than databases? 

The interview process was fast, lasting a little more than a week. I had a technical interview with an early engineer at Yugabyte, followed by a behavioral interview. Throughout the process, I was extremely impressed by the wealth of knowledge of the interviewers. At the end of the day, I felt that I would have the opportunity to make a meaningful contribution. Isn’t that what we all want in an internship? 

Before My Start Date

A month before starting my internship, I met with Juan Tellez, Director of Cloud Infrastructure Engineering at Yugabyte and head of the YugabyteDB Managed team. We discussed various projects I could work on. 

Usually interns are given a scoped project that can be completed and rolled out within a limited time frame. However, when I heard about the ongoing CDC project, I was extremely excited and wanted to jump in on that effort.

So, for the last four months I’ve been working on the team that launched the first milestone of the CDC project within YugabyteDB Managed.

All About Change Data Capture (CDC):

Change Data Capture (CDC) is the process of identifying database changes and streaming them to an external system or process in real time. CDC has multiple use cases: it can serve as the backbone of an event-driven service architecture, or feed a data warehouse such as Snowflake so that large-scale analyses run away from the core database.

YugabyteDB allows customers to create CDC streams and self-manage them. This is done using the Debezium Connector, an open source service that listens to changes made to a database. 

Our project looked to abstract this process by extending YugabyteDB Managed with CDC functionality. Users simply specify the database tables they would like to subscribe to, as well as the destination for their stream. Using this information, we take care of everything else: provisioning and launching the infrastructure needed to run the stream.

CDC streams with Debezium Connector

Let’s look at an example.

Consider a database table that keeps track of employees at a company. If we create a CDC service that listens to the employee table, then as soon as a new row is added to that table, the CDC service will:

  • Detect the change
  • Provide it to a Debezium service
  • Forward the change to the external sink.
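
The three steps above can be sketched with a small in-memory simulation. This is only an illustration, not how YugabyteDB or Debezium work internally: `EmployeeTable`, `InMemorySink`, and `make_connector` are hypothetical names, and the event fields (`op`, `before`, `after`) are loosely modeled on the shape of a Debezium change event.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ChangeEvent:
    """A simplified change event (hypothetical; loosely Debezium-shaped)."""
    table: str
    op: str                  # "c" marks a newly created row
    before: Optional[dict]   # row state before the change (None for inserts)
    after: Optional[dict]    # row state after the change


@dataclass
class EmployeeTable:
    """Toy table that notifies listeners whenever a row is inserted."""
    rows: List[dict] = field(default_factory=list)
    listeners: List[Callable[[ChangeEvent], None]] = field(default_factory=list)

    def insert(self, row: dict) -> None:
        self.rows.append(row)
        # Step 1: detect the change and describe it as an event.
        event = ChangeEvent(table="employee", op="c", before=None, after=dict(row))
        # Step 2: provide the event to every registered connector.
        for listener in self.listeners:
            listener(event)


class InMemorySink:
    """Stand-in for an external sink; it just records what it receives."""

    def __init__(self) -> None:
        self.messages: List[Tuple[str, dict]] = []

    def publish(self, topic: str, message: dict) -> None:
        self.messages.append((topic, message))


def make_connector(sink: InMemorySink) -> Callable[[ChangeEvent], None]:
    """Build a connector callback that forwards events to the given sink."""

    def on_change(event: ChangeEvent) -> None:
        # Step 3: forward the change to the external sink.
        payload = {"op": event.op, "before": event.before, "after": event.after}
        sink.publish(f"cdc.{event.table}", payload)

    return on_change


table = EmployeeTable()
sink = InMemorySink()
table.listeners.append(make_connector(sink))
table.insert({"id": 1, "name": "Ada"})
```

In a real deployment the connector reads changes from the database's change stream rather than from in-process callbacks, but the hand-off from table to connector to sink follows the same order.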

How is CDC implemented in YugabyteDB Managed?

YugabyteDB Managed uses YugabyteDB Anywhere to manage the database nodes running in AWS or Google Cloud Platform (GCP). To support CDC, we introduced a new type of node that resides alongside the database nodes within our database universe and runs a Debezium service to detect and stream changes. After provisioning the CDC node, we use the configuration provided by the customer to generate a custom script that starts the Debezium service.
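
To give a feel for that last step, here is a minimal sketch of turning a customer's stream configuration into a start script. Everything in it is an assumption made for illustration: the `StreamConfig` fields, the `CDC_TABLES` and `CDC_KAFKA_BOOTSTRAP` variable names, and the `/opt/cdc/start-connector.sh` path are hypothetical and are not the actual script we generate.

```python
import shlex
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StreamConfig:
    """Customer-provided stream settings (hypothetical field names)."""
    tables: Sequence[str]            # tables the customer subscribes to
    kafka_bootstrap_servers: str     # destination for the stream


def render_start_script(config: StreamConfig) -> str:
    """Render a shell script that exports the settings and starts the connector."""
    if not config.tables:
        raise ValueError("at least one table is required")
    tables = ",".join(config.tables)
    lines = [
        "#!/bin/sh",
        "set -eu",
        # shlex.quote keeps customer-provided values from being parsed as shell syntax.
        f"export CDC_TABLES={shlex.quote(tables)}",
        f"export CDC_KAFKA_BOOTSTRAP={shlex.quote(config.kafka_bootstrap_servers)}",
        # Hypothetical launcher path; a placeholder for whatever starts the service.
        "exec /opt/cdc/start-connector.sh",
    ]
    return "\n".join(lines) + "\n"


config = StreamConfig(
    tables=["public.employee", "public.department"],
    kafka_bootstrap_servers="broker.example.com:9092",
)
script = render_start_script(config)
```

Quoting every customer-supplied value before it lands in a script is the main design point here, since those values come from outside the system.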

For this project’s first milestone, we implemented functionality to create and delete streams, as well as services to reconfigure and redeploy the CDC Virtual Machine in case of any failures. 

We started by adding support for Kafka as the first sink, but designed CDC streams to be extensible to other sinks in the future. From Kafka, customers can consume the messages themselves or use one of the many existing Kafka connectors to land the changes in a permanent datastore.
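
One common way to leave room for more destinations is to put them behind a shared interface and look them up by name. The sketch below shows that pattern under stated assumptions: `Sink`, `KafkaSink`, `SINK_TYPES`, and `create_sink` are hypothetical names, and `KafkaSink` only records messages rather than talking to a real broker. It is not our actual implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple, Type


class Sink(ABC):
    """Common interface every stream destination implements."""

    @abstractmethod
    def send(self, change: dict) -> None:
        """Deliver one change event to the destination."""


class KafkaSink(Sink):
    """Stand-in for a Kafka-backed sink; records messages instead of publishing."""

    def __init__(self, topic: str) -> None:
        self.topic = topic
        self.sent: List[Tuple[str, dict]] = []

    def send(self, change: dict) -> None:
        self.sent.append((self.topic, change))


# Supporting a new destination means adding one class and one registry entry.
SINK_TYPES: Dict[str, Type[Sink]] = {"kafka": KafkaSink}


def create_sink(kind: str, **options) -> Sink:
    """Look up a sink type by name and construct it with the given options."""
    try:
        sink_cls = SINK_TYPES[kind]
    except KeyError:
        raise ValueError(f"unsupported sink type: {kind!r}") from None
    return sink_cls(**options)


kafka_sink = create_sink("kafka", topic="employee-changes")
kafka_sink.send({"op": "c", "after": {"id": 1}})
```

With this shape, the code that creates and deletes streams never needs to know which destination it is talking to.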

Wrapping Up My Internship at Yugabyte

I had a great time interning at Yugabyte. This was my first in-person internship post-COVID, and it has been great to interact with my coworkers in person once again. I feel it allows for more impromptu conversations about the work going on in different teams. 

Some mini-highlights from my time here include winning an internal hackathon with a CLI tool built using Go and the Cobra library. The hackathon was a fun way to onboard and quickly ramp up within the team. Outside of work, I had the opportunity to explore the west coast of the United States, visit Los Angeles for a long weekend, catch a Warriors game, and hike in Yosemite!

A huge thank you to Catalin David and Juan Tellez for all the help and support they have provided over the course of my internship.

Interested in learning about Yugabyte internships? Read more of our interns’ stories.
