How Comcast Uses Distributed SQL to Handle Modern Workloads
Amit Patel, Principal Architect at Comcast, recently sat down with us to discuss his role at this Fortune 100 telecommunications conglomerate. With well-known brands such as Xfinity, NBC, Telemundo, and the streaming service Peacock, Comcast has several business units dedicated to the management and distribution of data around the world.
Amit chatted with us about the evolution of databases, the challenges with legacy solutions, and the need for a distributed SQL database to support the demand for modern workloads. He also shared his criteria for evaluating new technologies and how they align with business goals and objectives.
Read the excerpt below for the main discussion takeaways, or watch the full interview.
Q: Could you introduce yourself and provide additional information about the team you lead and your role at Comcast?
I am a principal architect with Comcast and have been on various teams over the last 12 years, where I have participated in numerous digital transformation initiatives and new technology rollouts. My team focuses on identifying and delivering core database platform services to various developer teams across our organization. We focus on existing core solutions as well as new technologies to provide the best offerings to our developers, which helps drive business outcomes and value to our end customers. Databases are a significant part of our charter, and we have experience with almost every database out there.
Q: How has your team at Comcast responded to the accelerating pace of technological change over the past 5-10 years, and which digital trends have had the greatest impact on your team and your customers?
You’re right. Technology is constantly evolving, and one of our main objectives at Comcast is to keep track of these changes, identify how we can utilize them to benefit our customers and offer more valuable services. Due to our large size and extensive range of services, we use many different technological solutions.
Two of the digital trends that had the biggest impact on us include:
- Cloud and infrastructure changes for deploying, maintaining, and offering services. We have seen significant benefits from utilizing on-prem solutions in conjunction with cloud services.
- Containerized solutions like Kubernetes. Our biggest stakeholders are our developer teams, and how they build and deploy apps is very different today. So this does put pressure on any new underlying infrastructure to match the agility and scalability of the containers.
Q: Have you observed an increase in the variety and unpredictability of workloads, as well as changes in their size and timing?
Software is becoming more important nowadays, and most of our services rely on microservices-based applications. As a result, we are seeing more diversity in the types of applications we support. Along with modern applications, we also need to remember our existing applications that have a more traditional (monolithic) architecture.
Everything is becoming more demanding. The pace of new application development is increasing rapidly, and there are high expectations to deploy reliable, secure applications instantly. Data is also crucial. For many critical applications, data must be secure and consistent regardless of the application users’ location. To put it bluntly, our customers require consistent data.
Q: You work with databases daily. Take us through the examples of some of the databases you are using and what the database journey has been like for you over the past few years.
We have supported classic RDBMS solutions for years, like Oracle, MySQL, and SQL Server. We were early adopters of NoSQL solutions like MongoDB. However, there is no such thing as a blanket solution. They all have their benefits and trade-offs. But as we look to the future and the demand of modern workloads, our data should be distributed. It should be near the customers or the apps, which is accelerating our adoption of distributed SQL.
Q: Can you discuss some of the challenges and issues with legacy solutions that have driven changes in your database landscape?
The evolution of databases has aligned with the change in the rest of our IT stack—from infrastructure to applications. Much of this evolution has been driven by the variety of use cases we have to support. For example, not all applications require multiple data centers, geo-distributed data, or synchronous replication. Those requirements add additional costs and complexity that we want to avoid if not required.
But we definitely have many applications that have high data demands and require consistent reads—and delivering those features isn’t easy with legacy solutions. For example, we have legacy database solutions in multiple data centers—maybe one on the east coast and one on the west coast—where only one side can be written to at a time. So if we want active-active replication, we must set up and maintain some pretty complex replication. Then we have to ensure that the app teams build logic into the code to handle conflicts.
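The conflict-handling logic Amit describes is typically something like last-write-wins resolution applied in the application layer. Here is a minimal sketch; the record shape and the tie-break rule are illustrative assumptions, not Comcast's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Row:
    key: str
    value: str
    updated_at: float  # commit timestamp from the originating data center
    origin: str        # e.g. "east" or "west"

def resolve(a: Row, b: Row) -> Row:
    """Last-write-wins: keep the version with the newer timestamp.
    Ties are broken deterministically by origin so both sides converge
    on the same winner regardless of which order they see the writes."""
    if a.updated_at != b.updated_at:
        return a if a.updated_at > b.updated_at else b
    return a if a.origin < b.origin else b

east = Row("user:42", "plan=basic", updated_at=100.0, origin="east")
west = Row("user:42", "plan=premium", updated_at=101.5, origin="west")
winner = resolve(east, west)  # west committed later, so its value wins
```

Pushing this logic into every application is exactly the maintenance burden that distributed SQL's built-in consensus replication removes.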
To address those demanding applications, there are a number of challenges or issues with legacy solutions we’ve traditionally used. For example, one is replication lag. Here, the applications must be architected to handle the unavailability of committed data in an active-passive scenario.
Another is having a unified view of the cluster, which we can’t do today. Without it, separate clusters must be maintained for each data center, which requires manual monitoring and maintenance. Looking ahead, distributed SQL offers the potential to address these challenges and provides a more efficient and effective approach to data management.
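The "architected to handle the unavailability of committed data" point above usually means a read-your-writes fallback: serve reads from the lagging replica when it has caught up, and fall back to the primary when it has not. A toy in-memory sketch (the stores and version scheme are hypothetical stand-ins for real replication):

```python
# Toy stand-ins for a primary and an asynchronously lagging read replica.
primary = {}
replica = {}   # replication applies primary writes here with a delay
versions = {}  # last version this client committed, per key

def write(key, value, version):
    primary[key] = (value, version)
    versions[key] = version  # client remembers what it committed

def replicate(key):
    """Simulate the async replication apply for one key."""
    replica[key] = primary[key]

def read(key):
    """Read-your-writes: use the replica only if it has caught up to the
    version this client last committed; otherwise go to the primary."""
    want = versions.get(key, 0)
    value, seen = replica.get(key, (None, -1))
    if seen >= want:
        return value
    return primary[key][0]  # replica is lagging; fall back to primary

write("session:7", "active", version=1)
first = read("session:7")   # replica is behind, served by the primary
replicate("session:7")
second = read("session:7")  # now served by the replica
```

In a distributed SQL database with synchronous, consensus-based replication, committed data is visible cluster-wide, so this application-level bookkeeping goes away.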
Q: As you look ahead to evaluating new technologies, there are compelling reasons to adopt these technologies despite the large number of databases already in use. What are you prioritizing now that is important internally and to your customers?
The priorities for evaluating new technologies will vary depending on the application. But for transactional applications, some specific requirements immediately come to mind, such as scalability, consistency, resiliency, cloud-native architecture, and optimizations. It’s also important to have SQL support for legacy application logic. We don’t want to retrain and rewrite. And, of course, security is always a critical concern, so authentication, authorization, and accountability should be built in.
Q: You mentioned consistency as an important factor, which definitely comes up all the time. People are looking to combine the benefits of SQL’s consistency with the scalability and resiliency of NoSQL. Is that one of the primary drivers of change? And how have you balanced the need for consistent data with the need for scalability and resiliency in modern applications?
Transactional consistency becomes difficult in a cloud-based world where distributed systems and microservices are the default architecture. Now, not all apps need strong consistency. Every app is different. But for those applications that need strongly consistent reads and require data to be distributed across various geographic regions, we require both scale and resiliency along with strongly consistent data. That is what we are looking for, so we are adopting distributed SQL.
Q: How does open source fit into your IT strategy, and how do you prioritize it when evaluating solutions? With the rise of partial open source and eventual open sourcing, there’s a lot of gray area to navigate.
As a team, we are big fans of open-source solutions. With any new technology, we spend time understanding the trade-offs and exploring the pros and cons. There are indeed a lot of products that are partial open source or eventual open source, so not everything is true open source. In general, we incorporate many open-source solutions into our environment, pairing open-source databases with platforms like Kubernetes, and we use the community versions of databases. The benefits of open source are its flexibility and its ability to deliver scale, consistency, resiliency, and security. That is what we are looking for, and that’s what we are doing.
Q: Talk us through what stood out about distributed SQL and got you started on your distributed SQL journey. Where are you now?
We’ve been on this journey for a few years and have explored and adopted a few solutions. Because shifting to new solutions isn’t something we do lightly, we look for compelling reasons that significantly benefit our teams and customers. For us, the one benefit of distributed SQL that really stood out was fault tolerance. It offers resilience against failure with native failover and repair. Additionally, we appreciate its linear (i.e., horizontal) scalability, the ability to scale writes on demand through auto-sharding and rebalancing, and the geo-distribution capabilities that deliver lower latency to geographically dispersed users.
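The auto-sharding mentioned above can be illustrated with a simple hash-sharding sketch. This is a conceptual toy, not how any particular distributed SQL engine is implemented; real systems typically split tablet key ranges rather than rehashing everything, precisely to avoid the data movement this naive version shows:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Hash-shard a key: placement is derived from the key itself,
    so no central lookup table is needed to route a read or write."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

keys = [f"user:{i}" for i in range(1000)]

# With 4 shards, rows spread evenly across 4 nodes. Doubling the shard
# count (say, after adding nodes) changes where some keys live -- that
# movement is the "rebalancing" a distributed SQL database automates.
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 8) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
```

With naive modulo hashing roughly half the keys move on a resize, which is why production systems split existing shards in place and migrate only the affected ranges.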
Another key factor for us is its support for SQL, which is the language of data and the default for all application logic. Our existing apps are written in SQL and use advanced features like stored procedures. We want to be able to move those apps and modernize them without causing a huge headache. With the right solution, like YugabyteDB, we can minimize the impact on development teams while enjoying all the new benefits of distributed SQL.
Q: How have you approached the migration and modernization of applications? Are you taking a refactor approach or a re-platform approach? Has there been a standardized process on how you shift applications and embrace newer technologies such as distributed SQL?
We have taken both the refactoring and re-platforming approaches; it depends on the type of workload and application. We do have a few workloads running active-active on a legacy database, and as I mentioned earlier about writes, it was challenging to maintain two sets of database clusters, which required additional manual steps for app updates, patches, monitoring, and so on. For those apps, we did a full refactor and moved them to distributed SQL. We are also able to migrate from one database platform to another with minimal changes and rewrites. We’ve been successful with both approaches and now have workloads from each running in production on distributed SQL.
Q: As a massive enterprise with a large IT team that can provide services on its own, where do the databases-as-a-service (DBaaS) best fit into your strategy? What’s your approach to it in general?
Database-as-a-service (DBaaS) is our key strategy when it comes to databases. We aim to provide more platform services that allow developers to focus solely on development without worrying about database maintenance, infrastructure, or operational work. The database connections should be readily available when needed while we take care of patch consistency, vulnerability fixes, backup, monitoring, and other Day 2+ operations behind the scenes. We have developed DBaaS services internally to cater to our larger Comcast development team, including RDBMS DBaaS for Oracle and MySQL, as well as NoSQL MongoDB DBaaS. Our ultimate goal is to incorporate other distributed SQL and cloud-native services into our offerings.
Q: How do you assist developers in identifying the most suitable database for their applications? What are the critical factors to consider while navigating this vast database landscape?
Given the size of our organization, automation is our best friend. As we discussed earlier, we support many database technologies, so we developed a database selection tool that’s accessible to all our internal teams.
The tool makes recommendations based on the user’s criteria. This includes factors such as consistency type, scalability, replication requirements, deployment/configuration options, security features, backup options, and other critical features, such as a preference for open-source or enterprise-backed solutions. The tool generates multiple recommendations, each with details on cost and a fit strength score (on a scale of 0 to 100). The user can select the most suitable option from the recommendations we provide. This is how we aid our teams in finding appropriate database solutions amidst a crowded and vast database landscape.
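A fit strength score like the one the tool produces can be sketched as a weighted match between the user's requirements and each candidate's features. The weights, feature names, and candidates below are invented for illustration; the real tool's inputs are internal to Comcast:

```python
# Hypothetical criteria weights -- the actual tool's weighting is internal.
WEIGHTS = {
    "strong_consistency": 30,
    "horizontal_scalability": 25,
    "multi_region_replication": 20,
    "open_source": 15,
    "built_in_backup": 10,
}

# Illustrative candidate catalog: each entry lists the features it offers.
CANDIDATES = {
    "distributed-sql-db": {"strong_consistency", "horizontal_scalability",
                           "multi_region_replication", "open_source",
                           "built_in_backup"},
    "classic-rdbms": {"strong_consistency", "built_in_backup"},
    "nosql-db": {"horizontal_scalability", "multi_region_replication",
                 "open_source", "built_in_backup"},
}

def fit_score(required: set, features: set) -> int:
    """Fit strength on a 0-100 scale: the share of the user's weighted
    requirements that a candidate actually satisfies."""
    total = sum(WEIGHTS[r] for r in required)
    met = sum(WEIGHTS[r] for r in required if r in features)
    return round(100 * met / total) if total else 0

required = {"strong_consistency", "horizontal_scalability", "open_source"}
ranked = sorted(CANDIDATES,
                key=lambda db: fit_score(required, CANDIDATES[db]),
                reverse=True)
```

Each recommendation then carries its score alongside cost details, and the requesting team picks from the ranked list.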
Q: How has the relationship between Ops teams and developers evolved, and what are your thoughts on this?
It has evolved over the years because both sides realize the importance of working together and understanding each other’s roles better. Nowadays, developers are more knowledgeable about DevOps, DevSecOps, and containerized solutions for deploying their applications. They have more access to database technologies through Google searches and cloud downloads. They can run basic configurations but want to avoid handling the Day 2 operational tasks like scaling, backup, performance monitoring, raising alerts, and security.
While developers can handle the basic installation of a DB, optimal configuration is crucial for everything to work correctly. They would have to dig deep to understand the core technology architecture for optimal configurations. Security is another area where developers prefer to leave the responsibility to infrastructure and DevOps teams.
Q: What are some best practices for introducing new technology to a large organization? Do you have any advice or learnings on how to help people and organizational teams comfortably use new technology?
I agree that any change is more than just implementing the technology. It is important to establish best practices for ongoing support and maintenance and ensure that all stakeholders are aware of the new solutions and the reasons behind their implementation.
To facilitate this process, I recommend breaking it down into three steps:
- Fit assessment involves determining whether the app or dataset requires distributed SQL, considering factors such as resiliency, geo-distribution, scalability, and modernization.
- Migration assessment is critical to define the scope and complexity of the migration. Do not start your migration before completing this step.
- Execution involves deploying, converting, optimizing, and testing the migration to ensure it meets security requirements.
At Comcast, we call this process Resiliency Wargaming, and it has helped us. This is a process anyone can follow and implement as well.
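The three steps above can be expressed as a simple gated pipeline. The field names, thresholds, and complexity heuristic here are hypothetical placeholders, not the actual Resiliency Wargaming criteria:

```python
def fit_assessment(app: dict) -> bool:
    """Step 1: does this app or dataset actually need distributed SQL?"""
    drivers = ("needs_resiliency", "needs_geo_distribution",
               "needs_scalability", "modernization_planned")
    return any(app.get(d, False) for d in drivers)

def migration_assessment(app: dict) -> str:
    """Step 2: gauge scope and complexity before any migration work.
    Proprietary features weigh more than plain stored procedures."""
    score = (len(app.get("stored_procedures", [])) +
             2 * len(app.get("proprietary_features", [])))
    return "high" if score > 10 else "medium" if score > 3 else "low"

def plan(app: dict):
    """Step 3 (execution) is gated on the first two assessments."""
    if not fit_assessment(app):
        return None  # no compelling reason to move this app
    return {"complexity": migration_assessment(app),
            "phases": ["deploy", "convert", "optimize", "test"]}
```

The point of the gate is the same one Amit makes: execution never starts before the fit and migration assessments are complete.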
Q: What are you looking forward to next on your distributed SQL journey?
There are two fronts that we want to adopt during the next phase of our journey.
- DBaaS. We already have a few DBaaS implementations, which we talked about earlier. However, we are looking to create container-based services that can run anywhere, including on a Kubernetes service.
- A platform migrator. The focus of this tool is to help app teams migrate from one platform to another when adopting distributed SQL. We have already created a guideline called the Migration Analyzer, which helps the data and developer management teams assess the complexity of a migration. However, we want to expand our toolset to include other migration tools to ease the adoption of distributed SQL. This will help us create a more flexible and adaptable infrastructure, enabling easier adoption of new technologies and increased scalability.
Want to hear more? Check out the entire conversation!