Justuno’s Database Journey from Ground to Cloud
At a recent Distributed SQL Summit, Travis Logan, CTO & Co-Founder at Justuno, presented the talk “Evolve: A Database Journey from Ground to Cloud.”
With over 14 years of experience with Microsoft SQL Server, Travis is well-versed in the evolution of transactional databases. In this talk, he summarizes his experiences with traditional RDBMSs and his attempts to improve their redundancy, scaling characteristics, and performance. He then walks us through how the cloud has helped usher in a new era of distributed SQL databases, in which concerns such as distributed ACID transactions, horizontal scaling, and fault tolerance are handled natively by the database.
In Travis’ first story, he tells how in 2006 he had to wheel his own hardware into a Los Angeles datacenter that was kept below 50 degrees. This hardware was the foundation for his first Active / Active Microsoft SQL Server cluster. Twelve hours later, he still had not gotten the cluster up and running and had to wait for Dell to provide a patch before he could complete the installation. Here are the pros and cons of an Active / Active SQL Server cluster that he discovered after running it in production.
- Servers are redundant, but data storage is at the mercy of the SAN device (which creates a SPOF)
- No zone availability
- In reality, Active / Active meant splitting your database across two instances, so that each one is “active” for its own database while acting as the failover for the other
- Although online updates were supported, in practice each update meant a minute or more of application downtime while the cluster failed over
In the second story, Travis talks about how in 2009 they migrated from an Active / Active SQL Server cluster to a SQL Server database in the cloud because they had begun to hit CPU and storage limits. At the time, SQL Server was available only on Amazon EC2, Azure, and Rackspace. Ultimately, they went with an Active / Active cluster on StrataScale’s infrastructure. Here’s why:
- AWS didn’t have the necessary flexibility to allow the deployment of an Active / Active cluster
- Azure was still in beta and did not support many of the required SQL and T-SQL features
- StrataScale was able to provide a hybrid setup that ultimately worked. They made it easy to swap out the storage underneath the cluster and they even offered to set the whole thing up!
Unfortunately, because all of StrataScale’s customers relied on the same underlying storage infrastructure, failures in this part of the stack led to catastrophic outages for everyone. As Travis puts it, “There were constant network outages, dropped disks, and outages that could last from an hour to 36 hours.”
Heading into 2015, Travis started ramping up his own company, Justuno, a cloud-native website visitor conversion optimization platform. The platform's database had to:
- Scale horizontally to support millions of visitor profiles and billions of requests
- Be easy to deploy on the cloud
- Support large amounts of data
- Deliver very low, near real-time latencies
- Be redundant by default
Travis next walked us through his experience using Apache Cassandra to meet this new set of requirements. After running it in production, he arrived at a few insights:
- It’s easy to add nodes
- Monitoring is easy with the right tools
- Although nodes could support up to 2 TB, best practices dictated using no more than 50% of that storage; beyond that, repairs risked failing
- Secondary indexes were not usable, despite being available
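To make the 50% storage guideline above concrete, here is a rough capacity sketch. It assumes data is spread evenly across nodes and uses a replication factor of 3; both are illustrative assumptions rather than figures from the talk, and the function name `min_nodes` is hypothetical.

```python
import math


def min_nodes(raw_data_tb, replication_factor=3, node_capacity_tb=2.0, max_fill=0.5):
    """Estimate the smallest node count that keeps each node at or below max_fill.

    Assumptions (not from the talk): even data distribution across nodes,
    and a default replication factor of 3.
    """
    if raw_data_tb < 0 or replication_factor < 1:
        raise ValueError("raw_data_tb must be >= 0 and replication_factor >= 1")
    # Only half of each 2 TB node is usable under the 50% guideline.
    usable_per_node_tb = node_capacity_tb * max_fill
    # Every row is stored replication_factor times across the cluster.
    total_stored_tb = raw_data_tb * replication_factor
    # A cluster needs at least as many nodes as it keeps replicas.
    return max(replication_factor, math.ceil(total_stored_tb / usable_per_node_tb))


if __name__ == "__main__":
    print(min_nodes(10))
```

Under these assumptions, each 2 TB node contributes only 1 TB of usable space, so the node count grows twice as fast as the raw disk capacity alone would suggest.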
In the most recent step of Justuno’s evolution, Travis and his team took a fresh look at revamping the platform’s data infrastructure so they could avoid the previous challenges they had experienced. Their new database requirements included:
- First class support on Google Cloud Platform
- Everything needed to be scalable and automated to a high degree
- It needed to support SQL while remaining friendly to NoSQL-style workloads. There could be no SQL vs. NoSQL situation here.
At first they tried Spanner, but found that it was very expensive and that the latency numbers were surprisingly poor.
After evaluating several other possible databases, Justuno ultimately settled on a YugabyteDB cluster on GCP. With YugabyteDB they were able to achieve:
- Single-digit millisecond latency for both writes and reads
- Point-and-click on-demand scaling of clusters
- Low Total Cost of Ownership with minimal operational overhead
- High scalability, especially during workload spikes on a multi-tenant system hosting tens of thousands of websites
Check out all the talks from this year’s Distributed SQL Summit including Pinterest, Mastercard, Comcast, Kroger, and more on our Vimeo channel.