How Shopify Is Re-Architecting for an Agentic Commerce Future with YugabyteDB

Yugabyte Team

What happens when a relational database design built for e-commerce is pushed to planetary scale? For Shopify, the answer involved years of custom sharding, massive replication overhead, and growing application complexity.

As transaction volumes and global access patterns continued to rise, those tradeoffs became increasingly difficult to manage.

Shopify Distinguished Engineer Brad Dietrich recently presented a live session at the Distributed SQL Summit (DSS) in Atlanta (watch the replay below). In this recap of his presentation, we share how Shopify is navigating the move from large-scale MySQL deployments to distributed SQL, and what that shift means for engineers building systems at extreme scale.


Introduction

Shopify has been on a journey, and is still on that journey.

Even if you don’t know Shopify, you’ve probably purchased from a store it powers. Shopify powers millions of merchants worldwide. It started primarily as an e-commerce platform that enabled commerce for its customers, whether they were established or new.

Supporting entrepreneurship and bringing new entrepreneurs into the world is a key Shopify value. Over the 20+ years it has existed, Shopify has grown well beyond e-commerce and now operates across the entire commerce spectrum.

That means more transactions in more places, which is why scaling is crucial.

Where Shopify is Today

Shopify operates in 175 countries worldwide, with a fully global presence and 24/7 operations. It serves traffic across every continent.

The majority of Shopify’s footprint today is on GCP, although historically it has run its own data centers and it continues to evaluate other cloud providers.

Shopify serves billions of users. From a relational database perspective, this means tens, if not hundreds, of billions of rows in relational tables. This doesn’t scale very well in traditional relational databases. Below, we share how Shopify is solving this both today and with YugabyteDB in the future.

Shopify also handles extremely high transaction volumes: 20 billion QPS is not unusual. Scale like this has to be planned, as it doesn’t happen organically or “out of the box.” It’s also not free.

In his recent DSS keynote, YugabyteDB co-founder and co-CEO Karthik Ranganathan talked about the resiliency-to-cost curve. At the very high end of that curve is where Shopify starts to see problems.

Shopify’s data is also a highly normalized relational model. The application was originally designed around that model, and it isn’t going to be re-architected into a non-relational one anytime soon without considerable effort. Instead, Shopify needs to make its relational model work in a very high transaction volume environment.

As Shopify evolved from simple e-commerce, which was easily shardable (it could be sharded for each merchant), into more agentic and AI-driven workloads, things became more complex.
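To make the per-merchant sharding idea concrete, here is a minimal sketch of hash-based shard routing. It is not Shopify's actual routing code: the shard count, the hash function, and the names `shard_for_merchant` and `shards_for_query` are all assumptions made for illustration.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count, chosen only for illustration


def shard_for_merchant(merchant_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a merchant to a shard with a stable hash of its id."""
    digest = hashlib.sha256(str(merchant_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_for_query(merchant_ids: list[int], num_shards: int = NUM_SHARDS) -> set[int]:
    """Return every shard a query must visit to cover the given merchants."""
    return {shard_for_merchant(m, num_shards) for m in merchant_ids}


# A single-merchant query touches exactly one shard.
single = shards_for_query([42])

# A cross-merchant query (for example, one buyer's activity across many shops)
# fans out to several shards, and the application has to merge the results.
fan_out = shards_for_query(list(range(1, 201)))
```

Single-merchant traffic stays cheap under this scheme, while any workload that spans merchants pays a fan-out cost in the application. That is the pressure toward a less-sharded design.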

Merchants now want to be present everywhere. This means Shopify needs a less-sharded infrastructure, as it must now present a full view of its merchants across multiple platforms. Things are changing drastically (and quickly), which is a major reason Shopify is on this evolution path.

How Shopify Got Here

Shopify started with MySQL. As you outgrow a single MySQL instance, you shard it. You must design your sharding strategy carefully to ensure it functions effectively.

Today, Shopify operates tens of thousands of MySQL nodes, counting both masters and replicas. All of that replication places Shopify on the resiliency-to-cost curve that Karthik discussed in his keynote.

Shopify has petabytes upon petabytes of storage. This is not particularly efficient, because much of that replication, more than is strictly necessary, exists for the wrong reasons.

Shopify replicates for resiliency. It replicates to serve read transactions at scale. It also replicates for failover, although it doesn’t manage failover cycles very effectively. The result is a very high degree of regional distribution.

Shopify manages failovers, replicas, and master promotions with a bespoke management system.

In a world like this, you sit at a point on the resiliency curve where you can never fully escape a single point of failure, because each MySQL master is a single point of failure by definition. MySQL doesn’t have any quorum-based writes.
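For readers new to the term, the arithmetic behind quorum-based writes can be sketched in a few lines. This is generic majority-quorum math rather than a description of any particular database's internals, and the function names are hypothetical.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority quorum."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can fail while a majority can still acknowledge writes."""
    return replicas - majority(replicas)


def write_commits(acks: int, replicas: int) -> bool:
    """A quorum write commits once a majority has acknowledged it."""
    return acks >= majority(replicas)
```

With three replicas, a write commits after two acknowledgements and the group survives one failure. With a single writable master, the equivalent numbers are one and zero.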

Over time, this has led to significant application complexity. Shopify has a 20-year-old application, and that complexity has leaked into the application layer. Changing the sharding model and the system to support new business needs has become both a priority and a major challenge for the company.

Shopify evaluated other MySQL-based solutions and even deployed Vitess in production, but this neither solved all its challenges nor met all its needs.

What Shopify Wanted

Ideally, Shopify wants:

  • A single global namespace
  • The ability to run a single query across all its data
  • The resiliency benefits of a multi-quorum write
  • Protection from regional outages and from any failure mode that introduces replication lag

Flexible placement strategies are also very important to the company. You can’t have all the data replicated everywhere, as this is extremely expensive and not particularly aligned with compliance or data privacy rules. Placement needs to be driven by where the reads and writes happen.

Shopify won’t be changing its application in the short term. It needs to stick to the relational model, as it has 1,500 highly normalized relational tables.

Why Choose YugabyteDB?

Shopify evaluated many options, including TiDB, which also has a MySQL-like model.

It chose YugabyteDB for several reasons. One was YugabyteDB’s PostgreSQL alignment, as Shopify recognizes that the ecosystem and industry investment are clearly moving toward Postgres.

Shopify sought a consensus-based architecture behind a Postgres-compatible interface to prevent single points of failure. It didn’t want the single point of failure that vanilla Postgres would introduce, as that would have left Shopify in the same position it was in with MySQL.

Shopify also required geo-partitioning, as it needed to specify exactly where specific data in a table should be stored. It didn’t want the application to have to shard; instead, the application should be unaware of these things, and the database should handle them.

A crucial point for Shopify was data sovereignty – this is more of a business need than a technical one.

Finally, YugabyteDB’s open source credentials were a key differentiator.

Shopify chose to license YugabyteDB because it wants YugabyteDB to evolve. It wants YugabyteDB to continually improve, so that it’s better for everyone.

Where is Shopify Today?

Shopify operates one of the largest YugabyteDB clusters in existence. It has a globally stretched cluster, primarily between the US and Europe, with plans to expand into Asia and Australia.

It uses continent-scoped and region-scoped tablespaces to meet latency and data locality goals. Buyer data stays where buyers live; merchant data stays where merchants live. Today, most merchants are in the US and Europe, but buyers are global.
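The placement rule described above can be modeled as a small decision function. In a geo-partitioned database this rule is declared in the database itself, so the application never runs code like this; the sketch only makes the rule explicit. The region codes, tablespace names, and the `buyer_pii` data class are all hypothetical and do not come from Shopify's schema.

```python
# Hypothetical region codes and continents, for illustration only.
REGION_TO_CONTINENT = {
    "us-east": "north_america",
    "us-west": "north_america",
    "eu-west": "europe",
    "eu-central": "europe",
}

# Data classes assumed to be pinned to a single region rather than a continent.
REGION_SCOPED = {"buyer_pii"}


def tablespace_for(data_class: str, home_region: str) -> str:
    """Pick a placement for a row based on where its owner lives."""
    if home_region not in REGION_TO_CONTINENT:
        raise ValueError(f"no placement rule for region {home_region!r}")
    if data_class in REGION_SCOPED:
        # Region-scoped: every replica stays inside one region.
        return f"region_{home_region.replace('-', '_')}"
    # Continent-scoped: replicas spread across regions of one continent.
    return f"continent_{REGION_TO_CONTINENT[home_region]}"
```

For example, `tablespace_for("merchant", "eu-west")` yields a continent-scoped placement, while `tablespace_for("buyer_pii", "us-east")` pins the row to a single region.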

Today, Shopify’s cluster is relatively modest in node count, but massive in scale – 7,000 CPU cores and 1.4 petabytes of data on YugabyteDB. This is one of the largest deployments, and Shopify is taking steps to ensure it can reach 20x that size (and likely even larger in the future).

It is currently pushing about 200 queries per second (QPS) into YugabyteDB. So, scaling to that higher level is something Shopify is working on closely with the YugabyteDB team.

One of the biggest challenges for Shopify is helping its application engineers adapt to a distributed SQL database after working with a single-node write database. A distributed SQL database behaves differently and feels different.

Indexes are no longer just a B-tree on disk sitting on a single node. In YugabyteDB, indexes are typically hash- or range-sharded across nodes, which has a big impact on queries and data placement. Shopify has spent a fair bit of time making sure its primary keys are appropriately designed for hash and range sharding.
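The difference between hash and range sharding is easiest to see with sequential keys. The sketch below is a simplified model, not YugabyteDB's implementation: the tablet count, split points, and hash function are assumptions chosen for illustration.

```python
import hashlib
from bisect import bisect_right

NUM_TABLETS = 4  # hypothetical tablet count


def hash_tablet(key: int, num_tablets: int = NUM_TABLETS) -> int:
    """Hash sharding: the tablet is derived from a hash of the key."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_tablets


# Range sharding: each tablet owns a contiguous key interval.
# Split points are hypothetical; tablet i holds the keys below SPLITS[i],
# and the last tablet holds everything from the final split point upward.
SPLITS = [250, 500, 750]


def range_tablet(key: int) -> int:
    """Range sharding: the tablet is found by comparing against split points."""
    return bisect_right(SPLITS, key)


recent_keys = range(900, 1000)  # e.g. the newest, sequentially assigned ids

hash_spread = {hash_tablet(k) for k in recent_keys}
range_spread = {range_tablet(k) for k in recent_keys}
```

Here `range_spread` contains a single tablet, so a scan over recent keys is local but every new write lands on the same tablet. `hash_spread` covers several tablets, which spreads the write load but makes the same scan touch many nodes. That trade-off is why primary key design takes care.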

Latency is another challenge. A write quorum across a continent is not going to be faster than 35 milliseconds. That’s very different from writing to a single-node MySQL database.
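A rough way to reason about this latency floor: a quorum write is acknowledged once a majority of replicas has it, so it is bounded by the round trip to the follower that completes the majority. The sketch below assumes a leader-based replication group and ignores disk and processing time. The function name is hypothetical, and every round-trip figure in the usage note other than the 35 milliseconds quoted above is made up.

```python
def quorum_commit_latency_ms(rtts_ms: list[float]) -> float:
    """Network time until a majority of replicas, leader included, has the write.

    rtts_ms holds the round-trip time from the leader to each follower.
    The leader counts as one vote with no network cost.
    """
    replicas = len(rtts_ms) + 1
    needed_followers = replicas // 2  # majority minus the leader itself
    if needed_followers == 0:
        return 0.0
    # The write commits when the slowest follower in the fastest majority replies.
    return sorted(rtts_ms)[needed_followers - 1]
```

With followers 35 ms and 70 ms away, `quorum_commit_latency_ms([35.0, 70.0])` returns 35.0, since the nearer follower completes the majority. If one follower sits in the leader's own region at 2 ms, the result drops to 2.0, but that majority no longer spans the continent.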

Shopify has had an interesting journey helping its application teams come to grips with this new world and why it’s valuable.

What is Next for Shopify and eCommerce?

The current re-architecture is critical to Shopify because it is the future of its business.

AI will drive the next interaction models. Commerce has moved from brick-and-mortar to e-commerce, and now to agentic commerce. Shopify has already launched agentic shopping capabilities and knows that the underlying data must be more normalized and massively scalable. That means continuing to scale far beyond its current level.

Shopify is working closely with YugabyteDB’s engineering teams and believes that together they can achieve the truly global footprint Shopify needs for business success in the rapidly evolving AI-driven world of commerce.

Want to know more? To discover more customer and expert insights from the recent Distributed SQL Summit, watch the replays on demand.
