How to Migrate the Sakila Database from MongoDB to YugabyteDB with Studio 3T
Whether it’s consolidation, or modernizing your data infrastructure, data migration is a serious undertaking. Without sufficient planning, countless hours will be spent constructing the proper schema in the target database, and picking the proper ETL tools to help move the data. In this blog we will show you how to migrate the Sakila demo database from MongoDB to a YugabyteDB cluster using the Studio 3T tool.
Why migrate to a distributed SQL database from MongoDB?
There are a variety of reasons to choose a distributed SQL database like YugabyteDB over MongoDB, especially if the current needs of your application no longer match your original requirements.
What’s YugabyteDB? It is an open source, high-performance distributed SQL database built on a scalable and fault-tolerant design inspired by Google Spanner. Yugabyte’s SQL API (YSQL) is PostgreSQL wire compatible.
Here are a few reasons to consider making the switch to a distributed SQL database like YugabyteDB:
Auto-Sharding: In MongoDB, replica sets need to be manually converted into a sharded cluster. YugabyteDB automatically shards data to deliver the right balance of availability and performance the moment the first row of data is inserted.
Synchronous Data Replication: MongoDB offers asynchronous replication where a secondary pulls from a primary. YugabyteDB’s replication is synchronous and Raft-based.
Durable & Fast Writes: In MongoDB, you’ll only get durability with a majority writeConcern, which inherits an asynchronous replication lag. YugabyteDB is durable by default, while delivering lower latency.
Linearizable & Fast Reads: For linearizable reads, MongoDB requires readConcern, which is high latency and requires a quorum at read time. YugabyteDB offers linearizable single-key reads, which are served directly off the shard leader without the need for a quorum.
Distributed Multi-Shard Transactions: MongoDB doesn’t currently offer sharded cluster transactions. YugabyteDB on the other hand, has a built-in transaction manager to coordinate transactions across multiple shards.
Strongly Consistent Global Secondary Indexes: In MongoDB, secondary indexes are not global and have to be updated often, which involves blocking. YugabyteDB’s global secondary indexes leverage distributed transactions and are always consistent.
Prerequisites
Here is the environment and software we’ll be leveraging:
- YugabyteDB – Version 2.0.10
- MongoDB Community Server – Version 4.2.2
- OS – MacOS
- Studio 3T – Version 2020.1.2
- MongoDB port of the Sakila sample database
Before we import the Sakila JSON files, let’s create three collections in MongoDB:
customers
to hold customer JSON datafilms
to hold film JSON datastores
to hold stores JSON data
To create the sakila
database and the necessary collections in MongoDB, you can execute the following commands:
$ mongo sakila > use sakila switched to db sakila > db.createCollection('customers'); { "ok" : 1 } > db.createCollection('films'); { "ok" : 1 } > db.createCollection('stores'); { "ok" : 1 }
Sakila sample database
As mentioned, for the purposes of this demo we will be using MySQL’s well known Sakila sample database that features a normalized schema modeling a DVD rental store. It features things like films, actors, film-actor relationships, and a central inventory table that connects films, stores, and rentals. You can get the MongoDB version of Sakila here.
Now, let’s unpack the compressed JSON files using tar zxvf sakila.tgz
and then import the JSON files into MongoDB using the mongoimport
utility using the following commands.
$ mongoimport --db sakila --collection customers --file /Users/jguerrero/sakila/customers.json 2020-01-21T14:36:07.607-0800 connected to: mongodb://localhost/ 2020-01-21T14:36:07.739-0800 599 document(s) imported successfully. 0 document(s) failed to import. $ mongoimport --db sakila --collection films --file /Users/jguerrero/sakila/films.json 2020-01-21T14:47:17.882-0800 connected to: mongodb://localhost/ 2020-01-21T14:47:17.916-0800 1000 document(s) imported successfully. 0 document(s) failed to import. $ mongoimport --db sakila --collection stores --file /Users/jguerrero/sakila/stores.json 2020-01-21T14:47:37.097-0800 connected to: mongodb://localhost/ 2020-01-21T14:47:37.121-0800 2 document(s) imported successfully. 0 document(s) failed to import.
Migrating data from MongoDB to YugabyteDB
Now that the customers
, films
, and stores
datasets are loaded into MongoDB, let’s start the process of migrating the data into YugabyteDB. The tool that we will be using for this demo is Studio 3T. However, there are several ETL tools on the market that can serve the same purpose and we’ll be covering a few of them in upcoming blogs.
Installing a target YugabyteDB cluster
For complete instructions on how to get up and running on a variety of platforms including prerequisites, check out our Quickstart Guide. In the following section we’ll cover the basic steps for getting up and running in just a few minutes with a local 3 node cluster on your Mac laptop.
Download and extract YugabyteDB
$ wget https://downloads.yugabyte.com/yugabyte-2.0.10.0-darwin.tar.gz $ tar xvfz yugabyte-2.0.10.0-darwin.tar.gz && cd yugabyte-2.0.10.0/
Note: At the time of this writing, 2.0.10 is the latest release of YugabyteDB. Make sure to check the Quickstart Guide for the latest version.
Configure loopback addresses
Add a few loopback IP addresses for the various YugabyteDB processes to use.
$ sudo ifconfig lo0 alias 127.0.0.2 $ sudo ifconfig lo0 alias 127.0.0.3 $ sudo ifconfig lo0 alias 127.0.0.4 $ sudo ifconfig lo0 alias 127.0.0.5 $ sudo ifconfig lo0 alias 127.0.0.6 $ sudo ifconfig lo0 alias 127.0.0.7
Create a 3 node cluster and Sakila database
With the command below, create a 3 node cluster with a replication factor of 3.
$ ./bin/yb-ctl --rf 3 create
You can verify that your cluster is up and running by checking out the YugabyteDB Admin UI, which is located at:
http://127.0.0.1:7000/
With the command below, enter the YSQL shell and create the sakila
database:
$ ./bin/ysqlsh yugabyte=# CREATE DATABASE sakila; CREATE DATABASE
About Studio 3T
Studio 3T is a graphical IDE for MongoDB available for Windows, Mac, and Linux. It helps DBAs speed up tasks like query building, data exploration, import/export, code generation, and more–with or without the knowledge of the MongoDB query language. Studio 3T is a commercial product with various editions available for purchase, all available to try with a free trial period.
Exporting and importing with Studio 3T
With Studio 3T, they’ve made it easy to export data between MongoDB and PostgreSQL. Because YugabyteDB is a PostgreSQL compatible database, you can export the MongoDB collections we created to SQL files. We can then execute these SQL files against YugabyteDB and rebuild the collections in a relational format.
To export a MongoDB collection, use the export option in the menu (shown in red below), and select the collections to be exported. In this case, we are exporting the customers
collection.
Select the target export format. For the purposes of our demo, we’ll select SQL.
Next, let’s name the target table customers and check the boxes to include the ‘CREATE TABLE’ and ‘DROP TABLE IF EXISTS’ statements to the generated SQL script.
In the next step, ensure that the mappings of the columns are accurate. For example, Rentals
is a JSON subdocument in the source, which is mapped into a JSON type column in the target YugabyteDB database.
Here a snippet from the corresponding SQL file that gets generated for the customers
collection:
DROP TABLE IF EXISTS "customers"; CREATE TABLE "customers" ( "Address" TEXT, "City" TEXT, "Country" TEXT, "District" TEXT, "First Name" TEXT, "Last Name" TEXT, "Phone" TEXT, "_id" INT NOT NULL, "Rentals" JSON, PRIMARY KEY ("_id") );
Note that in the above DDL statement, Rentals
is a column with type JSON
. This means an entire unstructured JSON
document can be put into that column and queried. In YugabyteDB, the JSON
data type represents the exact text format of JSON
while the JSONB
data type represents its binary format. YugabyteDB supports both JSONB and JSON data types. For more information about working with JSON data types in YugabyteDB, check out:
https://docs.yugabyte.com/latest/develop/learn/data-types/
Repeat the above steps for the films
and stores
collections.
Creating tables and loading data into YugabyteDB
Now that you have your SQL files, return to the YSQL shell and switch to the sakila
database:
yugabyte=# \c sakila You are now connected to database "sakila" as user "yugabyte".
Execute the sakila.customers.sql
file:
sakila=# \i /Users/jguerrero/sakila.customers.sql
You can verify the data by running:
sakila=# select count(*) from customers; count ------- 599 (1 row)
Or, by browsing it in your favorite GUI tool, like DBeaver:
Repeat the above steps for the films
and stores
SQL files by running:
\i /Users/jguerrero/sakila.films.sql
and
\i /Users/jguerrero/sakila.stores.sql
Conclusion
That’s it! With the right database tools such as Studio 3T, you can easily migrate data from MongoDB to YugabyteDB within minutes. Tools like Studio 3T are especially useful if your database requirements have evolved beyond the capabilities of MongoDB to something like a distributed SQL database.
What’s Next?
- Compare YugabyteDB in depth to databases like CockroachDB, Google Cloud Spanner, and MongoDB.
- Get started with YugabyteDB on macOS, Linux, Docker, and Kubernetes.
- Contact us to learn more about licensing, pricing, or to schedule a technical overview.