How to Migrate Data from Cassandra or MySQL to YugabyteDB?
If you work with databases, at some point you are going to need to get data in and out them using a format that can be consumed by a completely different system. YugabyteDB makes use of CSV files to make this as easy as possible. The CSV format is arguably the most universally portable way to get data migrations accomplished.
TL;DR – YugabyteDB makes use of Cassandra’s COPY FROM command and a forked version of Cassandra’s Bulk Loader to get data into the system. It uses a tool called Cassandra Unloader to get data out of the system and into a CSV format.”
For the purposes of this blog post we are going to be working with the Cassandra-compatible YCQL API.
In this section we’ll walk you through the following operations:
- Creating a destination table in YugabyteDB
- Exporting data from Cassandra
- Exporting data from MySQL
- Importing data into YugabyteDB
In this example we are going to create a table called
SensorData in YugabyteDB and assume that a compatible dataset already resides in either a Cassandra or MySQL database. If you’d like to generate a sample dataset for use with this example, check out the “Prepare Source Data” section in the import documentation. The DDL to create the table is as follows:
CREATE KEYSPACE example; USE EXAMPLE; CREATE TABLE SensorData ( customer_name text, device_id int, ts timestamp, sensor_data map<text, double>, PRIMARY KEY((customer_name, device_id), ts) );
Getting data out of Cassandra and into a CSV file is a straightforward operation:
cqlsh> COPY example.SensorData TO '/path/to/sample.csv';
Similarly, exporting data out of MySQL and into a CSV file is a simple process:
SELECT customer_name, device_id, ts, sensor_data FROM SensorData INTO OUTFILE '/path/to/sample.csv' FIELDS TERMINATED BY ',';
How best to import data into YugabyteDB largely depends on the size of database. For small datasets in the MB rage we recommend using Cassandra’s
copy from command. For example:
cqlsh> COPY example.SensorData FROM '/path/to/sample.csv';
For datasets in the GB and multi-TB range we recommend using our fork of the Cassandra bulk loader. You can get the bulk loader here. Running the import using the bulk loader would look as follows:
time ./cassandra-loader \ -dateFormat 'yyyy-MM-dd HH:mm:ss.SSSSSSX' \ -f sample.csv \ -host <clusterNodeIP> \ -schema "example.SensorData(customer_name, device_id, ts, sensor_data)"
In this section we’ll show you how to export data from YugabyteDB into CSV files using the
Let’s use the following DDL to create a source table called USERS for use in our export example:
CREATE KEYSPACE load; USE load; CREATE TABLE users( user_id varchar, score1 double, score2 double, points int, object_id varchar, PRIMARY KEY (user_id));
If you’d like to generate some sample data to populate the table above, check out the “Generate Sample Data” and “Load Sample Data” sections here.
cassandra-unloader tool, execute the following command:
./cassandra-unloader \ -schema "load.users(user_id, score1, score2, points, object_id)" \ -boolStyle 1_0 \ -host <clusterNodeIP> \ -f outfile.csv
You can analyze the csv data directly or import it into a different YugabyteDB cluster. That’s all there is to it! For more information concerning data migration features, check out the “Data Migration” section in the docs.