Distributed SQL Tips and Tricks – April 21, 2021
Welcome back to our bi-weekly tips and tricks blog where we recap some distributed SQL questions from around the Internet. We’ll also review upcoming events, new documentation, and blogs that have been published recently. Got questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stack Overflow. Let’s get into it:
Importing a Large Table Using Smaller Transactions
When first playing around with YugabyteDB many users import existing data from other databases, CSV files, etc. in order to test out functionality and performance. Occasionally with the right combination of low memory (running on a small memory machine), and high row size (large number of rows in the file that is being imported), one can run into an error such as the one below:
ERROR - IO error: Errors occurred while reaching out to the tablet servers: . Errors from tablet servers: [Remote error (yb/rpc/outbound_call.cc:440): Service unavailable (yb/rpc/yb_rpc.cc:167): Call rejected due to memory pressure: Call yb.tserver.TabletServerService.Write 172.21.0.3:54934 => 172.21.0.3:9100 (request call id 123),
Let’s run through an example. Say you are running a single node cluster with 1 YB-Master server, 1 YB-TServer, and a replication factor of 1. The data format we are looking to import looks like the following:
CREATE TABLE large_table( f1 int primary key, f2 char(200) , f3 text , f4 text );
f4 are long strings. When using the YSQL API and the
COPY FROM command we are able to import all 1000 rows in the CSV. However, it breaks with 1500 rows with the error called out above. This is more than likely because your row size is too big for the amount of memory you are running on the machine. In such a case use the
ROWS_PER_TRANSACTION attribute and set it to load in batches of smaller number of rows.
COPY large_table FROM '/home/yuga/Desktop/large_table.csv' WITH (FORMAT CSV, HEADER, ROWS_PER_TRANSACTION 32);
You can find more info on using the
COPY command here, including how to import a large table using smaller transactions.
Zero Downtime Migrations with YugabyteDB
When looking to migrate from one database to another there are typically a handful of questions one must ask while evaluating the different migration options, such as: what is my source and target, how much downtime can we afford, how familiar are we with the tools, and what is the total cost? This being said, there are a number of ways to migrate to YugabyteDB from your source systems. The process of migrating to YugabyteDB largely looks similar to the process of migrating an application to PostgreSQL, however there are some minor differences. If downtime is not an issue, the recommended way to migrate your data to YugabyteDB is through a CSV import/export. You can perform these actions either manually or programmatically.
If you have a requirement for zero downtime one available avenue is the use of Blitzz, where YugabyteDB is a certified ecosystem partner. Blitzz can be used to migrate from a variety of databases like Oracle, SQL Server, MySQL, MongoDB, DynamoDB, Cassandra, and Postgres as outlined in this blog post. Another available solution is the open source distributed platform for change data capture by the name of Debezium.
Apart from this you can try approaches to reduce the window of downtime, as well as perform preliminary setups beforehand on the new setup to avoid any surprises at the time of actual data migrations. The usual
pg_dump/pg_restore will work with YugabyteDB. Apart from that we are working on making
pgloader available for YugabyteDB as well. You can track the status of the GitHub issue here.
Changing Default Ports on YugabyteDB
When using YugabyteDB there are default ports that need to be opened in order to allow different types of access such as client API communication, internode RPC communication, access to the UI web servers, common firewall rules, and Prometheus monitoring endpoints. In order to change these default port settings you can add flags to your configuration file in order to make the necessary adjustments. When would this be necessary? Let us say for example that your application uses port 9000, which is the default port for the
yb-tserver web UI. Unless you change the default port of one or the other you will run into conflicts. If you decide to change the default port of the server rather than your application, you can do so by setting the
--webserver_port flag for the
yb-tserver process as shown here. In order to keep this consistent you can add it to your configuration file likewise.
Configuring Helm Chart Values for YugabyteDB Deployment
When using Helm charts to deploy YugabyteDB on Kubernetes there are default values set for the Helm chart that can be found in the
helm/yugabyte/values.yaml file. The default values within this file are as noted in the prerequisites section of the “Deploy a YugabyteDB cluster on OSS Kubernetes using Helm Chart” documentation. Instead of using the default values in the Helm chart, you can also modify the configuration of the YugabyteDB cluster according to your specific requirements. An example of such a modification can be found below:
helm upgrade --set resource.tserver.requests.cpu=8,resource.tserver.requests.memory=15Gi yb-demo ./yugabyte
Also, see the
values.yaml file within our GitHub page as a reference of (most) of the configurable Helm chart values.
New Documentation, Blogs, Tutorials, and Videos
- Creating a Centralized Consent Database for 80M Citizens Across 100k+ Companies
- INSERT INTO Yugabyte (We’re Hiring, April Is Diversity Month Edition)
- An Adventure in Multi-Cloud with Kubernetes and Distributed Data
- Fortune 500 Financial Services Company Chooses YugabyteDB
- Getting Started with DBeaver EE and Yugabyte Cloud Query Language (YCQL)
- Recap of Distributed SQL Summit Asia 2021
- YugabyteDB Fundamentals Community Training & Certification
- How to Run Distributed SQL Databases on Kubernetes
- April 22, 2021 – YugabyteDB YSQL Development Intermediate Certification and Training
- April 28, 2021 – How to Implement Distributed ACID Transactions Without Atomic Clocks
- May 4 – May 7, 2021 – KubeCon + CloudNativeCon Europe 2021 Virtual
- May 27, 2021 – YugabyteDB DBA Fundamentals Training and Certification
- June 30, 2021 – High Performance Design & Architecture in a Distributed SQL Database
New and Updated Docs
We’re continually adding to and updating the documentation to give you the information you need to make the most out of YugabyteDB. We had so many new and updated docs for you, that we wrote a blog post this month to cover recent content added, and changes made, to the YugabyteDB documentation – What’s New and Improved in YugabyteDB Docs, Mar 2021.
Ready to start exploring YugabyteDB features? Getting up and running locally on your laptop is fast. Visit our quickstart page to get started.