Distributed SQL Tips and Tricks – March 13, 2020
Welcome to this week’s tips and tricks blog where we recap some distributed SQL questions from around the Internet. We’ll also review upcoming events, new documentation, and blogs that have been published since the last post. Got questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stackoverflow. Ok, let’s dive right in.
PostgreSQL and YSQL enable you to do upserts using the
INSERT ON CONFLICT clause. Similar to multi-row inserts, you can also batch multiple upserts in a single
INSERT ON CONFLICT statement for better performance.
In case the row already exists, you can access the existing values using
EXCLUDED.<column_name> in the query.
In the example below we have a table with where we are keeping counters of different products and incrementing rows in batches:
CREATE TABLE products ( NAME TEXT PRIMARY KEY, quantity BIGINT DEFAULT 0 ); --- INSERT INTO products(name, quantity) VALUES ('apples', 1), ('oranges', 5) ON CONFLICT(name) DO UPDATE SET quantity = products.quantity + excluded.quantity; --- INSERT INTO products(name, quantity) VALUES ('apples', 1), ('oranges', 5) ON CONFLICT(name) DO UPDATE SET quantity = products.quantity + excluded.quantity; --- SELECT * FROM products; name | quantity ---------+---------- oranges | 10 apples | 2 (2 rows)
There are several steps that we can take to make YugabyteDB faster when testing in integration tests, CI/CD, and other testing scenarios:
- Point gflag
--fs_data_dirsto a ramdisk directory
This will make DML, DDL and create/destroy of a cluster faster because data is not written to disk.
- Set gflag
Reducing the number of shards lowers overhead when creating/dropping YCQL tables.
- Set gflag
Reducing the number of shards lowers overhead when creating/dropping YSQL tables.
- Set gflag
Keeping only the tablet leaders will remove replication overhead.
DROP TABLEfollowed by
CREATE TABLEbetween test cases if you want to keep the same table schema but delete all the rows in it.
ISNULL(expression, replacement) in MS SQL Server returns the replacement value if the expression returns `NULL`.
The same functionality can be achieved with the
coalesce(arg1,arg2,argn) function. The function takes an arbitrary number of arguments and returns the first value that `IS NOT NULL`. Example in SQL Server:
SELECT Isnull(my_column, 'Missing') FROM table;
The same functionality in YSQL:
SELECT COALESCE(my_column, ‘missing’) FROM table;
In many cases we want to find duplicate rows in a table to clean them up or process them. This can be easily achieved with the
GROUP BY and
Suppose we have a table of employees with the following data:
CREATE TABLE employees ( id BIGSERIAL, first_name TEXT, last_name TEXT ); --- INSERT INTO employees (first_name, last_name) VALUES ('John', 'Smith'), ('Bill', 'Potter'), ('John', 'Smith');
You can find duplicate rows using:
SELECT first_name, last_name FROM employees GROUP BY first_name, last_name HAVING Count(*) > 1; first_name | last_name ------------+----------- John | Smith (1 row)
You can also remove duplicate rows using
DELETE FROM employees WHERE id NOT IN (SELECT DISTINCT ON(first_name, last_name) id FROM employees); SELECT * FROM employees; id | first_name | last_name ----+------------+----------- 1 | John | Smith 2 | Bill | Potter (2 rows)
DISTINCT ON(expression [, ...]) selects only the first row of each set of rows where the given expression (in this case the combination of first_name and last_name) are the same.
- Monitoring YugabyteDB with Prometheus and Grafana in Kubernetes
- Getting Started with Falco Runtime Security and Cloud Native Distributed SQL on Google Kubernetes Engine
- Cloud Native Meets Distributed SQL: Bringing Microservices, Kubernetes, Istio & YugabyteDB Together with Hipster Shop Demo
- 5 Query Pushdowns for Distributed SQL and How They Differ from a Traditional RDBMS
- Install YugabyteDB on macOS using Docker and yugabyted
- Longhorn Distributed Block Storage & Cloud Native Distributed SQL on Google Kubernetes Engine
- Getting Started with Distributed SQL Colocated Tables
- Read replica clusters
- JSON data types and functionality: This section includes over 35 new pages, including:
Yugabyte is growing fast and we’d like you to help us keep the momentum going! Check out our currently open positions:
- Lead Development Representative – Sunnyvale, CA
- Site Reliability Engineer (SRE) – Sunnyvale, CA
- Community Success Engineer – Remote OK
- Senior Curriculum Developer – Remote OK
- Frontend Engineer – Sunnyvale, CA
- Software Engineer – Cloud Infrastructure – Sunnyvale, CA
- Software Engineer – Core Database – Sunnyvale, CA
- Software Engineer – Full Stack – Sunnyvale, CA
- Solutions Engineer – Sunnyvale, CA
- Developer Advocate – Sunnyvale, CA
Our team consists of domain experts from leading software companies such as Facebook, Oracle, Nutanix, Google, and LinkedIn. We have come a long way in a short time but we cannot rest on our past accomplishments. We need your ideas and skills to make us better at every function that is necessary to create the next great software company. All while having tons of fun and blazing new trails!
Ready to start exploring YugabyteDB features? Getting up and running locally on your laptop is fast. Visit our quickstart page to get started.