Distributed SQL Tips and Tricks – March 13, 2020

Dorian Hoxha

Welcome to this week’s tips and tricks blog where we recap some distributed SQL questions from around the Internet. We’ll also review upcoming events, new documentation, and blogs that have been published since the last post. Got questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stackoverflow. Ok, let’s dive right in.

How can I UPSERT multiple rows with an update?

PostgreSQL and YSQL enable you to do upserts using the INSERT ON CONFLICT clause. Similar to multi-row inserts, you can also batch multiple upserts in a single INSERT ON CONFLICT statement for better performance.

In case the row already exists, you can access the existing values using EXCLUDED.<column_name> in the query.

In the example below we have a table with where we are keeping counters of different products and incrementing rows in batches:

CREATE TABLE products
  (
     NAME     TEXT PRIMARY KEY,
     quantity BIGINT DEFAULT 0
  );  

---

INSERT INTO products(name, quantity) 
VALUES 
  ('apples', 1), 
  ('oranges', 5) ON CONFLICT(name) DO UPDATE 
SET 
  quantity = products.quantity + excluded.quantity;

---

INSERT INTO products(name, quantity) 
VALUES 
  ('apples', 1), 
  ('oranges', 5) ON CONFLICT(name) DO UPDATE 
SET 
  quantity = products.quantity + excluded.quantity;

---

SELECT *
FROM   products;

  name   | quantity
---------+----------
 oranges |   	10
 apples  |    2

(2 rows)

How can I optimize YugabyteDB for faster integration testing

There are several steps that we can take to make YugabyteDB faster when testing in integration tests, CI/CD, and other testing scenarios:

  • Point gflag --fs_data_dirs to a ramdisk directory
    This will make DML, DDL and create/destroy of a cluster faster because data is not written to disk.
  • Set gflag --yb_num_shards_per_tserver=1
    Reducing the number of shards lowers overhead when creating/dropping YCQL tables.
  • Set gflag --ysql_num_shards_per_tserver here=1
    Reducing the number of shards lowers overhead when creating/dropping YSQL tables.
  • Set gflag --replication_factor=1
    Keeping only the tablet leaders will remove replication overhead.
  • Use TRUNCATE instead of DROP TABLE followed by CREATE TABLE between test cases if you want to keep the same table schema but delete all the rows in it.

These options are all explained in the yb-tserver and yb-master configuration pages.

What is the YSQL equivalent to in SQL Server’s ISNULL()?

The expression ISNULL(expression, replacement) in MS SQL Server returns the replacement value if the expression returns `NULL`.

The same functionality can be achieved with the coalesce(arg1,arg2,argn) function. The function takes an arbitrary number of arguments and returns the first value that `IS NOT NULL`. Example in SQL Server:

SELECT Isnull(my_column, 'Missing')
FROM   table;

The same functionality in YSQL:

SELECT COALESCE(my_column, ‘missing’)
FROM   table;

How can I find duplicate rows using YSQL?

In many cases we want to find duplicate rows in a table to clean them up or process them. This can be easily achieved with the GROUP BY and HAVING clauses.

Suppose we have a table of employees with the following data:

CREATE TABLE employees
  (
     id         BIGSERIAL,
     first_name TEXT,
     last_name  TEXT
  );  

---

INSERT INTO employees
            (first_name,
             last_name)
VALUES      ('John',
             'Smith'),
            ('Bill',
             'Potter'),
            ('John',
             'Smith');  

You can find duplicate rows using:

SELECT first_name,
       last_name
FROM   employees
GROUP  BY first_name,
          last_name
HAVING Count(*) > 1;  

 first_name | last_name
------------+-----------
 John   	| Smith

(1 row)

You can also remove duplicate rows using DISTINCT ON:

DELETE FROM employees
WHERE  id NOT IN (SELECT DISTINCT ON(first_name, last_name) id
                  FROM   employees);  

SELECT *
FROM   employees;  

 id | first_name | last_name
----+------------+-----------
  1 | John   	| Smith
  2 | Bill   	| Potter

(2 rows)

DISTINCT ON(expression [, ...]) selects only the first row of each set of rows where the given expression (in this case the combination of first_name and last_name) are the same.

New Blogs, Tutorials, and Videos

New Blogs

New Videos

New and Updated Docs

Get Started

Ready to start exploring YugabyteDB features? Getting up and running locally on your laptop is fast. Visit our quickstart page to get started.

High performance distributed SQL by Yugabyte

What’s Next?

  • Compare YugabyteDB in depth to databases like CockroachDB, Google Cloud Spanner and MongoDB.
  • Get started with YugabyteDB on macOS, Linux, Docker and Kubernetes.
  • Contact us to learn more about licensing, pricing or to schedule a technical overview.
Dorian Hoxha

Related Posts

Explore Distributed SQL and YugabyteDB in Depth

Discover the future of data management.
Learn at Yugabyte University
Get Started
Browse Yugabyte Docs
Explore docs
PostgreSQL For Cloud Native World
Read for Free