Make the Most of Query Planner Hints in YSQL

Srinivasa Vasu

This blog will explore how to make the most of Query Planner hints in the YugabyteDB database to optimize business queries based on how applications expose them.

YugabyteDB’s YSQL API leverages the pg_hint_plan extension in PostgreSQL to tweak query execution plans. YugabyteDB uses a cost-based optimizer that utilizes table data statistics and some static rules to derive the execution plan for a query.

The planner/optimizer does its best to choose the right execution plan. The results can be sub-optimal at times, for various reasons. In these circumstances, pg_hint_plan makes it possible to tweak execution plans using “hinting” phrases as comments “/*+ IndexScan(table) */ SELECT column(s) FROM table where key=value” in the SQL query.

To understand the relevance of pg_hint_plan, we’ll begin with a use case where the query performance is sub-optimal, and we need to leverage the pg_hint_plan feature to influence the planner’s execution plan.

Imagine that we have data sets from two popular TV shows and need to find the total viewership for a season, episode, etc.

Prerequisites

  • Follow the YugabyteDB Quickstart instructions to run a local YugabyteDB cluster. Test YugabyteDB’s YSQL API to confirm that you have the YSQL service running on “localhost:5433”.
  • You will need JDK 17 or above to test the sample application code. You can use SDKMAN to install the JDK runtime.

Getting Started

The code snippet used in this blog is available in the gist repo. Ensure the “pg_hint_plan.enable_hint” setting is enabled if it is not already enabled by default.

Ensure the “pg_hint_plan.enable_hint” setting is enabled

Define the Tables

Create the following tables for our two popular TV shows. The tables are intentionally defined to explain the usage of pg_hint_plan.

CREATE TABLE hotd (
    region_code bigint NOT NULL,
    season integer,
    episode integer,
    viewers integer,
    PRIMARY KEY(region_code, season, episode)
);

CREATE INDEX idx_season_episode_hotd ON hotd(season, episode);

CREATE TABLE rop (
    region_code bigint NOT NULL,
    season integer,
    episode integer,
    viewers integer,
    PRIMARY KEY(region_code, season, episode)
);

CREATE INDEX idx_season_episode_rop ON rop(season, episode);

Load the Data

Populate the test data into both tables; show#1 & #2 with 0.8 million and 0.4 million records.

INSERT INTO hotd SELECT code, season, episode, floor(random()*100000000) FROM generate_series(1, 10000)code, generate_series(1, 8)season, generate_series(1, 10)episode;

INSERT INTO rop SELECT code, season, episode, floor(random()*100000000) FROM generate_series(1, 5000)code, generate_series(1, 8)season, generate_series(1, 10)episode;

Let’s find the execution plan for the total viewership for a given season and episode across all the matching regions where data exists.

EXPLAIN ANALYZE SELECT SUM(az.viewers), SUM(ds.viewers) FROM rop az JOIN hotd ds ON (az.season=ds.season AND az.episode=ds.episode AND az.region_code=ds.region_code) WHERE az.season=4 and az.episode=1;

total viewership for a given season

The execution time is almost 2 seconds and the planner has chosen “Nested Loop,” which in this case doesn’t seem to be the optimal plan. Though the cost is reasonable, the execution time is high.

Let’s force this with a different hint like “HashJoin” and see if that improves it.

/*+ HashJoin(az ds) */EXPLAIN ANALYZE SELECT SUM(az.viewers), SUM(ds.viewers) FROM rop az JOIN hotd ds ON (az.season=ds.season AND az.episode=ds.episode AND az.region_code=ds.region_code) WHERE az.season=4 and az.episode=1;

Force with a different hint like “HashJoin”

The “HashJoin” is effective here as the execution time is better than the previous run. We influenced the planner using this embedded hint phrase /*+ HashJoin(az ds)*/ to optimize the result.

Let’s find out how we can use this hint phrase without explicitly adding it to the query.

Hint Table

We can use the “hint_plan.hints” table to capture the query-related suggestions the planner should leverage.

Inserting records into the “hints” table is handy if the in-line hint approach involves touching multiple files/places in the application code. Using this feature, we can optimize the suboptimal query performance without touching the application code.

First, enable the “pg_hint_plan.enable_hint_table” setting to use this feature. We can turn it on at the session/role/dB/cluster level.

CREATE EXTENSION pg_hint_plan;

SET pg_hint_plan.enable_hint_table = on;

Next, insert the hint-related information into the “hint_plan.hints” table with whitespaces, newlines, and any other special characters preserved. Let’s record the query the same way a client executes it.

The above setting activates pg_hint_plan to check the hints table for hint phrases to be embedded along with the query.

INSERT INTO hint_plan.hints
(norm_query_string,
 application_name,
 hints)
VALUES
('EXPLAIN ANALYZE SELECT SUM(az.viewers), SUM(ds.viewers) FROM rop az JOIN hotd ds ON (az.season=ds.season AND az.episode=ds.episode AND az.region_code=ds.region_code) WHERE az.season=? and az.episode=?;','','HashJoin(az ds)');

Note that the special “?” placeholder has replaced the query’s static constants. Re-run the query without the embedded hint phrase to find if the planner picks the execution plan from the hints table.

EXPLAIN ANALYZE SELECT SUM(az.viewers), SUM(ds.viewers) FROM rop az JOIN hotd ds ON (az.season=ds.season AND az.episode=ds.episode AND az.region_code=ds.region_code) WHERE az.season=4 and az.episode=1;

EXPLAIN ANALYZE SELECT

In this execution, the planner has picked the right execution plan without specifying the embedded hint by looking into the records from the hints table. This approach is handy if we can’t change/update the application code to use the embedded hint phrases or if the change involves touching multiple files.

Let’s look at how we can leverage this feature from a client application written in a programming language like Java.

Application Code

Let’s go through the various usage patterns in a standalone Java application to keep it simple.

Applications can invoke the query layer in many ways, such as Statement and PreparedStatement API, with and without bind parameters. Based on the usage patterns, let’s figure out how to adjust the hint phrases.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class Standalone {
  public static void main(String[] args) throws ClassNotFoundException, SQLException {
    Class.forName("com.yugabyte.Driver");
    String query = "SELECT SUM(az.viewers), SUM(ds.viewers) FROM rop az JOIN "
        + "hotd ds ON (az.season=ds.season AND az.episode=ds.episode AND "
        + "az.region_code=ds.region_code) WHERE az.season=4 and az.episode=1";
    try (Connection conn = DriverManager.getConnection("jdbc:yugabytedb://127.0.0.2:5433/yugabyte",
        "yugabyte", "yugabyte");
      Statement stmt = conn.createStatement()) {
      stmt.execute(query);
    }
  }
}

PreparedStatements Output

As expected, the execution time was around 2.3 seconds, as the hint comment phrase for this query doesn’t exist in the hint table. Set “ysql_log_min_duration_statement=0” t-server flag to to print all statement duration in the postgresql.log.

StatementAPI with Static Constants

Let’s insert the influencing hint statement into the hints table.

INSERT INTO hint_plan.hints
(norm_query_string,
 application_name,
 hints)
VALUES
('SELECT SUM(az.viewers), SUM(ds.viewers) FROM rop az JOIN hotd ds ON (az.season=ds.season AND az.episode=ds.episode AND az.region_code=ds.region_code) WHERE az.season=? and az.episode=?','','HashJoin(az ds)');

StatementAPI with static constants

This time, it is optimized, similar to the ysqlsh client execution.

Let’s understand the differences:

  • There is no semicolon(;) required at the end
  • We replaced all static/fixed constants with a question mark (?) placeholder

PreparedStatement API with Bind Parameters

Replace the Statement API with PreparedStatement API having bind parameters.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class Standalone {
  public static void main(String[] args) throws ClassNotFoundException, SQLException {
    Class.forName("com.yugabyte.Driver");
    String query = "SELECT SUM(az.viewers), SUM(ds.viewers) FROM rop az JOIN "
        + "hotd ds ON (az.season=ds.season AND az.episode=ds.episode AND "
        + "az.region_code=ds.region_code) WHERE az.season=? and az.episode=?";
    try (Connection conn = DriverManager.getConnection("jdbc:yugabytedb://127.0.0.2:5433/yugabyte",
        "yugabyte", "yugabyte");
      PreparedStatement pstmt = conn.prepareStatement(query)) {
      pstmt.setInt(1, 4);
      pstmt.setInt(2, 1);
      pstmt.execute();
    }
  }
}

PreparedStatementAPI Output

Surprisingly, it didn’t use the hints table information, so we are back to square one.

A minor variation applies to the “hints” table when swapping static constants with bind parameters. We need to replace the bind parameters with positional placeholders like $1, $2, etc. Let’s make the change and insert the record into the hint table.

INSERT INTO hint_plan.hints
(norm_query_string,
 application_name,
 hints)
VALUES
('SELECT SUM(az.viewers), SUM(ds.viewers) FROM rop az JOIN hotd ds ON (az.season=ds.season AND az.episode=ds.episode AND az.region_code=ds.region_code) WHERE az.season=$1 and az.episode=$2','','HashJoin(az ds)');

positional placeholders hint

In this execution, the planner has picked the right execution plan based on the inputs from the hint table. Notice the $1, $2, etc., parameter placeholder differences between the previous static constant execution.

PreparedStatement API with Bind Parameter and Static Constant

Let’s use the PreparedStatement API with both bind and static constants. The existing records in the “hints” table won’t help, as we have a mix of dynamic and static inputs.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class Standalone {
  public static void main(String[] args) throws ClassNotFoundException, SQLException {
    Class.forName("com.yugabyte.Driver");
    String query = "SELECT SUM(az.viewers), SUM(ds.viewers) FROM rop az JOIN "
        + "hotd ds ON (az.season=ds.season AND az.episode=ds.episode AND "
        + "az.region_code=ds.region_code) WHERE az.season=4 and az.episode=?";
    try (Connection conn = DriverManager.getConnection("jdbc:yugabytedb://127.0.0.2:5433/yugabyte",
        "yugabyte", "yugabyte");
      PreparedStatement pstmt = conn.prepareStatement(query)) {
      pstmt.setInt(1, 1);
      pstmt.execute();
    }
  }
}

As we have both bind parameter and static constant, we need to adjust the “hint query” with the appropriate placeholders (?, $). We need to replace the static constants with “?” and bind parameters with “$” placeholders.

INSERT INTO hint_plan.hints
(norm_query_string,
 application_name,
 hints)
VALUES
('SELECT SUM(az.viewers), SUM(ds.viewers) FROM rop az JOIN hotd ds ON (az.season=ds.season AND az.episode=ds.episode AND az.region_code=ds.region_code) WHERE az.season=? and az.episode=$1','','HashJoin(az ds)');

static constants and bind parameters

With Special Characters (New Line)

Let’s test out with special characters, such as a new line in the query.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class Standalone {
  public static void main(String[] args) throws ClassNotFoundException, SQLException {
    Class.forName("com.yugabyte.Driver");
    String query = "SELECT SUM(az.viewers), SUM(ds.viewers) FROM rop az JOIN" + "\n"
        + "hotd ds ON (az.season=ds.season AND az.episode=ds.episode AND" + "\n"
        + "az.region_code=ds.region_code) WHERE az.season=4 and az.episode=?";
    try (Connection conn = DriverManager.getConnection("jdbc:yugabytedb://127.0.0.2:5433/yugabyte",
        "yugabyte", "yugabyte");
      PreparedStatement pstmt = conn.prepareStatement(query)) {
      pstmt.setInt(1, 1);
      pstmt.execute();
    }
  }
}

The query in the “hints” table should match the application usage pattern to be effective, and notice the new line-related usage of (E’\n’) in the “hint query” below to match the application pattern.

INSERT INTO hint_plan.hints
(norm_query_string,
 application_name,
 hints)
VALUES
('SELECT SUM(az.viewers), SUM(ds.viewers) FROM rop az JOIN'||E'\n'||'hotd ds ON (az.season=ds.season AND az.episode=ds.episode AND'||E'\n'||'az.region_code=ds.region_code) WHERE az.season=? and az.episode=$1','','HashJoin(az ds)');

Conclusion

In this blog, we looked at various ways of leveraging “pg_hint_plan” based on how we write our application queries.

To discover more about query hints in YugabyteDB, you can listen to this recent YugabyteDB Friday Tech Talk—Optimizing Queries By pg_hint_plan PostgreSQL Extension

You can also review our pg_hint_plan documentation to learn more about optimizing YSQL queries using hint plans.

If you have any questions or feedback, please join the vibrant Community Slack and chat with our growing distributed SQL community.

Srinivasa Vasu

Related Posts

Explore Distributed SQL and YugabyteDB in Depth

Discover the future of data management.
Learn at Yugabyte University
Get Started
Browse Yugabyte Docs
Explore docs
PostgreSQL For Cloud Native World
Read for Free