How to Set Up a Mechanism to Capture pg_stat_statements in a Persistent Table

Kapil Maheshwari

pg_stat_statements provide information for local PostgreSQL instances. Since YugabyteDB is a Postgres-compliant, distributed SQL database every node has its own Postgres installation and thus its own copy of pg_stat_statements.

If the user is connected to the “tserver1” data node, the “pg_stat_statements” view of the “tserver1” node will only capture the information of all the queries executed by the specific user in the session. But that does not need be the case.

This blog walks you through how to capture those pg_stat_statements from all the nodes in a persistent table to use for analysis. Then the data can be stored in YugabyteDB tables, accessible from any node unless purged.

Steps of deployment:

First let’s review the high level steps:

  1. Setup the database objects
  2. Capture the data
  3. Validate the output

Step 1: Setup the database objects:

  1. Create a database “yb_stats_db”
  2. Create a table “pg_stat_statements_hist” in database “yb_stats_db” with the following DDL.I will include all the columns available currently in pg_stat_statements, with three additional columns:
    Column 1: run_id – To uniquely identify the execution which inserted the set of records in the persistent table.
    Column 2: host_ip_addr – Identification of the host from which the pg_stat_statements data was inserted in the table
    Column 3: insert_date – To identify the date when the pg_stat_statements was captured in the persistent table.

    create table pg_stat_statements_hist (
     run_id bigint, -- run_id is unique ID for every instance of script execution
     host_ip_addr text, -- IP address of host to which data belongs
     insert_date date, -- Date when the records are inserted
      userid integer,
      dbid integer,
      queryid bigint,
      query text,
      calls integer,
      total_time float,
      min_time float,
      max_time float,
      mean_time float,
      stddev_time float,
      rows bigint,
      shared_blks_hit float,
      shared_blks_read float,
      shared_blks_dirtied float,
      shared_blks_written float,
      local_blks_hit float,
      local_lks_read float,
      local_blks_dirtied float,
      local_blks_written float,
      temp_blks_read float,
      temp_blks_written float,
      blk_read_time float,
      blk_write_time float
    );
  3. Create a sequence “pg_stat_statements_hist_seq” required for run_id
    create sequence pg_stat_statements_hist_seq start 1 increment 1;

Step 2: Capture the data

  1. Capture the next run_id from the sequence created in the earlier step.
    run_id=$(ysqlsh -d yb_stats_db -Atc "select nextval('pg_stat_statements_hist_seq')")
  2. Capture pg_stat_statements from all the data nodes (running t-server process).
    ysqlsh
    -d yb_stats_db 
    -c "insert into pg_stat_statements_hist  select $run_id,host(inet_server_addr()),current_date ,userid,dbid,queryid,query,calls,total_time,min_time,max_time,mean_time,stddev_time,rows,shared_blks_hit,shared_blks_read,shared_blks_dirtied,shared_blks_written,local_blks_hit,local_blks_read,local_blks_dirtied,local_blks_written,temp_blks_read,temp_blks_written,blk_read_time,blk_write_time from pg_stat_statements"
    

Sample script to automate steps:

Following is a sample script on Github that you can use to automate “Step 2: Capture the data” in persistent tables.

  1. Customize the sample script on all the DB nodes in the “yugabyte” home directory. This sample script assumes that the script will run as “yugabyte” (default install user). Modify the script to your requirements.
    Remember to deploy the script on all the DB nodes (running t-server process).
  2. Provide execute permission to the script
    chmod +x pg_stat_statements_hist.sh
  3. Edit the crontab to schedule the script at a fixed frequency.
    Depending upon system utilization (i.e. number of queries execution), the script run schedule needs to be defined. Note that “pg_stat_statements” stores information only for the latest queries. Old queries get rolled over.

    To schedule every 6 hours in crontab:

    0 0 */6 * * * /home/yugabyte/pg_stat_statements_hist.sh

Step 3: Validate the output

On my three node system, the output is:

yb_stats_db=# select host_ip_addr, insert_date, count(*) from pg_stat_statements_hist group by host_ip_addr, insert_date order by insert_date desc limit 3;
 host_ip_addr | insert_date | count
--------------+-------------+-------
 10.9.89.163  | 2022-10-11  |	39
 10.9.123.173 | 2022-10-11  | 9
 10.9.85.165  | 2022-10-11  |	10
(3 rows)
yb_stats_db=#

Conclusion:

In this blog, we successfully captured the output of “pg_stat_statements” in a “pg_stat_statements_hist” table.

To learn more about performance analysis using pg_stat_statements, watch our recent YugabyteDB Friday Tech Talk.

Prefer reading? Refer to the YugabyteDB Docs site to learn more about using pg_stat_statements in YugabyteDB.

Kapil Maheshwari

Related Posts

Explore Distributed SQL and YugabyteDB in Depth

Discover the future of data management.
Learn at Yugabyte University
Get Started
Browse Yugabyte Docs
Explore docs
PostgreSQL For Cloud Native World
Read for Free