Is My PostgreSQL Database Experiencing CPU, RAM, or I/O Pressure?

PostgreSQL Tips and Tricks Series
Franck Pachot

Linux metrics such as CPU Usage, Load Average, Free Memory, and I/O Wait indicate system activity. However, they don’t directly tell you whether your workload is limited by CPU, RAM, or I/O. To identify the specific resource that’s slowing down your workload, Linux offers the perfect metric: Pressure Stall Information (PSI). This is essential in cloud infrastructure for determining what to scale to maximize throughput.

This blog post demonstrates how to query Linux information using PostgreSQL with the file_fdw extension. This can help DBAs see what’s happening in the system. Once you realize the value of PSI, you may want to incorporate it into your system monitoring (it’s available in the Prometheus node_exporter). However, accessing it through SQL can still be beneficial for developers.

Enabling Pressure Stall Information (PSI)

Modern Linux kernels support PSI, but it may not be enabled on your system. First, check that the kernel version is 4.20 or higher (PSI was introduced in 4.20).

$ uname --kernel-release
5.4.17-2136.319.1.4.el8uek.aarch64

Then verify that the kernel was compiled with PSI.

$ grep PSI /boot/config-$(uname --kernel-release)
CONFIG_PSI=y
CONFIG_PSI_DEFAULT_DISABLED=y

‘CONFIG_PSI=y’ means that it is compiled, but ‘CONFIG_PSI_DEFAULT_DISABLED=y’ tells you that it’s not enabled by default. It must be enabled from the Linux boot command line, which you can check using:

$ grep psi /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.4.17-2136.309.5.el8uek.aarch64 root=/dev/mapper/ocivolume-root ro ... psi=1

Once it is enabled, you have access to PSI from the /proc filesystem:

$ tail /proc/pressure/*
==> /proc/pressure/cpu <==
some avg10=0.00 avg60=0.00 avg300=0.00 total=103722350558

==> /proc/pressure/io <==
some avg10=0.00 avg60=0.00 avg300=0.00 total=3873857799
full avg10=0.00 avg60=0.00 avg300=0.00 total=900986146

==> /proc/pressure/memory <==
some avg10=0.00 avg60=0.00 avg300=0.00 total=149333294
full avg10=0.00 avg60=0.00 avg300=0.00 total=2363060

If it is not enabled, you will get “Operation not supported” and will need to add “psi=1” to the boot command line. The procedure to add a kernel flag depends on your distribution. For example, you may add it at the end of the GRUB_CMDLINE_LINUX value in /etc/default/grub, run grub2-mkconfig, and reboot. Or use grubby (grubby --update-kernel=ALL --args='psi=1') or tuned profiles (see Pressure Stall Information on Autonomous Linux for an example).

You can read and monitor those three files in /proc/pressure from the Linux command line. Monitoring tools (like Prometheus Node Exporter) read from these files.

To expose this information to developers, it can be convenient to query it from a SQL prompt. This can be set up with the file_fdw extension, which allows files on the server to be read from a SQL query.

Reading PSI from PostgreSQL Foreign Data Wrapper

To query the Pressure Stall Information from SQL, you can create a foreign table using the File Foreign Data Wrapper. This extension allows you to read files from the filesystem and expose them as a foreign table, similar to the functionality of COPY. Please note that only a superuser can create this foreign table, but afterward, you can grant select access on it to any user. The File Foreign Data Wrapper (file_fdw) is included in the PostgreSQL contrib modules and can be easily installed.

yugabyte=# create extension file_fdw;
CREATE EXTENSION

yugabyte=# create server psi foreign data wrapper file_fdw;
CREATE SERVER

I declare the foreign table that will expose the information from PSI. This declaration has two parts: the table definition and how to read the files. Here I’m using the PROGRAM option to format the output with a shell command:

create foreign table psi (
 resource text,  -- IO, CPU or Memory tasks are waiting on
 tasks text,     -- some tasks or full stall
 avg10 float,    -- percentage of the past 10 seconds
 avg60 float,    -- percentage of the past 1 minute
 avg300 float,   -- percentage of the past 5 minutes
 total bigint,   -- cumulative time in us
 uptime float,   -- uptime in us
 host text      -- hostname, from /proc/sys/kernel/hostname
) server psi options ( PROGRAM
$SH$

# read the files and format them as space-separated values

awk '

!/ /{host=$0} # from /proc/sys/kernel/hostname

/^[0-9.]+ [0-9.]+/{uptime=$1*1e6}  # from /proc/uptime

$0 ~ re { # matches the regexp of PSI output
 $0=FILENAME" "$0" "uptime" "host # add host info to the line
 gsub("(^.*/|[^ =]+=)","") # keeps only the value after =
 print # prints the values in the order of the table columns
 }

' re='(full|some) avg10=([0-9.]+) avg60=([0-9.]+) avg300=([0-9.]+) total=([0-9]+)' \
/proc/uptime /proc/sys/kernel/hostname /proc/pressure/{io,cpu,memory}
$SH$ , format 'text', delimiter ' '
);

Awk is used to format the output since it should be present on any Linux system. The PROGRAM runs on the PostgreSQL (or YugabyteDB) server. The foreign table can be created only by a superuser, who can then grant select access to any user.
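
For example, here is a minimal sketch of granting read access on the foreign table (the role name app_user is only an illustration, not part of the setup above):

-- app_user is a hypothetical role; replace it with your application role
grant select on psi to app_user;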

Example Walk-Through Using YugabyteDB

Let’s take an example using YugabyteDB, which is a PostgreSQL-compatible database.

yugabyte=# select resource, tasks, avg10, avg60, avg300
 ,round(100*total/uptime::numeric,2) total, host
 from psi order by 2,1;

 resource | tasks | avg10 | avg60 | avg300 | total | host
----------+-------+-------+-------+--------+-------+------
 io       | full  |     0 |     0 |   0.03 |  0.03 | yba
 memory   | full  |     0 |     0 |      0 |  0.00 | yba
 cpu      | some  | 89.08 | 64.09 |  28.74 |  0.30 | yba
 io       | some  |  1.99 |  1.42 |   0.58 |  0.03 | yba
 memory   | some  |     0 |     0 |      0 |  0.00 | yba
(5 rows)
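
As a convenience, you could wrap this query in a regular view on top of the foreign table, so that users don’t have to remember the formatting. This is just a sketch; the view name psi_report and the column name pct_of_uptime are arbitrary:

create view psi_report as
select resource, tasks, avg10, avg60, avg300,
       round(100*total/uptime::numeric,2) as pct_of_uptime, host
  from psi;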

The metric we will focus on in this example is the percentage of time during which the workload is slowed down by resource starvation.

The middle line (i.e. CPU) shows that, on average, during the past 10 seconds (‘avg10‘), at least one process (‘some‘) was waiting for compute (‘cpu‘): it was sitting in the run queue because the CPU it needed was busy with another process. The longer averages (‘avg60‘, ‘avg300‘) are lower, which shows that this pressure has been increasing over the past 5 minutes.

For other resources, in addition to the time during which at least one process (‘some‘ pressure) was waiting, we see the time during which all processes (‘full‘ pressure) were starving on disk reads or writes (‘io‘) or on RAM (‘memory‘ allocation). This is not possible for the CPU because, unless the power is down, at least one process can run on each CPU. ‘full‘ pressure means that all processes are waiting for a non-CPU resource (disk I/O or memory allocation): that resource is the bottleneck, and the CPU cycles are wasted during that time. Scaling up the CPU will not improve performance when you observe ‘full‘ pressure.

In the example above, the system is CPU-bound, and scaling the CPU can improve the throughput. Adding more memory will not help. Faster I/O will not have a significant impact.

On a system with 4 vCPUs, the CPU pressure will be low with 1 to 4 active PostgreSQL sessions utilizing the CPU, because there will always be a vCPU available whenever a process needs to run. However, as the number of active sessions increases, the CPU pressure also increases. For example, with 5 active sessions, the pressure is at 25%; with 6, it rises to 50%; with 7, it reaches 75%; and with 8 or more, it hits 100%. The ‘cpu‘ percentage is scaled to the total CPU capacity (the elapsed time multiplied by the number of vCPUs), so it saturates at 100% when the time spent waiting for a CPU is at least equal to that capacity. To improve throughput, adding vCPUs can be beneficial.
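
To make the arithmetic concrete, here is a rough illustration you can run from the SQL prompt. It assumes 4 vCPUs and sessions that are continuously runnable; it only models the ratio described above, not the exact kernel algorithm:

-- rough model: 'some' cpu ≈ min(sessions waiting for a vCPU, vCPUs) / vCPUs
-- assuming 4 vCPUs and N continuously-runnable sessions
select sessions,
       round(100 * least(greatest(sessions - 4, 0), 4) / 4.0) as some_cpu_pct
  from generate_series(1, 9) as sessions;

The result stays at 0% for up to 4 sessions, then goes to 25%, 50%, 75%, and finally 100% once the number of waiting sessions reaches the number of vCPUs.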

When it comes to ‘io‘ and ‘memory‘, the percentage is relative to the elapsed time. If some PostgreSQL sessions are waiting for I/O operations (such as reads or writes from files) while others are actively using the CPU, the ‘some‘ percentage of ‘io‘ increases, in proportion to the time those sessions spend waiting. However, if all sessions are waiting on I/O, so that no process is making progress on the CPU, the ‘full‘ pressure increases, which indicates a global I/O bottleneck.

How to Utilize the Data

One interesting way to utilize this information is when configuring the connection pool. It is not advisable to add more connections if ‘cpu‘ is under significant pressure for a prolonged period (for example, if it reaches 50% on ‘avg60‘). Additional connections would likely just wait in the run queue, increasing pressure on the CPU scheduler. Similarly, if there is ‘full‘ pressure on ‘io‘ or ‘memory‘, adding more connections would likely result in those connections waiting for the same limited resources. In both cases, it is preferable to queue the calls in the application or the connection pool before starting a transaction, rather than having them wait on the database server while holding locks that block other sessions.
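
For example, a connection pool or an application could run a quick check against the psi foreign table before ramping up connections. This is only a sketch, with arbitrary thresholds (50% ‘some‘ CPU pressure on ‘avg60‘, or any ‘full‘ io/memory pressure) that you would tune for your workload:

-- returns true when the thresholds suggest it is safe to add connections
select not exists (
         select 1
           from psi
          where (resource = 'cpu' and tasks = 'some' and avg60 >= 50)
             or (resource in ('io','memory') and tasks = 'full' and avg60 > 0)
       ) as ok_to_add_connections;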

To Summarize

Pressure Stall Information, once enabled, is easy to query and interpret to determine where the pressure is on the database system: CPU, I/O, or RAM. High percentages of ‘some‘ pressure show where the bottlenecks are. A high percentage of ‘full‘ pressure indicates a critical situation: the system doesn’t make progress while the CPU is idle. Please note that these values are provided by the Linux kernel, and their scope is the host (bare metal server, virtual machine, compute instance, Kubernetes worker). With Control Groups version 2, PSI can also show the metrics at the scope of a container.

A Word on Reading Those Metrics in YugabyteDB

The method described here for reading those metrics from the database works similarly for both PostgreSQL and YugabyteDB. However, since YugabyteDB is distributed across multiple nodes, it may be necessary to monitor all nodes. The provided query examples display the pressure on the specific node you are currently connected to.
