How to Create a Conditional WHERE Clause in PostgreSQL

PostgreSQL Tips & Tricks

June 26, 2023

Understanding the conditional WHERE clause in PostgreSQL is crucial when dealing with data that requires dynamic filtering, while keeping a single static SQL statement. You don’t want to write a prepared SQL statement for every use case scenario, but having the application generate a dynamic query (on the fly) is hard to maintain. Therefore, a conditional WHERE clause that depends on some user input (parameters) is ideal—say, when the user selects a specific country, you can show only rows for that country, but return all rows if the input parameter is empty.

An Example of the WHERE Clause in Action

As an example, I will be using the “Cost of Living Index 2022″ dataset.

I created a demo table to store one row per country and per consumer goods.

create table demo (
 primary key (country, consumer_goods)
 , country text
 , consumer_goods text
 , index numeric(8,2)
);

I used awk to unpivot the CSV file and load it into the table.

awk -F, '

# remove ending ^M if any
     { sub(/\r$/,"") }

# get names from header (start at columns 3 to Number of Fields)
NR==1{ for( i=3 ; i<=NF ; i++ ) dim[i]=gensub(" Index$","",1,$i) }

# print country (column 2), price (value at column 3 to NF), and name
NR>1 { for( i=3 ; i<=NF ; i++ ) print $2"\t"$i"\t"dim[i] }

' Cost_of_Living_Index_2022.csv | 
 tee /dev/stderr | 
 psql -ec '

-- load CSV into the table
copy demo(country, index, consumer_goods ) from stdin
'

I queried the table for all countries…

yugabyte=# select country, consumer_goods, index
           from demo
           order by index desc;

    country    |      consumer_goods      | index
---------------+--------------------------+--------
 Bermuda       | Restaurant Price         | 159.17
 Bermuda       | Groceries                | 148.66
 Bermuda       | Cost of Living           | 146.04
 Switzerland   | Groceries                | 128.13
 Bermuda       | Cost of Living Plus Rent | 123.80
 Switzerland   | Cost of Living           | 123.35
 Switzerland   | Restaurant Price         | 122.09
 Switzerland   | Local Purchasing Power   | 118.44
 United States | Local Purchasing Power   | 106.34
 Norway        | Restaurant Price         | 105.49
...
(834 rows)

…or for one country.

yugabyte=# select country, consumer_goods, index
           from demo
           where country='Switzerland'
           order by index desc;

   country   |      consumer_goods      | index
-------------+--------------------------+--------
 Switzerland | Groceries                | 128.13
 Switzerland | Cost of Living           | 123.35
 Switzerland | Restaurant Price         | 122.09
 Switzerland | Local Purchasing Power   | 118.44
 Switzerland | Cost of Living Plus Rent |  90.62
 Switzerland | Rent                     |  53.54
(6 rows)

*NOTE: YugabyteDB, a PostgreSQL-compatible database, is being used for illustration purposes.

The first query obviously does a Seq Scan. It had to read all rows due to an absence of a filter. The second query can use an Index Scan, because it has a filter.

yugabyte=# explain (analyze, costs off)
           select country, consumer_goods, index
           from demo
           where country='Switzerland'
           order by index desc;

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Sort (actual time=0.823..0.823 rows=6 loops=1)
   Sort Key: index DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using demo_pkey on demo (actual time=0.792..0.799 rows=6 loops=1)
         Index Cond: (country = 'Switzerland'::text)
 Planning Time: 0.067 ms
 Execution Time: 0.849 ms
 Peak Memory Usage: 34 kB
(8 rows)

To Simplify Use One statement for code reuse

To simplify my application, I can write one query that covers both cases. It will use the country as an input parameter. If no country is specified, then it will read all countries. This is typically done with a prepared statement on the application side or a stored procedure on the database side. For the demo, I’ll use prepare and execute from plsql. Here is an example:

yugabyte=# prepare query1(text) as
           select country, consumer_goods, index
           from demo
           where ($1 is null) or (country=$1)
           order by index desc;

If no country is passed as input parameter $1, the condition ($1 is null) will be true for all rows. If a country is passed, the condition ($1 is null) is not true and the where clause will be true only when (country=$1). This logical disjunction can also be read as ‘if ($1 is not null) then (country=$1)‘ according to the law of implication (the logical equivalence of ‘if A then B’ is ‘not A or B’).

Let’s look at the execution plans:

yugabyte=# explain (analyze, costs off)
           execute query1(null);

                             QUERY PLAN
--------------------------------------------------------------------
 Sort (actual time=3.678..3.725 rows=834 loops=1)
   Sort Key: index DESC
   Sort Method: quicksort  Memory: 90kB
   ->  Seq Scan on demo (actual time=1.402..3.375 rows=834 loops=1)
 Planning Time: 0.129 ms
 Execution Time: 3.794 ms
 Peak Memory Usage: 184 kB
(7 rows)

yugabyte=# explain (analyze, costs off)
           execute query1('Switzerland');

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Sort (actual time=0.798..0.798 rows=6 loops=1)
   Sort Key: index DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using demo_pkey on demo (actual time=0.779..0.786 rows=6 loops=1)
         Index Cond: (country = 'Switzerland'::text)
 Planning Time: 0.103 ms
 Execution Time: 0.822 ms
 Peak Memory Usage: 34 kB
(8 rows)

So everything looks good. Seq Scan is used when we need to read all rows (rows=834) and Index Scan is used to filter for a specific country (rows=6).

Now let’s say that I have many executions of the first scenario. I’m using \watch to run it several times (count is a nice addition in PostgreSQL version 16, currently in beta):

yugabyte=# explain (analyze, costs off)
           execute query1(null)
           \watch 0.1 count=100

                             QUERY PLAN
--------------------------------------------------------------------
 Sort (actual time=3.781..3.827 rows=834 loops=1)
   Sort Key: index DESC
   Sort Method: quicksort  Memory: 90kB
   ->  Seq Scan on demo (actual time=1.461..3.453 rows=834 loops=1)
 Planning Time: 0.094 ms
 Execution Time: 3.898 ms
 Peak Memory Usage: 168 kB
(7 rows)


...

                             QUERY PLAN
--------------------------------------------------------------------
 Sort (actual time=2.291..2.327 rows=834 loops=1)
   Sort Key: index DESC
   Sort Method: quicksort  Memory: 90kB
   ->  Seq Scan on demo (actual time=1.555..2.001 rows=834 loops=1)
         Remote Filter: (($1 IS NULL) OR (country = $1))
 Planning Time: 0.018 ms
 Execution Time: 2.407 ms
 Peak Memory Usage: 216 kB
(8 rows)

At some point, the execution plan switched to a generic plan. So now you see the original condition with parameters, which saves the planning phase for the next executions.

However, there is a problem with the generic plan. There’s only one execution plan to cover both scenarios, so calling my query for a specific country will not initiate an Index Scan anymore:

yugabyte=# explain (analyze, costs off)
           execute query1('Switzerland');

                            QUERY PLAN
------------------------------------------------------------------
 Sort (actual time=2.145..2.145 rows=6 loops=1)
   Sort Key: index DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Seq Scan on demo (actual time=2.126..2.132 rows=6 loops=1)
         Remote Filter: (($1 IS NULL) OR (country = $1))
 Planning Time: 0.014 ms
 Execution Time: 2.191 ms
 Peak Memory Usage: 40 kB
(8 rows)

This is how PostgreSQL works. It’s done on purpose to avoid too many parse phases when a prepared statement is executed many times. This is typical for OLTP applications. Since the release of PostgreSQL 12, you can control that with plan_cache_mode, but there is a better solution that provides the best performance for custom and generic plans.

Better Performance with OR expansion with UNION ALL

The “is null or equal” predicate is nice but may not be easy to read by non-mathematicians. I can expand it to two queries, one for each case, depending on the parameter being set or not, concatenated by a UNION ALL:

yugabyte=# prepare query2(text) as
 select country, consumer_goods, index
   from demo
   -- filter on a specific country defined in the parameter
   where ($1 is not null) and ($1 = country)
  union all
  select country, consumer_goods, index
   from demo
   -- get all countries when the parameter is undefined
   where ($1 is     null) and ($1 = country) is not true
  order by index desc;

Each branch condition starts with the criteria on the parameter ($1 being null or not). When not null, an additional condition on it will filter on its value.

I’ve added the opposite condition, along with the ‘is not true‘ predicate, in the other branch. This is not needed here because it is implicit: if the parameter is null, the equality is never true. However, this is a good practice when expanding OR to UNION ALL in general to be sure that there’s no rows in double from both branches.

With custom plans, where the planning is done for each execution, with the knowledge of the parameter values, the access path is optimized for each scenario, resulting to a Seq Scan for all rows, or an Index Scan for a specific value:

yugabyte=# explain (analyze, costs off)
           execute query2(null);

                                QUERY PLAN
--------------------------------------------------------------------------
 Sort (actual time=3.798..3.840 rows=834 loops=1)
   Sort Key: demo.index DESC
   Sort Method: quicksort  Memory: 90kB
   ->  Append (actual time=1.485..3.510 rows=834 loops=1)
         ->  Seq Scan on demo (actual time=1.484..3.435 rows=834 loops=1)
 Planning Time: 0.152 ms
 Execution Time: 3.909 ms
 Peak Memory Usage: 168 kB
(8 rows)

yugabyte=# explain (analyze, costs off)
           execute query2('Switzerland');

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Sort (actual time=1.183..1.184 rows=6 loops=1)
   Sort Key: demo.index DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Append (actual time=1.164..1.172 rows=6 loops=1)
         ->  Index Scan using demo_pkey on demo (actual time=1.163..1.170 rows=6 loops=1)
               Index Cond: ('Switzerland'::text = country)
 Planning Time: 0.187 ms
 Execution Time: 1.215 ms
 Peak Memory Usage: 34 kB
(9 rows)

Since PostgreSQL 12 we can also force a generic plan to validate that it is optimal for both scenarios and I prefer to use that to be sure that the plan is optimal in all cases, rather than repeating the query many times:

postgres=# set plan_cache_mode=force_generic_plan;
SET

postgres=# explain (analyze, costs off)
           execute query2(null)
;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Sort (actual time=0.488..0.527 rows=834 loops=1)
   Sort Key: demo.index DESC
   Sort Method: quicksort  Memory: 86kB
   ->  Append (actual time=0.013..0.242 rows=834 loops=1)
         ->  Result (actual time=0.002..0.002 rows=0 loops=1)
               One-Time Filter: ($1 IS NOT NULL)
               ->  Index Scan using demo_pkey on demo (never executed)
                     Index Cond: (country = $1)
         ->  Result (actual time=0.011..0.182 rows=834 loops=1)
               One-Time Filter: ($1 IS NULL)
               ->  Seq Scan on demo demo_1 (actual time=0.010..0.097 rows=834 loops=1)
                     Filter: (($1 = country) IS NOT TRUE)
 Planning Time: 0.128 ms
 Execution Time: 0.578 ms
(14 rows)

postgres=# explain (analyze, costs off)
           execute query2('Switzerland')
;

                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Sort (actual time=0.034..0.035 rows=6 loops=1)
   Sort Key: demo.index DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Append (actual time=0.022..0.025 rows=6 loops=1)
         ->  Result (actual time=0.021..0.024 rows=6 loops=1)
               One-Time Filter: ($1 IS NOT NULL)
               ->  Index Scan using demo_pkey on demo (actual time=0.019..0.021 rows=6 loops=1)
                     Index Cond: (country = $1)
         ->  Result (actual time=0.000..0.000 rows=0 loops=1)
               One-Time Filter: ($1 IS NULL)
               ->  Seq Scan on demo demo_1 (never executed)
                     Filter: (($1 = country) IS NOT TRUE)
 Planning Time: 0.016 ms
 Execution Time: 0.063 ms
(14 rows)

The generic plan has two branches, but only one is executed thanks to the One-Time Filter. Depending on this filter, the parameter being set or not, one branch is executed, and the other not, visible as ‘(never executed)‘. Compared to custom plans, this choice is done at runtime, which may have a little CPU overhead, but it saves the whole planning time.

Note that in PostgreSQL 16 you will be able to check the generic plan with its two branches and One-Time filters with EXPLAIN on the parameterized query:

postgres=# explain (generic_plan, costs off)
 select country, consumer_goods, index
   from demo
   -- filter on a specific country
   where ($1::text is not null) and ($1::text = country)
 union all
 select country, consumer_goods, index
   from demo
   -- get all countries
   where ($1::text is     null) and ($1::text = country) is not true
 order by index desc;

                        QUERY PLAN
----------------------------------------------------------
 Sort
   Sort Key: demo.index DESC
   ->  Append
         ->  Result
               One-Time Filter: ($1 IS NOT NULL)
               ->  Index Scan using demo_pkey on demo
                     Index Cond: (country = $1)
         ->  Result
               One-Time Filter: ($1 IS NULL)
               ->  Seq Scan on demo demo_1
                     Filter: (($1 = country) IS NOT TRUE)
(11 rows)

I had to add the parameter data types ($1::text) in all places because the execution plan depends on it and without explicit casting they are known only at runtime.

A Word About YugabyteDB and Its PostgreSQL Behavior

This blog post was inspired by a question on the PostgreSQL Community Slack. As noted above, the execution plans were run on YugabyteDB, which behaves the same as PostgreSQL since it reuses the same code for the SQL processing layer. The only difference is that PostgreSQL may show a Bitmap Heap Scan. Because YugabyteDB stores the table in its primary key, rather than in a heap table and a separate primary key index, there’s no additional reads and an Index Scan returns all columns immediately as if it was an Index Only Scan.

To Summarize…

The best approach to get the optimal execution plan for each scenario is to build specific queries for each. Dynamic SQL, where the query is constructed by concatenating strings at runtime, can be a solution if the query depends on user input. However, this method is not easy to maintain, since searching for the query code monitored in tools like pg_stat_activity or pg_stat_statements is not possible, and there is a risk of SQL injection if parameters are not used when concatenating the query parts.

One alternative is to do some math and use a complex OR condition that mimics a conditional statement. While effective, this approach may be difficult for future developers to read and optimize for the query planner. Alternatively, when dealing with a limited number of scenarios, concatenating different queries with UNION ALL offers the best performance.

June 26, 2023

PostgreSQL Tips and Tricks Query

How to Create a Conditional WHERE Clause in PostgreSQL

Related Posts

Explore Distributed SQL and YugabyteDB in Depth