Performance of partitioning using triggers and rules
In PostgreSQL there are two different ways to perform partitioning - using rules and using triggers. One of the usual reasons why a partitioningu is used, is a performance, so lets analyze differences between these two ways regarding performance.
Three different partitioning methods are analyzed in this brief benchmark:
- trigger with dynamic SQL queries (EXECUTE)
- trigger without dynamic SQL queries
- rules
The whole test is quite simple - a table with a given number of partitions is created, and 500.000 rows is inserted into it. Number of partitions is varied so that it is possible to observe how the number of partitions influences the performance. A total and a CPU utilization is measured.
Database structure
The partitioned table has a very simple structure - a single column used for partitioning.
CREATE TABLE table_base (
id INTEGER PRIMARY KEY
);
For this table a desired number of partitions is created, splitting the interval [1,500000] into equally sized pieces. A partition may be created like this:
CREATE TABLE table_100000 (
PRIMARY KEY (id),
CHECK (id > 0 AND id <= 100000)
) INHERITS (table_base);
After creating all the partitions, triggers or rules performing the partitioning are created. This step will be described in the following sections.
Trigger with dynamic SQL queries
The first partitioning method is a trigger using dynamic SQL.
CREATE OR REPLACE FUNCTION partition_trigger() RETURNS trigger AS $$ DECLARE v_size INTEGER := 100000; v_id INTEGER; BEGIN v_id := ((NEW.id - 1) / v_size + 1) * v_size; EXECUTE 'INSERT INTO table_' || v_id || ' VALUES ($1)' USING NEW.id; RETURN NULL; END; $$ LANGUAGE plpgsql;
This trigger uses information about partition size to determine a partition the row should be redirected to, and then executes a dynamic INSERT statement.
| number of partitions | total time (s) | CPU utilization (%) |
| 5 | 77 | 97 |
| 10 | 77 | 98 |
| 20 | 78 | 98 |
As you can see the performance does not depend on the number of partitions and the CPU utilization is quite high. A script performing this benchmark may be downloaded here.
Trigger without dynamic SQL queries
The second option is to use a trigger without dynamic SQL, using an explicit INSERT for each partition. An example of such trigger is this:
CREATE OR REPLACE FUNCTION partition_trigger() RETURNS trigger AS $$
BEGIN
IF (NEW.id > 0 AND NEW.id <= 100000) THEN
INSERT INTO table_100000 VALUES (NEW.*);
ELSIF (NEW.id > 100000 AND NEW.id <= 200000) THEN
INSERT INTO table_200000 VALUES (NEW.*);
ELSIF (NEW.id > 200000 AND NEW.id <= 300000) THEN
INSERT INTO table_300000 VALUES (NEW.*);
ELSIF (NEW.id > 300000 AND NEW.id <= 400000) THEN
INSERT INTO table_400000 VALUES (NEW.*);
ELSIF (NEW.id > 400000 AND NEW.id <= 500000) THEN
INSERT INTO table_500000 VALUES (NEW.*);
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
This script does not execute any dynamic SQL commands, which should improve the performance. So lets see the results.
| number of partitions | total time (s) | CPU utilization (%) |
| 5 | 53 | 97 |
| 10 | 57 | 98 |
| 20 | 67 | 98 |
Obviously, this time the performance decreases as the number of partitions grows - that is caused by the growing number of IF conditions in the trigger body. A script for this benchmark may be downloaded here.
Rules
The last option is to perform partitioning using rules, which are based on plan rewriting (compared to procedural triggers). For each partition a separate rule is created using a command simillar to this:
CREATE OR REPLACE RULE rule_100000 AS ON INSERT TO table_base
WHERE (id > 0 AND id <= 100000) DO INSTEAD
INSERT INTO table_100000 VALUES (NEW.*);
This script does not have to execute any procedures (in contrast to trigger-based methods), but an execution plan is rewritten when planning the execution.
| number of partitions | total time (s) | CPU utilization (%) |
| 5 | 15 | 83 |
| 10 | 22 | 90 |
| 20 | 34 | 90 |
As you can see, the performance decreases as the number of partitions grows, just as in the previous case. This is caused by a growing number of rules as all of the rules have to be checked for each row. A script for this benchmark may be downloaded here.
Summary
Obviously the fastest solution is the one using rules - it's about 2x faster than the trigger-based solutions. And not only it's much faster, the CPU utilization is considerably lower. But there are several problems related to rules, e.g. with execution plan invalidation in case of adding a new partition, and so on.
The benchmarks were performed on PostgreSQL 8.5alpha2.



