SQL Puzzler : Lets RULE!
Partitioning may be used as a quite elegant solution of a wide variety of problems - distributing the table to multiple drives (if you don't want to use a traditional RAID), effective removal of archive data (without DELETE), etc. It may be automatically realized using triggers or rules. Lets see a puzzler related to partitioning realized with rules.
Consider the following table with this simple structure
CREATE SEQUENCE base_sequence;
CREATE TABLE base_table (
id INTEGER PRIMARY KEY,
value TEXT
);
and say we want to partition it into two other tables, one containing rows with odd values of the "id" column and the other one containing rows with even values. So lets create two partitions
CREATE TABLE odd_table (
PRIMARY KEY (id),
CHECK (id % 2 = 1)
) INHERITS (base_table);
CREATE TABLE even_table (
PRIMARY KEY (id),
CHECK (id % 2 = 0)
) INHERITS (base_table);
and two rules for distributing the rows into appropriate partitions
CREATE RULE even_rule AS ON INSERT TO base_table
WHERE (nextval('base_sequence') % 2 = 0) DO INSTEAD INSERT INTO
even_table (id, value) VALUES (currval('base_sequence'), NEW.value);
CREATE RULE odd_rule AS ON INSERT TO base_table
WHERE (currval('base_sequence') % 2 = 1) DO INSTEAD INSERT INTO
odd_table (id, value) VALUES (currval('base_sequence'), NEW.value);
Those rules qualify for a brief comment:
- PostgreSQL always evaluates all the rules (resp. corresponding WHERE conditions), and does this in the alphabetical order (by the rule_name). That means the "even_rule" rule is evaluated first, then the "odd_rule" rule is evaluated.
- Function "nextval" is used only in the WHERE condition of the "even_rule" rule (i.e. the first rule executed), and the generated value is then referenced using a "currval" function.
So we have a "base_table" table which should remain empty, and the rows will be distributed into two other tables - into "odd_table" and "even_table" tables - according to the value in the "id" column (generated automatically using a sequence). The rows with odd values will be stored into the "odd_table" and rows with even values will be stored in the "evan_table."
We have two quite simple rules that should take care of redirecting the rows into the appropriate partition (tables).
So lets run the following two simple INSERT commands:
INSERT INTO base_table (value) VALUES ('first row');
INSERT INTO base_table (value) VALUES ('second row');
and of course the question is what is the result. Just like in the previous puzzlers, I offer you several possible choices:
- Both rows will be inserted into the base table (base_table), i.e. there won't be any partitioning at all. The table will contain rows [1, 'first row'] and [2, 'second row'].
- Both rows will be inserted into the table "even_table" which will contain rows [2, 'first row'] and [4, 'second row'].
- Both rows will be inserted into the table "odd_table" which will contain rows [1, 'first row'] and [3, 'second row'].
- There will be an exception, and none of the INSERT commands will execute.
I tentokrát je odpověď skryta - zobrazí se až po kliknutí na nadpis "Odpověď".
Odpověď
The correct answer is (b) - both rows will be inserted into the "even_table" table, which will contain the following two rows
2 | first row 4 | second row
and the table "odd_table", that should contain rows with odd "id" values, remains empty. The question is why this happens?
The suspicious thing is that the "id" value is incremented by 2 - there is a single reference to the "nextval" function in the rules, so the value should be incremented by 1, right?
Lets see an execution plan, maybe it will show us what's wrong.
fuzzy-new=# explain insert into base_table(value) values ('treti');
QUERY PLAN
----------------------------------------------------------------------
Result (cost=0.02..0.03 rows=1 width=0)
One-Time Filter:
((((nextval('base_sequence'::regclass) % 2::bigint) = 0) IS NOT TRUE)
AND
(((currval('base_sequence'::regclass) % 2::bigint) = 1) IS NOT TRUE))
Result (cost=0.01..0.02 rows=1 width=0)
One-Time Filter:
((nextval('base_sequence'::regclass) % 2::bigint) = 0)
Result (cost=0.01..0.02 rows=1 width=0)
One-Time Filter:
((currval('base_sequence'::regclass) % 2::bigint) = 1)
(8 rows)
As you can see, there are two expected branches (one for each rule) there is a third branch, created automatically by a PostgreSQL planner. This branch is listed at the first place, inserts rows into the "base" table and the corresponding condition means "if none of the other rule conditions is met." Which in this case means
(nextval('base_sequence') % 2 != 0) AND
(currval('base_sequence') % 2) != 1)
and that's the second execution of the (volatile) function nextval. Thanks to that the value is incremented by 2 and all of the rows are inserted into the "even table" table.
But how to solve this problem?
The first option is to define another unconditional rule, that takes care of the rows that do not meet any of the other conditions. If you're sure the rules cover all possible rows, i.e. there is a matching rule for each INSERT, you may create an empty rule that does "nothing," e.g.
CREATE OR REPLACE RULE empty_rule AS ON INSERT TO base_table DO INSTEAD NOTHING;
This rule won't be executed at all, but thanks to it the PostgreSQL won't automatically create the rule, so there won't be any unexpected evaluation of the nextval function.
Another option is not to use a nextval function in the rules, but to generate the id explicitly (using the nextval function), and use only the 'currval' function in the rules. I.e. the rules may be defined like this
CREATE RULE even_rule AS ON INSERT TO base_table
WHERE (currval('base_sequence') % 2 = 0) DO INSTEAD INSERT INTO
even_table (id, value) VALUES (currval('base_sequence'), NEW.value);
CREATE RULE odd_rule AS ON INSERT TO base_table
WHERE (currval('base_sequence') % 2 = 1) DO INSTEAD INSERT INTO
odd_table (id, value) VALUES (currval('base_sequence'), NEW.value);
and when inserting the rows, perform the following steps
SELECT nextval('base_sequence');
INSERT INTO base_table (value) VALUES ('prvni radek');
Sure, you may not use the sequences in rules at all
CREATE RULE even_rule AS ON INSERT TO base_table WHERE (NEW.id % 2 = 0) DO INSTEAD INSERT INTO even_table (id, value) VALUES (NEW.id, NEW.value); CREATE RULE odd_rule AS ON INSERT TO base_table WHERE (NEW.id % 2 = 1) DO INSTEAD INSERT INTO odd_table (id, value) VALUES (NEW.id, NEW.value);
but this demands a little bit complicated logic when inserting the data, as you have to read the value from the sequence into a variable (e.g. into Java or PHP). In PHP it might be realized like this (for sake of simplicity I omit the error handling etc.):
<?php
// read the next id into a variable
$res = pg_query("SELECT nextval('base_sequence') AS id");
$row = pg_fetch_assoc($res);
// use the value in an insert
pg_query("INSERT INTO base_table (id, value) " .
"VALUES (" . $row['id'] ."'prvni radek')");
?>
I have to remark that it's not possible to insert the rows using this command
INSERT INTO base_table (id, value)
VALUES (nextval('base_sequence'), 'nova hodnota');
as the the "nextval" call won't be evaluated at the beginnning, but it will be passed into the rules - so the problem won't be fixed, it will actually make the problem worse (as it will be evaluated for each NEW.id occurence separately).



