Why working with Oracle is not boring
We all know than once a project passes the initial hectic phase when the design is tuned (and before it reaches the the phase "we have to deliver it to the customer and it's not ready"), the development is a rather boring experience. Occasionally you'll learn something new, you'll solve some unexpected problem, but in general it's quite monotonous work. Some vendors obviously are aware of this "issue" and obviously decided to take heed to make the development much more fun, as their representatives obviously think that bore is a killer disease.
I'm regularly working with Oracle for a few years, and although I'm a big fan of PostgreSQL, I consider Oracle is a great database and I'm really missing some of it's features in PostgreSQL (and vice-versa in case of some other features). I'm not an Oracle guru - I'm mostly a developer (actually I do have an OCP for PL/SQL) so most of the strictly DBA-related stuff just passes by me.
But from time to time Oracle surprises me by a petty unconsidered issues, resulting into complete confusion of the developer (me, for example) and wastage of time. A perfect example is information provided in case of some deadlocks.
Deadlock info
If a deadlock occurs in the database (which usually means a mistake in the design of the application), Oracle recognizes this (finds a cycle in the graph) and throws the ORA-00060 exception and kills one of the processes (identified as a victim). Information about the deadlock, useful especially to identify the causes and fixing them in the application, are written into a log.
In addition to the problematic SQL queries the log contains so-called "deadlock graph" and a list of SQL queries for each of the sessions. An example of a deadlock graph (referenced in the following text) is:
Deadlock graph:
-----Blocker(s)---- -----Waiter(s)-----
Resource Name session holds waits session holds waits
TM-000037e6-00000000 139 SX 124 SX SSX
TX-0005000a-0001f3aa 124 X 139 S
session 139: DID 0001-0020-00001046 session 124: DID 0001-0022...
session 124: DID 0001-0022-00001DDD session 139: DID 0001-0020...
Rows waited on:
Session 139: obj - rowid = 00007A1D - AAAHodAAAAAAAAAAAA
(dictionary objn - 31261, file - 0, block - 0, slot - 0)
Session 124: obj - rowid = 000037DC - AAADfcAALAAAQxGAAA
(dictionary objn - 14300, file - 11, block - 68678, slot - 0)
I. e. it contains a list of sessions involved in the deadlock (holding a lock on a resource and another session is waiting for it, forming a cycle), information about types of the locks the sessions are holding (blockes) or waiting for (waiters). The smallest number of sessions to cause a deadlock is obviously two. The remaining part is the "rows waited on" - a list of IDs identifying the objects (tables, indexes, etc.) the sessions are waiting for. And here lies the rub.
A general process to identify the deadlock causes according to ORA-00060 is this:
- identify the SQL queries executed in session involved in the deadlock
- identify the objects the sessions attempted to lock (using IDs of objects in the "rows waited on" section and the all_objects dictionary)
Unfortunately it's not so simple, and in my case this lead to absolutely confusing conclusions about the causes of the deadlock.
An example
Consider two tables - A and B, connected with a single foreigh key (table B references a primary key in the table A and this key is not indexed). Moreover there are several bitmap indexes on the table B. So we have roughly this schema:
CREATE TABLE A (
id INT PRIMARY KEY,
...
);
CREATE TABLE B (
id INT PRIMARY KEY,
a_id INT REFERENCES A(id),
col_a NUMBER(1),
...
);
CREATE BITMAP INDEX b_col_a_bit_idx ON B(col_a);
And there are two sessions running on this schema, executing these two SQL queries
Session 1: ---------- UPDATE A SET id = :id, first_name = :fname, ... Session 2: ---------- INSERT INTO B (ID, A_ID, COL_A, ...) VALUES (...)
which leads to a dealock - the deadlock info is listed above. The next step is identifying the objects the sessions are waiting on - in the "rows waited on" are two IDs (31261 a 14300), and a simple query into the all_objects catalogue shows that 31261 is actually the table B, while 14300 is a LOB segments (SYS_LOB000....) related to a completely different table (you have to traverse several foreign keys to reach it). WTF?
Locking the table B makes sense - there is an unindexed foreign key into table B, and the primary key of table B (referenced by the foreign key) is updated. It's a silly application mistake, but it explains the lock on table B.
But why does is lock (a LOB in) a completely unrelated table? Imagine this structure:
A --- [M:N] --- X --- [M:N] --- Y --- [M:N] --- B
i.e. there are 5 more tables between tables A and B (X, Y and three tables representing the M:N connections), and the locked BLOB column belongs to the middle table. How could an UPDATE A or INSERT INTO B lead to a lock on this column? There are no triggers on any of the tables, all the foreign keys (with the already mentioned exception) are indexed, etc. How is this possible?
Let's skip over the days we've tried to resolve this strange issue - the reason is quite simple ...
The truth is that a TM lock (in this case TM-000037e6-00000000) is not a row-level lock, thus the information listed in "rows waited on" does not make sense. More precisely the object ID is not related to the deadlock - the real object ID is listed in the first part of the lock ID - the value "37e6" corresponds to ID 14310 which is (naturally) one of the bitmap indexes on table B.
The mystery is solved, dear Watson ...
Conclusion
I understand that for certain types of locks (those not locking individual rows) the information about locked object is provided in a different way - that's not a problem. But I absolutely don't understand why Oracle provides a completely confusing object ID in the "rows waited on" and does not replace it with a correct ID (it has it in the lock ID) od does not replace it with a "no row" (which is already used in case of some deadlocks).
But let's look at the bright side - at least it gave me an opportunity to investigate the internals of Oracle after a long time. And it definitely was not boring ...




