Collecting query histogram
If you're using PostgreSQL and need to track performance anomalies, you probably know the option log_min_duration and maybe even auto_explain contrib module, that allow you to react to queries that are slower than the defined limit. That's a great thing, but there's a catch - how to set the threshold, especially when there are queries with very different query times?
If you set it too low, you'll get huge log files with a lot of queries. If you set it too high, you'll probably miss changes in case of the very show queries. What if you set the limit to 200 ms and the queries that used to run at 50 ms suddenly slow down to 150 ms? The performance effectively dropped to 30% yet nothing got logged ...
There's no magical way to set the threshold, at least I'm not aware of it, but it's possible to see a complex view at all the queries - it's called histogram. I wrote a small extension query_histogram (available at pgxn), that will allow you to collect histogram of queries and access it through SQL.
The installation is thoroughly described in README, so just very briefly - if you use 9.1 you may use extensions. Add the "query_histogram" to shared preload libraries, and then
$ make install $ psql dbname -c "CREATE EXTENSION query_histogram"
which makes the histogram ready to use in dynamic mode - that means you can change the parameters (number and width of bins) on the fly
SET query_histogram.bin_count = 100; SET query_histogram.bin_width = 50;
And you can see the current data on the fly (see README for more details)
SELECT * FROM query_histogram;
and you may even reset the histogram
SELECT * FROM query_histogram_reset();
and so on. The current version of the module works with 9.1 only, I do expect to release a 9.0-compatible version next week.
Overhead of the histogram
An important question is of course how this extension influences the performance - I've tried a read-only pgbench with various histogram settings (dynamic vs. static, various levels of sampling) and compared it to plain PostgreSQL and pg_stat_statements overhead (which does something similar and is part of the official distribution).
The whole database easily fits into the memory (so I/O is not the limiting factor) and the test was performed on 4-core system (Core i5-2500) with various number of clients. Obviously the histogram performs quite well, no matter what is the configuration - the overhead is always 3-5% and much lower than with pg_stat_statements (where the overhead is about 8%).
But these are somehow extreme results as the database is not limited by I/O so the impact of locking (of structures in the shared memory) is much more visible. In a regular operation the I/O plays much more important role and the performance impact will be much less visible.
At the same time it's true that when running on more CPUs the locking may be actually even more significant, but I can't test that as I don't have access to such machine. When designing the histogram I've tried to allow a configuration minimizing the locking - static histogram with low sampling rate.
Future
The histogram is global, i.e. per cluster. My plan is to allow "per database" histogram, although only for selected databases.
Is this extension interesting for you? If yes, please test it and let me know about any comments, bugs or suggested changes etc.






We've often used dtrace to get live histograms of query activity, but it will be nice to have a way to get this type of information on the Linux boxes we manage. We'll have to take a look at this; thanks for releasing it.