Calculating quantiles and trimmed aggregates

If you've ever needed to compute median or other quantiles, you probably already know there are no built-in aggregates for that in PostgreSQL. There are several solutions available, with various level of flexibility and performance - let me name at least Joe Conway's solution based on PL/R and depesz's PL/pgSQL solution. I personally wasn't happy with any of those but as I'm really lazy, I haven't written anything on my own ... until last week. The extension (already available on pgxn) is based on a disgusting trick (more on that later), but seems to work fine, especially when it comes to performance.

When I wrote that, I realized I could use the same basic idea to implement trimmed averages - not sure if that's the right term in english, but it means something like "remove 3% of the lowest and 2% of the highest values and compute AVG/VARIANCE of the remaining data." Basically it's a neat way to get rid of the outliers. This is available on pgxn too.

So lets see how that works, what's the performance, etc.

Aggregate functions for DISTINCT estimation

I've been working on implementing cross-column stats into PostgreSQL for some time (see the psql-hackers thread and wiki), and it seems that one of the necessary steps should be improving estimates of number of distinct values (thread, wiki). The first really interesting finding is that to get a reliable estimate of distinct values, you really need to read most of the table (generally all of it). The second interesting thing is that there is a lot of very interesting algorithms to continuously update the  estimate in a very limited space. The experimental implementation of the algorithms is very promising, but how to demonstrate it? Using aggregate functions!

1