Aggregate functions for DISTINCT estimation
I've been working on implementing cross-column stats into PostgreSQL for some time (see the psql-hackers thread and wiki), and it seems that one of the necessary steps should be improving estimates of number of distinct values (thread, wiki). The first really interesting finding is that to get a reliable estimate of distinct values, you really need to read most of the table (generally all of it). The second interesting thing is that there is a lot of very interesting algorithms to continuously update the estimate in a very limited space. The experimental implementation of the algorithms is very promising, but how to demonstrate it? Using aggregate functions!




