Fuzzy.cz http://www.fuzzy.cz/en/ ... because the world is not crisp ... en Wed, 04 Jan 2012 00:06:36 +0100 http://www.rssboard.org/rss-specification tv@fuzzy.cz Fulltext with dictionaries in shared memory http://www.fuzzy.cz/en/articles/fulltext-with-dictionaries-in-shared-memory/ <p> If you're using the <a href="http://www.postgresql.org/docs/9.0/static/textsearch.html">fulltext built-in to PostgreSQL</a> and if you can't use solution based on <a href="http://www.postgresql.org/docs/9.0/static/textsearch-dictionaries.html#TEXTSEARCH-SNOWBALL-DICTIONARY">snowball</a> due to the nature of the language (as it works great for english, but there's not anything similar for czech with reasonable accuracy and probably never will be), you're somehow forced to use ispell based dictionaries. In that case you've probably noticed two annoying features.</p> <p> For each connection (backend), the dictionaries are loaded into private memory, i.e. each connection spends a lot of CPU parsing the dictionary files on the first query, and it needs a lot of memory to store the same information. If the parsed dictionary needs 25MB (and that's not an exception) and if you do have 20 concurrent connections using the fulltext, you've suddenly lost 500 MB of RAM. That may seem like a negligible amount of RAM nowadays, but there are environments where this is significant (e.g. VPS or some cloud instances).</p> <p> There are workarounds - e.g. using a connection pool with already initialized connections (you'll skip the initialization time but you're wasting memory) or keeping small number of persistent connections just for fulltext queries (but that's inconvenient to work with). There are probably other solutions but none of them is perfect ...</p> <p> Recent issues with the fulltext forced me to write an&nbsp;<a href="http://github.com/tvondra/shared_ispell">extension</a> that allows storing the dictionaries in the shared memory. Even this solution is not perfect (more on that later), but it's definitely a step in the right direction. So how does it work and what it does?</p> Wed, 04 Jan 2012 00:06:36 +0100 be7604c9a3ecf632bbe65e0e92888255 Connection limits - a proof of concept http://www.fuzzy.cz/en/articles/connection-limits-a-proof-of-concept/ <p> From time to time someone on the mailing list <a href="http://archives.postgresql.org/pgsql-general/2011-11/msg01099.php">asks</a> if there's a way to limit number of connections for a given IP address, database or a user. No, currently there's nothing like that built into the core,. although it'd be pretty useful feature, especially on clusters shared by multiple clients (applications, customers, ...). Sometimes such quotas can be reasonably implemented using a connection pooler like <a href="http://pgfoundry.org/projects/pgbouncer/">pgbouncer</a> or things like <a href="http://www.netfilter.org/projects/patch-o-matic/pom-external.html#pom-external-connlimit">netfilter connlimit</a>, but both solutions have disadvantages.</p> <p> But hey! We do have a little thingies called "extensions", "hooks" and "shared libraries." Last week I hacked up a simple "proof of concept" extension that uses "client auth hook" and allows you to limit number of connections by various criteria. It's not perfect (more on that later), but it works quite fine. It's available at <a href="https://github.com/tvondra/connection_limits">github</a> and once I fix those issues I'm aware of I'll publish it on <a href="http://www.pgxn.org/">pgxn</a>.</p> <p> PS: Thanks to Magnus for pointing out the existing "client auth hook" might be useful to implement this, and TL for pointing out how unbelievably ugly kluge the original version of the extension was.</p> Wed, 07 Dec 2011 16:34:55 +0100 51b2aacffbac56ded0177909c6b3c6ef Calculating quantiles and trimmed aggregates http://www.fuzzy.cz/en/articles/calculating-quantiles-and-trimmed-aggregates/ <p> If you've ever needed to compute median or other quantiles, you probably already know there are no built-in aggregates for that in PostgreSQL. There are several solutions available, with various level of flexibility and performance - let me name at least <a href="http://www.joeconway.com/presentations/oscon-pres-2003-1.pdf">Joe Conway's solution based on PL/R</a> and <a href="http://www.depesz.com/index.php/2009/07/13/calculating-median/">depesz's PL/pgSQL</a> solution. I personally wasn't happy with any of those but as I'm really lazy, I haven't written anything on my own ... until last week. The extension (already <a href="http://pgxn.org/dist/quantile/1.0.0/">available on pgxn</a>) is based on a disgusting trick (more on that later), but seems to work fine, especially when it comes to performance.</p> <p> When I wrote that, I realized I could use the same basic idea to implement trimmed averages - not sure if that's the right term in english, but it means something like "remove 3% of the lowest and 2% of the highest values and compute AVG/VARIANCE of the remaining data." Basically it's a neat way to get rid of the outliers. This is <a href="http://pgxn.org/dist/trimmed_aggregates/1.0.0/">available on pgxn</a> too.</p> <p> So lets see how that works, what's the performance, etc.</p> Mon, 07 Nov 2011 22:01:06 +0100 a6de89d61e6694fff07faaa0201d5524 Collecting query histogram http://www.fuzzy.cz/en/articles/collecting-query-histogram/ <p> If you're using PostgreSQL and need to track performance anomalies, you probably know the option <a href="http://www.postgresql.org/docs/9.1/static/runtime-config-logging.html#GUC-LOG-MIN-DURATION-STATEMENT">log_min_duration</a> and maybe even <a href="http://www.postgresql.org/docs/9.1/static/auto-explain.html">auto_explain</a> contrib module, that allow you to react to queries that are slower than the defined limit. That's a great thing, but there's a catch - how to set the threshold, especially when there are queries with very different query times?</p> <p> If you set it too low, you'll get huge log files with a lot of queries. If you set it too high, you'll probably miss changes in case of the very show queries. What if you set the limit to 200 ms and the queries that used to run at 50 ms suddenly slow down to 150 ms? The performance effectively dropped to 30% yet nothing got logged ...</p> Fri, 04 Nov 2011 03:53:55 +0100 a98089a34fa1284bb1d4edb07a0ebf50 SSD benchmark results - TPC-H http://www.fuzzy.cz/en/articles/ssd-benchmark-results-tpc-h/ <p> If you <a href="http://www.fuzzy.cz/bench/compare-tpch.php?type[]=btrfs-datacow-barrier:2&amp;type[]=btrfs-datacow-barrier:4&amp;type[]=btrfs-datacow-nobarrier:2&amp;type[]=btrfs-datacow-nobarrier:4&amp;type[]=btrfs-nodatacow-barrier:2&amp;type[]=btrfs-nodatacow-barrier:4&amp;type[]=btrfs-nodatacow-nobarrier:2&amp;type[]=btrfs-nodatacow-nobarrier:4&amp;type[]=ext2:2&amp;type[]=ext2:4&amp;type[]=ext3-journal-barrier:2&amp;type[]=ext3-journal-barrier:4&amp;type[]=ext3-journal-nobarrier:2&amp;type[]=ext3-journal-nobarrier:4&amp;type[]=ext3-ordered-barrier:2&amp;type[]=ext3-ordered-barrier:4&amp;type[]=ext3-ordered-nobarrier:2&amp;type[]=ext3-ordered-nobarrier:4&amp;type[]=ext3-writeback-barrier:2&amp;type[]=ext3-writeback-barrier:4&amp;type[]=ext3-writeback-nobarrier:2&amp;type[]=ext3-writeback-nobarrier:4&amp;type[]=ext4-journal-barrier:2&amp;type[]=ext4-journal-barrier:4&amp;type[]=ext4-journal-nobarrier:2&amp;type[]=ext4-journal-nobarrier:4&amp;type[]=ext4-ordered-barrier:2&amp;type[]=ext4-ordered-barrier:4&amp;type[]=ext4-ordered-nobarrier:2&amp;type[]=ext4-ordered-nobarrier:4&amp;type[]=ext4-writeback-barrier:2&amp;type[]=ext4-writeback-barrier:4&amp;type[]=ext4-writeback-nobarrier:2&amp;type[]=ext4-writeback-nobarrier:4&amp;type[]=jfs:2&amp;type[]=jfs:4&amp;type[]=nilfs2-barrier:2&amp;type[]=nilfs2-barrier:4&amp;type[]=nilfs2-nobarrier:2&amp;type[]=nilfs2-nobarrier:4&amp;type[]=reiserfs-flush:2&amp;type[]=reiserfs-flush:4&amp;type[]=reiserfs-none:2&amp;type[]=reiserfs-none:4&amp;type[]=xfs-barrier:2&amp;type[]=xfs-barrier:4&amp;type[]=xfs-nobarrier:2&amp;type[]=xfs-nobarrier:4&amp;key=tpch-m-time">compare</a> the SSD with a traditional drive in a TPC-H benchmark, you'll notice that the performance increases even less than in case of the <a src="http://www.fuzzy.cz/en/articles/benchmark-results-with-ssd-read-only-pgbench/">read-write pgbench workload</a>. For example when comparing m-time (i.e. time spent evaluating queries with identical query plans, more detailed definition is available in the article about <a src="http://www.fuzzy.cz/en/articles/benchmark-results-hdd-tpch/">TPC-H benchmark with a traditional drive</a>), SSD with an XFS file system performs like this</p> <p> <img alt="XFS performance (m-time) with a SSD drive" src="http://www.fuzzy.cz/image/ssd-2-xfs-nobarrier-tpch-m-time.png" style="width: 590px; height: 275px;" title="XFS performance (m-time) with a SSD drive" /></p> <p> while the traditional drive performs like this</p> <p> <img alt="XFS performance (m-time) with a traditional drive" src="http://www.fuzzy.cz/image/hdd-2-xfs-nobarrier-tpch-m-time.png" style="width: 590px; height: 275px;" title="XFS performance (m-time) with a traditional drive" /></p> <p> i.e. the SSD drive is about 1.5x faster&nbsp; than the traditional drive (which is significantly worse than with the read-only/read-write pgbench, where the SSD was about 25x/13x faster). But even for a SSD drive it's true that the best performance is achieved for large database blocks and the size of file system blocks does not matter.</p> Wed, 28 Sep 2011 23:00:50 +0200 404f906efe45a9e97e9b1243fe122a3c SSD benchmark results SSD - read-write pgbench http://www.fuzzy.cz/en/articles/ssd-benchmark-results-read-write-pgbench/ <p> So let's see the next part of SSD results - read-write pgbench. Just like in case of a read-only benchmark, the performance of all file systems is almost exactly the same (with the exception of nilfs2, as mentioned later). The average performance looks like this</p> <p> <img alt="" src="http://www.fuzzy.cz/image/ssd-average-tps-rw.png" style="width: 590px; height: 275px;" /></p> <p> BTW I forgot to mention one important thing in the previous post - if you're interested in the data collected during the benchmark, I'm ready to provide them. There's one slight inconvenience, though - the HDD results occupy 3.4GB (1GB gzipped), SSD results occupy 38GB (10GB compressed). That's too much to place it to this blog. But if you're going to pgconf.eu in Amsterdam, just ask me ...</p> Wed, 21 Sep 2011 12:43:23 +0200 5705a1da4e4d8c516e13aec30eb38027 Benchmark results with SSD - read-only pgbench http://www.fuzzy.cz/en/articles/benchmark-results-with-ssd-read-only-pgbench/ <p> In the last few posts I've discussed benchmark results with a traditional SATA drive, now it's time to discuss results of the same tests with a SSD drive (Intel 320). This post is about results of the read-only pgbench test. As expected, the SSD drive performs better than a traditional drive in all three tests, but it's interesting to see how the performance boost varies for various tests and how perfectly are eliminated differences between the file systems.</p> <p> The average results for all file systems looks like this</p> <p> <img alt="average read-only performance with SSD" src="http://www.fuzzy.cz/image/ssd-average-tps-ro.png" title="average read-only performance with SSD" /></p> <p> The SSD results are available <a href="http://www.fuzzy.cz/bench/">here</a>, comparison of read-only results is <a href="http://www.fuzzy.cz/bench/compare-pgbench.php?type[]=btrfs-datacow-barrier:3&amp;type[]=btrfs-datacow-nobarrier:3&amp;type[]=btrfs-nodatacow-barrier:3&amp;type[]=btrfs-nodatacow-nobarrier:3&amp;type[]=ext2:3&amp;type[]=ext3-journal-barrier:3&amp;type[]=ext3-journal-nobarrier:3&amp;type[]=ext3-ordered-barrier:3&amp;type[]=ext3-ordered-nobarrier:3&amp;type[]=ext3-writeback-barrier:3&amp;type[]=ext3-writeback-nobarrier:3&amp;type[]=ext4-journal-barrier:3&amp;type[]=ext4-journal-nobarrier:3&amp;type[]=ext4-ordered-barrier:3&amp;type[]=ext4-ordered-nobarrier:3&amp;type[]=ext4-writeback-barrier:3&amp;type[]=ext4-writeback-nobarrier:3&amp;type[]=jfs:3&amp;type[]=nilfs2-barrier:3&amp;type[]=nilfs2-nobarrier:3&amp;type[]=reiserfs-flush:3&amp;type[]=reiserfs-none:3&amp;type[]=xfs-barrier:3&amp;type[]=xfs-nobarrier:3&amp;key=tps-ro">here</a>.</p> Sun, 18 Sep 2011 23:35:32 +0200 d604daa6c5cb2adfe3f965bb215046df Benchmark results / HDD + TPC-H http://www.fuzzy.cz/en/articles/benchmark-results-hdd-tpch/ <p> Last week I've posted a bief analysis of the <a src="http://www.fuzzy.cz/en/articles/benchmark-results-hdd-read-only-pgbench/">read-only</a> and <a src="http://www.fuzzy.cz/en/articles/benchmark-results-hdd-read-write-pgbench/">read-write</a> OLTP parts of the benchmark, now it's time to look into the part dealing with DWH/DSS workload, based originally on the <a src="http://www.fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/">TPC-H benchmark</a>. In brief, the average results for various block sizes are these</p> <p> <img alt="average m-score for all file systems" src="http://www.fuzzy.cz/image/average-tpch-m-score.png" style="width: 590px; height: 275px;" title="average m-score for all file systems" /></p> <p> A more detailed description and definition of m-score will be presented later, for now it should suffice that higher values mean better performance. It's clear that larger blocks (file system and database) yield better performance when querying the data.</p> <p> But does this hold for other tasks common in DWH applications, i.e. loading the data, creating indexes, foreign keys, collecting stats etc. And how do the various file systems perform?</p> Tue, 13 Sep 2011 19:47:32 +0200 63bbdc39f2fc0c26df55467d9be7745c Benchmark results / HDD + read-write pgbench http://www.fuzzy.cz/en/articles/benchmark-results-hdd-read-write-pgbench/ <p> I've <a src="http://www.fuzzy.cz/en/articles/benchmark-results-hdd-read-write-pgbench/">already</a> briefly analyzed results of the read-only pgbench runs, now it's time to discuss the read-write workload. The average tps for all file systems with write barriers enabled looks like this</p> <p> <img alt="average tps with read-write workload and write barriers enabled" src="http://www.fuzzy.cz/image/average-tps-rw-no-xfs-barrier.png" style="width: 590px; height: 275px;" title="average tps with read-write workload and write barriers enabled" /></p> <p> That clearly shows that, just like in case of a read-only workload, better performance is achieved with smaller database blocks. The difference between 1kB and 32kB blocks is about 30%, depending on the file system block size.</p> Fri, 09 Sep 2011 08:02:54 +0200 43e3a622a9713850d50fee5585689df4 Benchmark results / HDD + read-only pgbench http://www.fuzzy.cz/en/articles/benchmark-results-hdd-read-only-pgbench/ <p> In the <a href="http://www.fuzzy.cz/en/articles/results-of-the-block-size-benchmark-for-hdd-introduction/">previous post</a>, I've briefly described <a href="http://www.fuzzy.cz/bench/">benchmark results</a>, especially how to interpret the images. Now it's time to analyze the first part of the results, namely the read-only portion of the OLTP workload (i.e. the output of the read-only pgbench runs).</p> <p> TPS over all tested file systems (detailed comparison is available <a href="http://www.fuzzy.cz/bench/compare-pgbench.php?type[]=btrfs-datacow-barrier%3A1&amp;type[]=btrfs-datacow-nobarrier%3A1&amp;type[]=btrfs-nodatacow-barrier%3A1&amp;type[]=btrfs-nodatacow-nobarrier%3A1&amp;type[]=ext2%3A1&amp;type[]=ext3-journal-barrier%3A1&amp;type[]=ext3-journal-nobarrier%3A1&amp;type[]=ext3-ordered-barrier%3A1&amp;type[]=ext3-ordered-nobarrier%3A1&amp;type[]=ext3-writeback-barrier%3A1&amp;type[]=ext3-writeback-nobarrier%3A1&amp;type[]=ext4-journal-barrier%3A1&amp;type[]=ext4-journal-nobarrier%3A1&amp;type[]=ext4-ordered-barrier%3A1&amp;type[]=ext4-ordered-nobarrier%3A1&amp;type[]=ext4-writeback-barrier%3A1&amp;type[]=ext4-writeback-nobarrier%3A1&amp;type[]=jfs%3A1&amp;type[]=nilfs2-barrier%3A1&amp;type[]=nilfs2-nobarrier%3A1&amp;type[]=reiserfs-flush%3A1&amp;type[]=reiserfs-none%3A1&amp;type[]=xfs-barrier%3A1&amp;type[]=xfs-nobarrier%3A1">here</a>), with XFS excluded (it's the only system with 512B blocks and makes the image a bit less readable) looks like this</p> <p> <img alt="average read-only tps for all the filesystems" src="http://www.fuzzy.cz/image/average-tps-ro-no-xfs.png" style="width: 590px; height: 275px;" title="average read-only tps for all the filesystems" /></p> <p> The image clearly shows that the best performance is achieved for small database blocks and large file system blocks. The dependency on file system block size is not equally strong for all the file systems - for some file systems it's quite strong, for other file systems there's almost no dependecy.</p> Wed, 07 Sep 2011 11:40:10 +0200 a73fdb853d2c2978e465ee3ba1b24c03