SSD benchmark results - TPC-H

If you compare the SSD with a traditional drive in a TPC-H benchmark, you'll notice that the performance increases even less than in case of the read-write pgbench workload. For example when comparing m-time (i.e. time spent evaluating queries with identical query plans, more detailed definition is available in the article about TPC-H benchmark with a traditional drive), SSD with an XFS file system performs like this

XFS performance (m-time) with a SSD drive

while the traditional drive performs like this

XFS performance (m-time) with a traditional drive

i.e. the SSD drive is about 1.5x faster  than the traditional drive (which is significantly worse than with the read-only/read-write pgbench, where the SSD was about 25x/13x faster). But even for a SSD drive it's true that the best performance is achieved for large database blocks and the size of file system blocks does not matter.

Benchmark results / HDD + TPC-H

Last week I've posted a bief analysis of the read-only and read-write OLTP parts of the benchmark, now it's time to look into the part dealing with DWH/DSS workload, based originally on the TPC-H benchmark. In brief, the average results for various block sizes are these

average m-score for all file systems

A more detailed description and definition of m-score will be presented later, for now it should suffice that higher values mean better performance. It's clear that larger blocks (file system and database) yield better performance when querying the data.

But does this hold for other tasks common in DWH applications, i.e. loading the data, creating indexes, foreign keys, collecting stats etc. And how do the various file systems perform?

DSS (TPC-H-like) benchmark with PostgreSQL

When benchmarking PostgreSQL database, pgbench is probably the first choice. But the default pgbench transactions are rather OLTP-like (it's a TCP-B-like stress test) so it's not very usable when you need to test a DSS/DWH-like workload. I needed to do that recently, and I wasn't very happy with the available tools. Luckily, the TPC Council provides DBGEN, a tool that makes it quite easy to generate data and scripts - and it's not very difficult to make that work with PostgreSQL. So let's see how to make it work  ..

1