Benchmark of ways to store files in a PostgreSQL

Some time ago I've presented an article about various ways to store file in a PostgreSQL database, and at the end of the article I've promised a benchmark comparing the ways described. It took me some time, but the benchmark is here ...

The benchmark is quite simple - a certain amount of files (with size randomly chosen from a specified interval) is generated, and then the actual benchmark is executed. All the files are read into a database, they are read from the database and finally deleted from the DB. During all these three steps, the time consumed is measured and an average speed is computed (in kB/s).

Three ways of storing files in a PostgreSQL database were tested:

  • plain BYTEA column - the whole file is stored in a plain BYTEA column
  • striped BYTEA - the file is split into segments of a given size, and these segments are stored in a separate table
  • LOB - the file is stored similarly to striped BYTEA, but this is supported by the PostgreSQL directly and thus provides support for streaming etc.

Regarding the number of files / size of files, the following four combinations were tested (chosen so that it takes about 500MB of data):

name number of files min. size max. size total size
tiny files 10000 10 kB 100 kB 537 MB
small files 1600 128 kB 512 kB 500 MB
medium files 700 512 kB 1 MB 525 MB
large files 70 5 MB 10 MB 525 MB

Two of the ways described allow to chose size of a segment used to store or read the data, namely:

  • storing using LOB - when reading the LOB, it's possible to read the stream using buffer of various sizes
  • LOB simulation (splitting the file into bytea segments) - the files may be split into segmments of arbitrary sizes (subsequently used when reading the file)

With respect to this, the following variants were tested:

  • bytea - single option (there are no segments)
  • LOB - reading using buffers with sizes 512B, 1kB, 2kB, 4kB, 8kB, 16kB, 32kB, 64kB, 128kB and 256kB
  • LOB simulation - storing the files by 512B, 1kB, 2kB, 4kB, 8kB, 16kB, 32kB, 65kB, 128kB and 256kB segments

Several important notices:

Main part of the benchmark is written in PHP, as most of my small web projects is written using this language, so a PHP benchmark makes a good sense to me. Most of the PHP settings is quite standard, with the exception that the memory limit was increased to 128 MB because of bytea columns.

The PostgreSQL 8.3.6 with a more or less standard settings was used, i.e. basic settings to reflect amount of memory (effective_cache, buffers, etc.) were changed. No special tuning for this benchmark was performed. Autovacuum demon was enabled, but a little bit more aggressive settings were used  (lower delay, higher cost limit). If interested you may download the config.

The results presented here were measured on my home PC - AMD 64 X2 4400+, 2GB of RAM, common SATA drive, Gentoo Linux (compiled for 32bit). On a powerful server with a more performant HW (SCSI or SAS drives, better CPUs, etc.) you'd probably measure higher values, but the relation between the solutions will most probably remain the same.

The files were generated into a directory stored in a memory using a ramfs filesystem, so that the reading of the files does not interfere with the benchmark. The contents of the files is randomly generated using /dev/urandom device.

For all benchmarks of the given type (file size) the same fileset was used, but the database was always dropped and recreated before each benchmark, and after loading the files into the database the ANALYZE command was executed manually.

All charts listed in the following sections represent number of seconds to perform a particular operation (writing the files into a DB, reading them and then deleting them).

Tiny files

As illustrated by the figure below, when writing very small files into a DB, the best performance may be achieved by LOB, followed by a plain BYTEA column. Striped BYTEA is the least performant, but for larger segment sizes (generally above 4kB) is equal to the plain BYTEA column (above 8kB there really is not difference), which follows on small file size (100kB at most and thus small number of segments.

When reading the data, the LOB solution is surprisingly slow (regardless of buffer size), while the solutions using plain BYTEA and striped BYTEA are quite well-balanced (again - the larger the segment is, the better the striped BYTEA performance is and from about 4kB there is no difference.

When deleting files from a database there are no serious performance difference - the BYTEA is the slowest one, but LOB is faster for just about 10% and striped BYTEA oscillates around them.

Small files

When writing small files (cca from 128kB to 512kB) the DB, the LOB is unambiguously fastest, followed by a plain BYTEA column. But the striped BYTEA are not slower - from about 4kB segments the solution is comparable to plain BYTEA.

When reading files from a database the solutions are quite balanced - with the exception of very small BYTEA segments (512B and 1KB) - the striped BYTEA with segments from 4kB to 64kB appear to be the best performing, followed by a plain BYTEA column. The LOB solution quite notably oscillates.

Deleting files from a database does not bring any serious performance differences between the solutions. The BYTEA column is the fastest again, LOB is for about 20% slower, and the striped BYTEA is the slowest - sometimes it's faster, sometimes slower (regardless of the segment size).

Medium files

Even in case of writing small files (cca 512kB and 1MB) the situation is not very difficult from the previous two sections - the LOB is again the fastest, followed by a plain BYTEA column. Since about 8kB segment size, the striped BYTEA solution is equal to the plain BYTEA.

When reading from a database, the performance is almost equal for all the solutions, with an exception of very small segment sizes (512B and 1kB). In general the striped BYTEA with segments between 4kB and 64kB is the best, followed by a plain BYTEA column, and a LOB is the slowest (and does not oscillate this time).

When deleting the data from a database, the BYTEA column is the fastest, LOB is for about 20% slower, and the striped BYTEA oscillates around these solutions - sometimes it is faster, sometimes it's slower.

Large files

Even in case of writing large files (cca from 5MB to 10MB) the situation remains the same -  the LOB is the fastest one, followed by a BYTEA columns, and since about 8kB segment size the striped BYTEA solution is almost equal to plain BYTEA columns.

Reading the files from a database remains the same too - with the exception of very small segments (512B and 1kB) there is no difference between striped and plain BYTEA columns. In general the striped BYTEA with segments between 4kB and 64kB are the best, followed by a plain BYTEA column and LOBs.

No change here - when deleting the large files from a database, the BYTEA column is the fastest one, LOB is for about 10% slower, and striped BYTEA oscillates around the previous two solutions - sometimes it's faster, sometimes it's slower.

Shrnutí

From the previous sections it follows that:

  • deleting is almost equal in all cases - none of the sollutions is exceeding the other ones. If you need fast delete, you may use "lazy delete" - instead of deleting the row, mark it as deleted and delete it later (e.g. at night using a cron, etc.)
  • regarding the writing, LOB is undoubtedly the fastest solution, followed by a plain BYTEA columns (slower for about 50% up to 100%). For segments larger than 4kB, the performance of striped BYTEA is equal to plain BYTEA columns.
  • when reading the data, the striped BYTEA solution (between 8kB and 32kB) is the best one, followed by a plain BYTEA column, and finally by LOB.

The conclusion is quite simple - if you need to store relatively small files (i.e. such that do not exceed the PHP memory limit when escaping), it's probably the best to use a plain BYTEA column.

In case you need to store large files, the decision is not as easy - if you need a really fast insert, it'd recommend using LOB. But if you're considering read speed and keeping the integrity of the DB (LOBs do not allow foreign keys etc.), the striped BYTEA is probably better.

Files for download

  • scripts for the benchmark - I have to warn you in advice that these files are not quite (i.e. "at all") ready to be run outside my PC - there are encoded paths, database credentials and so on. So it's suitable for illustration purposes, but to execute them you'll have to poke at them a little bit (i.e. "a lot").
  • results of the benchmark - Results of the benchmark (logs of the PHP script, and a dstat log).

 

Comments

Nice work

Thank you very much for having performed these benchmarks. It must have been quite a bit of coding and executing to get these stats. Just what I was looking for. The results are as expected.

RE: Nice work

Thanks.

Anyway this benchmark is a bit old (over 2 years), so it used 8.3.6. I wonder what would be the results with a recent version (9.0 or maybe 9.1) ...

New comment

All the comments have to be accepted, so there may be some delay between submitting and accepting (or rejecting) the comment. If you enter the e-mail address, you will be informed about acceptance or rejection.

Subject or body may not contain HTML tags - they will be automatically removed. Paragraphs may be separated using a newline (ENTER).

(optional)