Why to store files in a database

Imagine you are working on a web application that requires storing of files (e.g. a photo gallery). And there comes a question "Should I store the files in a database, or directly on the filesystem?" Maybe you've asked this question on a discussion forum / mailing list and I guess the answer you've received was something like "It's much more efficient to store files outside the database." Well, I'll try to explain why I prefer storing files in a database. This article is based on a discussion I've had with James Lewis, after I posted a question to pg-php mailing list.

are you nuts? Why to store files in a database?

The main problem when storing files outside the database (and reference them by path stored in the database) is that it mixes transactional an non-transactional environments - the database has all the nice features (is transactional) while the filesystem is "fast but stupid" (no transactions, no rollback, no such nice features). According to my experience this brings a lot of trouble if you want to develop a robust an reliable application (who doesn't?):

Storing the files in a filesystem is quite simple - say for example you have a table "Files" with all the metadata of a file (filename, size, timestamps, ...) and a path referencing the location on a filesystem:

CREATE TABLE Files (
    id       SERIAL      PRIMARY KEY,
    filename VARCHAR(64) NOT NULL,
    filesize INT         NOT NULL,
    path     TEXT        NOT NULL,
    created  TIMESTAMP   NOT NULL,
    ... other metadata ...
);

Say you want to delete a file - that means deleting a row from the table and the file from a filesystem. The problems demonstrate themselves when something goes wrong - let's suppose deleting a row from the table works fine, but when deleting the file from the filesystem (using the path) fails for some reason. In case you've commited the DELETE before attempting to delete the file from a filesystem, then there is a "dangling file" (file not referenced from the table, i.e. wasted space).

In case you have not commited the DELETE, you may easily roll back the DELETE and everything is fine, right? Well, no - if you've attempted to delete several files at once and the last delete from the filesystem fails, then rolling back is not an option because there will be missing data (filesystems do not roll back the changes).

Similar "problem scenarios" exist for all the operations involving transactional and non-transactional resources (not just filesystems). It's possible to fix them, but it means you have to either treat both resources as non-transactional (i.e. not to use transactions at all, which I don't like - after all transactions are one of the reasons to use databases) or implement some kind of transactional support (albeit a primitive one) for the non-transactional resources - and that's quite tedious and error prone work (not to mention the effect on efficiency which was one of the reasons to use the non-transactional filesystem).

Maintenance problems

Sure - if you have full control over the application, it may be possible to fix the needed operations using a well prepared API, but let's talk about a very important aspect of application lifecycle - maintenance, and especially about it's very important part - backup.

Backing up a PostgreSQL database is quite simple - when performed properly, you create a snapshot of the database as it existed at the time of backup execution, and it will be 100% consistent (no missing / dangling data, etc.).

But what about non-transactional resources? There is no notion of transaction or snapshot for commonly used filesystem (ok, there are filesystem with snapshot support - "zfs" for example, but it'll take time before they'll be used on inconsiderable number of production systems).

There's nothing like "consistent backup" of a filesystem - if you start a backup and someone deletes a file from the backed-up directory, it may or may not be backed up (and it's hard to predict). If you backup the database first and then the filesystem, someone may delete a file between those two steps, effectively creating an inconsistent backup - the file won't be backed up, but will be referenced in the database backup. If you backup the filesystem first (and then the database), someone may create a file between those steps, effectively creating an inconsistent backup (again, a file will be referenced from database backup, but not contained in the filesystem backup).

And there's no way to fix this by a simple API :-( Fixing this requires a well defined backup procedure and a carefully implemented operations with the files - see the next section.

A messy "outside database" solution

As already mentioned, it's possible to build some extra functionality around the filesystem to provide "fake transactions" (without the possibility to roll back for example), so the application will work quite safely while using non-transactional resources ... but it's a kind of messy.

For example you can store files in a dedicated directory under unique IDs generated by a database sequence, and never delete or modify an already created file. For example you may use the "Files" table described above, but instead of "path" use a unique ID.

CREATE TABLE Files (
    id       SERIAL      PRIMARY KEY,
    filename VARCHAR(64) NOT NULL,
    filesize INT         NOT NULL,
    file_id  SERIAL      NOT NULL,
    created  TIMESTAMP   NOT NULL,
    ... other metadata ...
    CONSTRAINT unique_file_id UNIQUE (file_id)
);

The basic operations will be performed like this:

  • CREATE - when storing a file in a filesystem, you will store it under a new "file_id" value (fetch it from a sequence before or after insert into the "Files" table).
  • UPDATE - if you need to update data of the file, just fetch a new "file_id" and store the new version under the new value (do not overwrite the original file)
  • DELETE - instead of delete just leave the file data there (just delete the rows from "Files" table)

When backing up, you have to backup the database first, then the filesystem (it does not work if the filesystem is backed up first). This works (and I use it on several places), but I think it's quite messy and there are "dangling files" i.e. files not referenced from the database.

The "dangling files" won't cause dangerous inconsistencies (missing data, etc.) but may cause problems with wasted disk space - this may be fixed using a cron job executed periodically (e.g. every night) and cleaning files not referenced from the database. Just be careful not to delete new (not yet commited) files.

A cleaner approach would be to keep a list of deleted files in the database - a separate table of deleted file_id values, and use this when cleaning the dangling files.

But while this messy solution works, it has a serious disadvantage - it's quite difficult to ensure the particular order of backup (database before filesystem), for example when using a shared hosting, maintained by a different company ...

Comments

There are no comments for this article (or are awaiting acceptance).

New comment

All the comments have to be accepted, so there may be some delay between submitting and accepting (or rejecting) the comment. If you enter the e-mail address, you will be informed about acceptance or rejection.

Subject or body may not contain HTML tags - they will be automatically removed. Paragraphs may be separated using a newline (ENTER).

(optional)