Are the benefits of ORM tools real?
We all know there are fundamental differences between the relational and object models - the so-called "object-relational impedance mismatch." If we are going to build an object model on top of a relational database (which is natural when using an object-oriented language), we have to bridge that difference somehow, and ORM tools are one of the options. But are the often-mentioned benefits real?
Basically all the projects I've been working on lately used a relational database to store data, with the domain model implemented in Java, and an ORM tool - Hibernate or iBatis - was used to overcome the differences between the relational and object models. And I have to say that sometimes things do not go according to the promises of the ORM developers ...
Let's look at several frequently mentioned alleged benefits of ORM tools and consider how valid they are for a real project with a long-term outlook. In this article I'd like to argue with these four benefits of ORM tools:
- speeding development
- significant reduction in code
- abstract and more portable code
- rich query language
The tricky part is that all those things are actually valid - at the beginning of the project, when the ORM allows fast implementation of a basic version. Problems related to ORM usually begin in later phases of the project, when the code complexity reaches a certain level and when it's too late to change the decision to use ORM.
But even when the problems finally come, the developers (and management) fight tooth and nail to keep the ORM solution, because it's very difficult to quantify the benefits of the change, while quantifying the costs is usually quite simple. (Try telling your project manager "We'd like to rewrite the persistence layer - we'll spend 100 man-days working on that, but we are unable to quantify the benefits," and watch the reaction.)
I strongly recommend the article The Vietnam of Computer Science by Ted Neward - it's not exactly fresh (it was written in 2006), but it's still valid and nicely summarises the essence of the problem using an analogy with the Vietnam War. Unclear or unrealizable and sometimes even conflicting goals and vaguely formulated "exit conditions" - these are the reasons why the USA failed in Vietnam and why so many projects suffer because of ORM.
And now let's see the benefits, or alleged benefits of ORM.
Speeding development
It's true that in the initial phases of development ORM tools actually speed things up - by using metadata (defined using annotations or XML) to generate basic CRUD methods and to map query results to domain objects automatically.
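To make "metadata generating basic CRUD" concrete, here is a minimal sketch in plain Java. The `@Table` and `@Column` annotations, the `Customer` class and the `insertSql` helper are all invented for this illustration; they are not the Hibernate or JPA annotations, and a real ORM does far more than this.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MetadataCrudSketch {

    // Hypothetical annotations, invented for this sketch (not Hibernate's or JPA's).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Table { String value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column { String value(); }

    // Hypothetical domain class carrying the mapping metadata.
    @Table("customer")
    static final class Customer {
        @Column("id") long id;
        @Column("full_name") String name;
        String notPersisted; // no @Column, so the generator ignores it
    }

    /** Builds a parameterized INSERT statement from the annotations on the given class. */
    static String insertSql(Class<?> type) {
        Table table = type.getAnnotation(Table.class);
        if (table == null) {
            throw new IllegalArgumentException(type.getName() + " has no @Table annotation");
        }
        List<String> columns = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column != null) {
                columns.add(column.value());
            }
        }
        // Reflection does not guarantee field order, so sort for a stable result.
        Collections.sort(columns);
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table.value()
                + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(insertSql(Customer.class));
    }
}
```

The attraction is obvious: adding a column means adding one annotated field, and the statement follows automatically.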
The problem is that it works like this only at the beginning; as the project grows and gets more and more complicated (and the persistence layer stabilizes), these benefits disappear. On the contrary, solving problems related to the ORM itself consumes more and more time - how often have you told yourself "If only we were not using Hibernate, I'd solve this in 5 minutes"? And it may happen quite easily that solving these ORM-related problems consumes more time than the ORM saved at the beginning.
An example of such - often marginalized - practical problems is the difficulty of reloading changes in the ORM configuration without a complete restart. After adding a field to a bean in XML (Hibernate) or changing an SQL query in iBatis, you have to redeploy the whole application, as there is no other reliable way to reinitialize the ORM. At the beginning the application is small and the redeploy is almost immediate, but once the application grows enough the redeploy may take minutes. And if you need to redeploy several times a day ...
Significant reduction in code
Another frequently mentioned benefit of ORM tools is a significant reduction in code, which is obviously closely connected to the previous section - and even in this case it's partially true. For example, Hibernate can map query results to domain objects automatically using metadata (XML or annotations), which saves a great amount of code. The question is whether there are side effects of ORM leading to more code in other places.
Consider for example the "lazy load" feature (in Hibernate), which allows you to specify when and how to load properties of entities stored in the database. For example, if one entity is related to another entity or to a collection of entities, should the related data be loaded immediately together with the main entity, or later when it is referenced for the first time? Hibernate does this using wrappers instead of collections and dynamic proxies instead of single properties (bytecode generation using CGLIB), and allows you to specify various parameters at various levels (single properties or the whole class) that influence the performance and scalability of the whole application - among others, there are the lazy, fetch and batch-size parameters.
A very nice summary of this topic may be found in A Short Primer On Fetching Strategies, and a somewhat more detailed analysis of problems related to lazy loading is in the article Hibernate: Understanding Lazy Fetching.
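The idea of a proxy that loads data on first access can be sketched in a few lines. This is only an illustration of the principle: it uses the JDK's own java.lang.reflect.Proxy as a stand-in rather than Hibernate's bytecode generation, and the `Customer` interface, `loadFromDatabase` and the load counter are hypothetical names invented for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

class LazyProxySketch {

    /** Hypothetical domain interface; JDK proxies can only implement interfaces. */
    interface Customer {
        String getName();
    }

    static final class LoadedCustomer implements Customer {
        private final String name;
        LoadedCustomer(String name) { this.name = name; }
        @Override public String getName() { return name; }
    }

    /** Counts how often the (simulated) database was actually hit. */
    static int loadCount = 0;

    /** Stands in for a real SELECT; here it just fabricates one row. */
    static Customer loadFromDatabase() {
        loadCount++;
        return new LoadedCustomer("Alice");
    }

    /** Returns a proxy that calls the loader on first use and then delegates to the result. */
    static Customer lazy(Supplier<Customer> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private Customer target; // stays null until the first method call

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (target == null) {
                    target = loader.get();
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow what the real object threw
                }
            }
        };
        return (Customer) Proxy.newProxyInstance(
                Customer.class.getClassLoader(), new Class<?>[] {Customer.class}, handler);
    }

    public static void main(String[] args) {
        Customer customer = lazy(LazyProxySketch::loadFromDatabase);
        System.out.println("loads before first access: " + loadCount);
        System.out.println("name: " + customer.getName());
        customer.getName();
        System.out.println("loads after two accesses: " + loadCount);
    }
}
```

Note that the caller cannot tell from the type whether it holds a loaded object or a proxy, which is exactly why the cost of a harmless-looking getter becomes hard to predict.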
Regarding the problems related to lazy fetching - if I have learnt something about database performance over the years, it can be distilled into the following simple wisdom: "Always load only the data you actually need, and do that using the minimum number of queries," which may be broken into the following two rules:
- no unnecessary queries - if you need data about multiple related entities, do not perform separate selects but join the tables and load the data in one go
- no unnecessary data - do not load more data than you actually need
But achieving this is very difficult with lazy fetching, as the instructions about which properties should be loaded lazily are encoded in metadata (annotations or XML) and therefore apply to every use case in the same way.
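The first rule is easy to demonstrate with numbers. The sketch below uses a hypothetical in-memory `FakeDb` that merely counts round trips; the table names in the comments (`orders`, `order_items`) are made up for the example, and no real database or ORM is involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QueryCountSketch {

    /** Hypothetical in-memory stand-in for a database; it only counts round trips. */
    static final class FakeDb {
        private final Map<Integer, List<String>> itemsByOrder = new LinkedHashMap<>();
        int queries = 0;

        void addOrder(int orderId, String... items) {
            itemsByOrder.put(orderId, List.of(items));
        }

        // Stands in for: SELECT id FROM orders
        List<Integer> selectOrderIds() {
            queries++;
            return new ArrayList<>(itemsByOrder.keySet());
        }

        // Stands in for: SELECT name FROM order_items WHERE order_id = ?
        List<String> selectItems(int orderId) {
            queries++;
            return itemsByOrder.getOrDefault(orderId, List.of());
        }

        // Stands in for: SELECT o.id, i.name FROM orders o
        //                LEFT JOIN order_items i ON i.order_id = o.id
        Map<Integer, List<String>> selectOrdersWithItems() {
            queries++;
            return new LinkedHashMap<>(itemsByOrder);
        }
    }

    /** What a lazily mapped collection does in a loop: one query per order. */
    static int countItemsLazily(FakeDb db) {
        int total = 0;
        for (int orderId : db.selectOrderIds()) {
            total += db.selectItems(orderId).size();
        }
        return total;
    }

    /** The same result loaded with a single joined query. */
    static int countItemsWithJoin(FakeDb db) {
        int total = 0;
        for (List<String> items : db.selectOrdersWithItems().values()) {
            total += items.size();
        }
        return total;
    }

    static FakeDb sampleDb() {
        FakeDb db = new FakeDb();
        db.addOrder(1, "pen", "paper");
        db.addOrder(2, "ink");
        db.addOrder(3);
        return db;
    }

    public static void main(String[] args) {
        FakeDb lazyDb = sampleDb();
        FakeDb joinDb = sampleDb();
        System.out.println(countItemsLazily(lazyDb) + " items, " + lazyDb.queries + " queries");
        System.out.println(countItemsWithJoin(joinDb) + " items, " + joinDb.queries + " queries");
    }
}
```

With three orders the lazy variant needs four round trips and the join needs one; with a batch of a hundred thousand orders the difference stops being academic.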
I've seen projects with a "strongly lazy" domain model - which was quite fine for the more interactive parts of the application (frontend), but you can imagine what it meant for batch processing (which was, by the way, a significant part of the application).
I've met several bizarre solutions to this problem - e.g. defining multiple mappings for the same class (each with different lazy load settings) or performing a "preliminary" pass through the data in order to load all the necessary data. None of these actually leads to a reduction in code, right?
There is a single correct solution to this "lazy fetch problem" - carefully analyze all the use cases, determine what data you actually need and make sure only the necessary data (i.e. data used in the use case) are loaded. If the use case needs a significantly different amount of data than the other use cases, you have to define a new query (HQL in Hibernate or SQL in iBatis) with the proper lazy load settings. Yet another piece of code ...
Abstract and more portable code
Frankly speaking, I'm not quite sure what "abstract code" actually means. When developing an application, you build the domain model step by step by defining objects (e.g. in Java) and a persistence model (a definition of the relational schema). The goal of ORM is automatic mapping between these two models, but the definition is up to you - including the level of abstraction. Sure, ORM may help you make some bad design decisions - such as mixing business and persistence logic - but that can be easily achieved even without ORM, and it has nothing to do with the level of abstraction.
The improvement in portability is even more questionable. Considering a standard three-layer architecture (i.e. an application based on persistence, business and presentation layers), only the persistence and business layers are related to ORM (if there are connections between the presentation layer and ORM, you have a much more significant problem than portability). In such cases "porting" may mean changing the database (in the case of the persistence layer) or the programming language (business layer).
I dare say switching the programming language of a large project is not feasible because of money - the costs are usually much higher than the potential gains. Besides that, most ORM tools are designed for one or maybe two programming languages, so switching the language would actually mean switching the ORM too (rewriting the config files etc.).
The possibility of switching the database is much more real - first, there are attempts to consolidate the databases used throughout a company; second, there are projects aiming to support multiple databases at once (generally software deployed at multiple clients). It's true that with rigorous usage of ORM the developer is shielded from the particular database implementation, but on the other hand, as the project matures you find more and more queries that are difficult to express using Hibernate - e.g. queries used for reporting purposes (aggregate queries).
In such areas plain SQL executed "alongside the ORM" is used, but each such SQL query poses a challenge for portability - whenever you change the database you may need to test and modify these SQL queries.
And it's just as true that rigorous usage of ORM may eliminate the advantages of the particular database. All database implementations support SQL (although in different dialects), but each implementation generally supports some special features - and you pay a lot of money for them.
And in the case of ORM solutions based on plain SQL (e.g. iBatis) there is no increased portability at all - you have to check and modify all the queries, one by one.
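A small example of why hand-written SQL has to be revisited for each database is limiting the number of returned rows. The sketch below covers only three dialects and assumes the input is a simple SELECT; the `Dialect` enum, the `firstRows` helper and the `customer` table are hypothetical names chosen for the illustration.

```java
class DialectSketch {

    // Only the dialects needed for the illustration; the list is not exhaustive.
    enum Dialect { POSTGRESQL, MYSQL, ORACLE_PRE_12C }

    /** Wraps a plain SELECT so that it returns at most maxRows rows. */
    static String firstRows(Dialect dialect, String select, int maxRows) {
        switch (dialect) {
            case POSTGRESQL:
            case MYSQL:
                // Both accept a trailing LIMIT clause.
                return select + " LIMIT " + maxRows;
            case ORACLE_PRE_12C:
                // Older Oracle versions have no LIMIT; the usual idiom filters on ROWNUM
                // around a subquery so that an inner ORDER BY is applied first.
                return "SELECT * FROM (" + select + ") WHERE ROWNUM <= " + maxRows;
            default:
                throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
    }

    public static void main(String[] args) {
        String select = "SELECT name FROM customer ORDER BY name";
        for (Dialect dialect : Dialect.values()) {
            System.out.println(dialect + ": " + firstRows(dialect, select, 10));
        }
    }
}
```

Every query written outside the ORM carries a decision like this one, and each has to be found and retested when the database changes.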
Rich query language
If the ORM tool is not based on plain SQL (as iBatis is, for example), it has to provide another way to express queries. There are multiple ways to do that, but the most common and most flexible is a new query language (e.g. HQL in Hibernate).
But ORM tools with their own query language can generally use just the subset of SQL common to all supported databases, which actually means that a lot of the unique features of your database system - the ones that make it different from other database systems - are wasted.
Yes, it's true that the already mentioned HQL makes navigation through objects stored in a database and mapping of results to the object model quite comfortable, but that does not mean HQL is a "rich query language." If you consider an arbitrary HQL query, it's usually very simple to rewrite it in SQL and map the result to domain objects with rather simple post-processing.
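As a rough idea of what such post-processing looks like, the sketch below folds the rows of a joined SQL query into domain objects. The `Order` class, the table and column names in the comment, and the use of `Object[]` rows in place of a JDBC `ResultSet` are all assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowMappingSketch {

    /** Hypothetical domain object with a one-to-many collection. */
    static final class Order {
        final int id;
        final String customer;
        final List<String> items = new ArrayList<>();

        Order(int id, String customer) {
            this.id = id;
            this.customer = customer;
        }
    }

    /**
     * Folds joined rows into Order objects. Each row is {orderId, customer, itemName},
     * as a query like the following would return (names are made up):
     *   SELECT o.id, o.customer, i.name
     *   FROM orders o LEFT JOIN order_items i ON i.order_id = o.id
     * An Object[] stands in for a JDBC ResultSet row to keep the sketch runnable.
     */
    static List<Order> mapRows(List<Object[]> rows) {
        Map<Integer, Order> ordersById = new LinkedHashMap<>();
        for (Object[] row : rows) {
            Integer orderId = (Integer) row[0];
            Order order = ordersById.computeIfAbsent(
                    orderId, id -> new Order(id, (String) row[1]));
            if (row[2] != null) { // LEFT JOIN yields a null item for an empty order
                order.items.add((String) row[2]);
            }
        }
        return new ArrayList<>(ordersById.values());
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1, "Alice", "pen"});
        rows.add(new Object[] {1, "Alice", "paper"});
        rows.add(new Object[] {2, "Bob", null});
        for (Order order : mapRows(rows)) {
            System.out.println(order.id + " " + order.customer + " " + order.items);
        }
    }
}
```

This is roughly twenty lines written once per aggregate, in exchange for full control over the SQL that is actually executed.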
Conclusion
I'm sure that after reading this article I look like a hardened enemy of ORM following the rule "Throw away the ORM and write everything by hand," but that's not true. This article focuses on the difficulties and false benefits related to ORM tools - that's why it seems so negative.
The truth is that nothing is perfect - including ORM tools - and what looks great at the beginning of a project may be clearly counterproductive at later stages. Constantly check that the chosen ORM solution is an actual benefit for the project, and once it begins to drag the project down, don't hesitate to switch to something better. How difficult this switch will be depends solely on you and on how well you separate business and persistence logic.
Links
- Object-relational mapping (Wikipedia)
- Object-relational impedance mismatch (Wikipedia)
- The Vietnam of Computer Science (Ted Neward)
- Why Should You Use an ORM? (Keeping it Simple)
- Wrecking Your Database (Josh Berkus)
- Why I Hate ORMs (a solicited rant) (Doug Boude)
- To Hibernate or not? - A commentary on ORM’s and few recommendations (Simple Thoughts)
- Pros and Cons of an ORM (DataFaucet ORM Blog)
- A Short Primer On Fetching Strategies (JBoss Community)
- Hibernate: Understanding Lazy Fetching (Javalobby)




