Cocoa: Rewrites

System rewrites must be extraordinarily popular because I've worked on my fair share of them throughout my career. And usually the story is the same...

Poorly-trained developers in an IT department or startup slowly build a system over the course of a few years and then eventually come to the realization that the system needs a rewrite. Here are some of the most common reasons:

The database design is bad (Usually it's atrocious. It's very rare to find good data models)
The code for the business logic or UI is a hopeless tangle of procedural code (You'd be lucky if the UI and business logic are separate. I've even seen one implementation of a web application where data access, business logic, and even HTML markup were all generated by Oracle stored procedures.)
Performance issues have cropped up (Most applications are bound by the performance of the data access code. Good data model design is crucial in this respect, but bad queries, which are easier to write with object-relational tools btw, are the usual culprits and are a good place to start.)
The technology is obsolete. (Unfortunately this is a fact of life in IT and it is common for this old code to be the only "documentation". Keep the developers of the future in mind when writing your code and make it readable.)
It's just so fragile that implementing new functionality takes way too long. (If you see gigantic nested if-then statements, run away now before you run away screaming later.)
The new IT manager doesn't like his/her predecessor's legacy (It's sad but true. "Wiping out all vestiges of that 'bad' manager who was the harbinger of all things evil" is a bit of a spectator sport among developers.)
It's a combination of some of the above. (And if you've really arrived at the precipice and are gaping into the abyss it'll be all of the above.)

Eventually the decision to rewrite is approved and someone must make the decision about how to redevelop the system. People usually see these two paths:

They can redevelop the entire thing, affectionately dubbed the "big-bang approach"
They can incrementally replace the subsystems one at a time.

Usually the incremental approach is the most attractive because it's seen as having less risk. And of course managers who like their jobs like less risk. But this "lower-risk" decision usually comes with one very difficult problem: "How do you integrate the new code with the old and keep them both running at the same time?" Well, the typical decision is to just leave the database alone and build the new application on top of it. While this seems to be an easy choice (look at all the stuff you can reuse, and no data migration activities to endure), this is one of the worst things you can do.

No matter how good the database may appear at first glance and no matter how much faith you have in your object-relational mapping (ORM) tool's ability to "abstract away the ugly parts", you are always going to compromise your object model. It is extraordinarily difficult to design the best domain model you can without repeating the design decisions made in the database. It's like trying to hum a tune while listening to another song on the radio: you inevitably end up singing the song on the radio.

For several years I believed that ORM really gave me an abstraction from the physical database, that the two were independent and it was metadata that glued them together. But after some real experience, I learned the hard way that this was a fantasy. If you want to be productive and want to work with an object-oriented domain model that you can easily refactor, you really need to be able to change the data model just as easily. I now consider the domain model and the relational data model to be two perspectives on the same thing. If you don't believe me, I ask you to consider the following trends:

In EJB3, the relational view of the domain model is becoming part of the domain model through Java 5 annotations. The main reason for this is practicality. It embraces the DRY principle (Don't Repeat Yourself): every piece of knowledge should have a single, authoritative, and unambiguous representation in a system. It's an acknowledgment that the attribute-oriented programming model advocated by XDoclet so many years ago is a good one. I suggest that you should even generate your database from this metadata to keep it DRY.
In Ruby on Rails, much of the domain model is generated at runtime by querying the database metadata. The database can't express everything in an object model, so the rest of the picture exists in the Ruby domain classes. This practical approach that intrinsically marries relational and object views is one of the many features of Ruby on Rails that make its supporters claim such high productivity benefits.
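
To make the "single representation" idea concrete, here is a minimal sketch in Python rather than EJB3 or Rails. It is only an illustration under assumptions: the `Customer` class, its fields, and the `"sql"` metadata key are hypothetical names invented for this example, and SQLite stands in for whatever database you actually use. The column definitions live on the domain class, and the table is generated from them, so there is one place to change when the model is refactored.

```python
import sqlite3
from dataclasses import dataclass, field, fields


# Hypothetical domain class: the relational details are attached to the
# fields themselves, loosely in the spirit of annotation-driven mapping.
@dataclass
class Customer:
    id: int = field(metadata={"sql": "INTEGER PRIMARY KEY"})
    name: str = field(metadata={"sql": "TEXT NOT NULL"})
    email: str = field(metadata={"sql": "TEXT UNIQUE"})


def create_table_sql(cls) -> str:
    """Build a CREATE TABLE statement from the metadata on a dataclass."""
    columns = ", ".join(f"{f.name} {f.metadata['sql']}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__.lower()} ({columns})"


def column_names(conn, table):
    """Ask the database which columns the table actually has."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Generate the schema from the domain class instead of maintaining it twice.
conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(Customer))
```

Adding or renaming a field on `Customer` changes the generated table with it, which is the point: the object view and the relational view move together.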

You may also want to consider what data warehousing people have been doing for years. They always treat the transactional databases as sources of data but they always stage that data somewhere else and "cleanse" it before making it part of their warehouse. So while they're dependent on that data model for ongoing feeds they have a real abstraction from it and none of their cubes or reports are impacted by a change in the data source.
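
As a rough sketch of that staging idea (not how any particular warehouse tool does it), the example below translates a row from a legacy table into a cleansed shape. The legacy column names `CUST_NO`, `CUST_NM`, and `STAT_CD`, the status code `"A"`, and the `StagedCustomer` class are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


# Hypothetical cleansed record: this is the only shape downstream code sees.
@dataclass(frozen=True)
class StagedCustomer:
    customer_id: int
    name: str
    active: bool


def stage_customer(legacy_row: dict) -> StagedCustomer:
    """Translate one legacy row into the cleansed, staged shape."""
    return StagedCustomer(
        customer_id=int(legacy_row["CUST_NO"]),
        name=legacy_row["CUST_NM"].strip().title(),
        active=legacy_row["STAT_CD"] == "A",  # assumed meaning: 'A' = active
    )


# A made-up legacy row with padded text and a status code.
legacy_row = {"CUST_NO": "0042", "CUST_NM": "  ACME CORP ", "STAT_CD": "A"}
staged = stage_customer(legacy_row)
```

If the legacy schema changes, only `stage_customer` has to change; anything built on `StagedCustomer` is untouched.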

Rewrites appear to be inevitable. Even if the reasons for doing it are dubious, you'll probably find yourself in one of these projects sooner or later. But if you're going to get involved, make sure it's a complete rewrite. Don't take shortcuts. The most successful rewrite project I worked on created a new domain model and new data model for the new application and used database technologies to keep the new and old data models synchronized. We got the productivity of not being bound by the old (and very scary) database, and we kept everything else that was dependent on that old database running by keeping the old database alive. The most aggravating projects have been the ones that tie you down by saying "leave the database alone". Then there are the ones that say "leave the UI alone" too. Ugh! But that can be the subject of another article...
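
The article doesn't say which database technologies did the synchronizing, so the following is only one possible sketch, assuming triggers are available. It uses SQLite from Python, and every table, column, and trigger name in it is hypothetical. Writes to a cleanly designed new table are mirrored into a legacy-shaped table so that anything still reading the old schema keeps working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical legacy table: cryptic names, status packed into a code.
    CREATE TABLE CUST_MSTR (CUST_NO INTEGER PRIMARY KEY, CUST_NM TEXT, STAT_CD TEXT);

    -- Hypothetical new table, designed for the new domain model.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, active INTEGER NOT NULL);

    -- Inserts made by the new application are mirrored into the legacy table.
    CREATE TRIGGER customer_insert AFTER INSERT ON customer
    BEGIN
        INSERT INTO CUST_MSTR (CUST_NO, CUST_NM, STAT_CD)
        VALUES (NEW.id, upper(NEW.name), CASE WHEN NEW.active THEN 'A' ELSE 'I' END);
    END;

    -- Updates are mirrored the same way.
    CREATE TRIGGER customer_update AFTER UPDATE ON customer
    BEGIN
        UPDATE CUST_MSTR
        SET CUST_NM = upper(NEW.name),
            STAT_CD = CASE WHEN NEW.active THEN 'A' ELSE 'I' END
        WHERE CUST_NO = NEW.id;
    END;
""")

# The new application only ever touches the new table.
conn.execute("INSERT INTO customer (id, name, active) VALUES (1, 'Acme Corp', 1)")
conn.execute("UPDATE customer SET active = 0 WHERE id = 1")

# Old reports and integrations still find their data in the legacy shape.
legacy = conn.execute("SELECT CUST_NO, CUST_NM, STAT_CD FROM CUST_MSTR").fetchall()
```

This only covers one direction (new to old) and ignores deletes. A real project that also has writers on the old schema would need the reverse direction too, plus some care to keep the two sets of triggers from feeding each other.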

Sunday, January 15, 2006
