Embrace Your [HS]QL

I'd like to start with a confession.  I love SQL (and it's Hibernate counterpart HQL).  Its simplicity and elegance is remarkable.  Its expressive power is stunning.  It is almost universally disliked by developers.  Why is that?

The answer, at least to me, is pretty straightforward.  Developers spend a lot of their time programming, so that's where their strengths lie.  Most programming is done with general-purpose languages (GPLs – not to be confused with the Gnu Public License), which are generic, Turing-complete environments.  They a real-world-like typing system (numbers, objects, lists, etc.), and often mutable data structures.  Developers get very good at using such languages to get stuff done.  This isn't a revelation.

SQL, on the other hand, is a domain-specific language (a DSL).  It is not Turing-complete, uses set-based data structures (which are immutable – more on this later), and just generally does stuff in a totally different way than a GPL.  This can create friction for developers, because they have to change mindsets when shifting between their general programming tasks (with a GPL) and their querying tasks (with SQL).

This bit of friction, however, is a source of great power if you're willing to invest a little in the SQL way of doing things.

The Bad Stuff

I'm not going to deny that writing out big-ass INSERT and UPDATE statements is a bear.  CRUD in general – SQL or otherwise – is simply an unpleasant task that should be avoided at all costs.  Fortunately there exist myriad tools and frameworks to alleviate with this problem (Hibernate being a shining example).  For the purposes of this discussion I'm only going to be talking about data recall (i.e., SELECT statements).  The C_UD part of CRUD is a different discussion, and hopefully you're using some "magic" way of dealing with that stuff.

The thing that is interesting to note, however, is that all the toolkits that I've seen to isolate you from the C_UD part of CRUD still provide a query language for the _R__ part.  So clearly a query language (SQL or otherwise) is an important part of the way things work, even if the whole point of a toolkit is to isolate you from SQL itself.

The Good Stuff

First, the "S" in "SQL" stands for "Structured".  Queries using SQL should be structured into little atomic pieces (subqueries) that are easily understandable and then assembled in to larger structures (with no semantic change to the composed bits).  Think of SQL as a box of LEGOs where you build up larger and larger structure from incredibly simple base components.  Here's an example of what I mean, working through it first with HQL, and then with SQL.

HQL

Consider this HQL example (from the cf-orm-dev mailing list):

select distinct p
from Post p
  left join p.categories c
where c.name like 'top%'

This selects the distinct Posts that have a category which starts with "top".  Straightforward.  But not really a very good use of the structured nature of HQL, because we're really asking two nested questions.  The first is the obvious – which Posts – but the second is more subtle: which Categories start with "top".  Consider this version:

from Post p
where exists (
  from p.categories
  where name like 'top%'
)

This is better; now we have a representation of both questions we're asking, as well as the "nested" relationship between them.  But this is still a little wonky because we're doing a dependent subquery when the nested question (which Categories start with "top") is really completely isolated from any association with Posts.  So consider this one:

from Post p
where any elements(p.categories) in (
    from Category
    where name like 'top%'
  )

Now we have the inner question totally isolated from the outer question, and the association between the two questions is expressed via the "any elements in" clause.  Very expressive, very modular, very structured.  And unfortunately, very unsupported by RDBMSs.  This is a valid query construct, but I don't know of any databases that actually support this ANY..IN construct (I tried MS SQL, MySQL, and Postgres).  So you can't actually use it.  Of course, a better HQL and/or SQL preprocessor could transform it, but they don't do that either.  In my opinion this is really Hibernate's problem to fix; it should generate different SQL for this construct.

Unfortunately, there's an … uh, "un-silver" … lining.  In this particular case, most DBs will execute the JOIN more quickly than the subquery, because the subquery is correlated (i.e. must be executed once per row in the outer query).  Aside from the semantic separation, that's another reason that the third (unsupported) query is better.  Since the subquery in the third rev is not correlated it only needs to be run once and the same result can be used to filter all the Post rows.

SQL

Here's the SQL equivalent of the first HQL query:

select distinct p.*
from Post p
  left join link_post_category l on p.id = l.post_id
  left join Category c on l.category_id = c.id
where c.name like 'top%'

This illustrates an interesting fact that the HQL doesn't show as clearly: in order to do the DISTINCT, the SQL engine must do a comparason on every column of every row.  If you're pulling back a few BLOB or CLOB fields this can be prohibitively expensive.  But if your column width is small, this is still fast because it suits the optimizer.  Now here's the correlated subquery version:

select *
from Post p
where exists (
    select *
    from link_post_category l
      left join Category c on l.category_id = c.id
    where l.post_id = p.id
      and c.name like 'top%'
  )

Now we've got rid of that "every column of every row" problem, but replaced it with a correlated subquery.  So lets ditch that next:

select *
from Post p
where id in (
    select l.post_id
    from link_post_category l
      left join Category c on l.category_id = c.id
    where c.name like 'top%'
  )

Now the subquery is pulling back all the post_ids we want to retrieve (in one step), and the outer query is just doing a primary key lookup to grab those rows.  This is going to be crazy fast, but we still have that JOIN in there which makes the subquery kind of nasty.  One more revision:

select *
from Post p
where id in (
    select post_id
    from link_post_category
    where category_id in (
        select id
        from Category
        where name like 'top%'
      )
  )

With a pair of nested subqueries we're down to that LEGO-like ideal.  Starting at the innermost piece, we pull all the Categories that start with "top".  Then we find all the post_ids that are assigned those Categories.  Finally we pull all the Posts with those post_ids.  With MS SQL's query optimizer this will have the same performance characteristics as the prior version (i.e., crazy fast), but MySQL will unfortunately struggle a little bit. online casino Malaysia

Remember that final HQL query that won't actually execute on any DB I found?  Here's the SQL for it:

select *
from Post p
where any (
    select category_id
    from link_post_category
    where post_id = p.id
  ) in (
    select id
    from Category
    where name like 'top%'
  )

This is roughly equivalent to the second SQL version, but since it won't run, it's academic.

The SQL Way

If you're going to use SQL effectively, you have to think about SQL in it's terms, it's domain.  As a DSL, it's not designed to be a GPL.  It's designed to be used from a set theory perspective and expressed in a very structured fashion.  If you try to use it like a GPL you'll be able to get the job done, but you'll also get frustrated and build some really nasty queries that will be hard to maintain/reuse.  And then you'll hate SQL, and then you'll try to avoid it (and other query languages) at all cost, and really be missing out.

As an example, I challenge you to dig into the Hibernate Criteria API, which is a programmatic way of querying (i.e., an API for a GPL).  Try and create a Criteria query that expresses this question "what is the average age of males with either zero or two or more children" in a way that is even close to readable and concise.  Here is the HQL version (which is both readable and concise):

select avg(age)
from Person
where sex = 'male'
  and (children.size >= 2
    or children.size = 0
  )

As way of a hint, you'll need a few Restrictions instances along with a Projections instance, plus some glue bits.

Immutability

I mentioned the immutability of SQL data structures way back at the beginning, and now I'll follow through.  Obviously state can be changed via UPDATE, INSERT, and DELETE statements, but that's not what I mean.  When you're running a query you have the ability to create temporary data structures (typically subquery results).  They are immutable.  Even if you're using a "programming" SQL extension (e.g., T-SQL or PL/SQL), state is still typically immutable.  You can change variable references, but you can't go changing the objects they reference.

For example, if you have a rowset and want to add a new row, how do you do it?  You don't add a new row, you select the UNION of the current rowset and a new (very likely ad hoc) rowset containing the row to add.  This is a very different paradigm than most GPLs use, where state is typically mutable, at least for reference types.  Consider a Java List.  Calling .add(o) doesn't create a new List with the new item added to it, it just adds it to the existing list.

Functional languages are an exception, as many (most?) embrace the concept of immutable state (or zero side effect) which removes any need to consider concurrency, allows programs' behaviour to be "proven", etc.  In a zero side effect environment, calling .add(o) would create a new List with the new item added and return the new list (the original list would be discarded).  It's definitely a different approach to doing things from what most people are used to, but provides some incredibly powerful capabilities.

Stop Already

Ok, ok.  That ended up being about ten times longer than I intended.  I'm sorry.  But if you got this far you've hopefully grown a bit of an appreciation for SQL, or at least the knowledge that there might be something there to appreciate.  And do keep in mind that the examples I've given are all really simple.  The power of the "Structured" is sort of nice with queries just a few lines long, but it really shines when you're dealing with queries that stretch over hundreds of lines, especially when they're dynamically constructed.  That incredibly granular breakdown of functionality allows you to reuse bits and pieces very effectively, and the zero side effect nature means that you are guaranteed not to have some random side effect ruin your day.

Should It Just Work?

Last Wednesday evening Sean Corfield and I were chatting over beers and he caught me by surprise with a statement he made.  We were talking about the ORM functionality in CF9, and he liked how it just worked.  You invoke a setter, and it gets to the database with nothing else.  I agree in theory, but not in practice.

It had honestly never occurred to me that someone would actually want to have stuff get to the database without any control over it.  Transactionality is such an ingrained part of my being that the mere thought of writing to the database outside an explicit transaction never even entered my mind.

There have been a number of discussions about transactions on the cf-orm-dev mailing list.  Often prompted by someone mentioning some problem and me jumping down their throats about how they better use transactions.  But here's a very experienced developer who I know full well intimately understands transactionality actually saying with complete honesty that he likes to have it just work.

I'm personally against this approach: if something goes awry your database could well get jacked, and no one wants that.  Though I nearly always apply transactionals via AOP, and both Sean and I agreed that assuming AOP familiarity is unreasonable for people using CF9 ORM.

The conversation quickly moved on from there (I believe Scala was the next topic), but that brief moment of complete surprise stuck in my mind.  I certainly don't claim to know it all, but I know that I have opinions that are not shared by everyone.  However, the dismissal of ACID like that was a shock.  I respect Sean's opinion very much, and while I will continue to wrap all my saves in a transaction, things are clearly not as cut and dried as I think they are.

Polyglot Programming at cf.objective()

This afternoon I presented on Polyglot Programming at cf.objective() 2010.  Unlike most presentations I give, this one has almost no code, so the slidedeck (as a PDF) is the whole shebang.  The in-deck content is admittedly light; really just an outline to follow along as I talked.  The short version of the verbal part is:

Using multiple languages has a bit of a learning curve but it pays off, and more quickly than you think.  Language selection and design is a vital aspect to being a successful developer, both for individual projects and as part of your continuing career.

I'll probably give the presentation again on CFMeetup at some point this year, and maybe at a user group or two, so if you missed it all is not lost.  Unfortunately (or fortunately), cf.objective() is so content-rich that it's hard to get to every session you want, so if you missed it you can probably get a second chance.

Domain Model Integrity

Unlike my last several posts, this one isn't ORM related.  At least not directly.  If you're using ORM, you necessarily care about your domain model's integrity, as it's a prerequisite for ORM doing it's job, but it has nothing to do with ORM specifically.  The point of a domain model is to be a representation of your business rules and logic, and that means it needs to be internally consistent.

If you're building a SQL-heavy, procedural application, the database is probably the only place your domain model is represented.  But if you're building an object oriented application, your domain model will also be represented in memory as object graphs.  In almost all cases, a given object graph is only a small slice of your entire domain model, but it is a representation and must be kept consistent.

Here's an example of a very simple domain model consisting of Person and Pet classes, where a Pet has an owner (a Person), and a Person has a collection of Pets:

component Person {
  property name="name" type="string";
  property name="pets" type="array[Pet]";
}

component Pet {
  property name="species" type="string";
  property name="owner" type="Person";
}

Just to reiterate, these are NOT persistent types.  They're simple types for in-memory use only.

So what semantics does this model imply?  Or to rephrase, what invariants does this model carry?  The most important semantic is that the relationship between pets and their owners is expressed from both sides (both classes).  More explicitly stated, the domain model is structured such that if you have a Person you can get their Pets, and if you have a Pet, you can get their owner (a Person).  The implications of this is expressed in these two invariants:

assert pet.owner is null || pet.owner.pets contains pet
assert person.pets.every { pet.owner == person }

The first one states that if a Pet has an owner, that owner's "pets" collection must contain it.  The second one states that every Pet in a Person's "pets" must have that Person set as it's owner.  I'm trying to be really deliberate in spelling this out, because it's really important.

Just for a moment, let's take a detour to the relational database world.  If we were to express this domain model in the database we'd have a Person table and a Pet table, and the Pet table would have a foreign key (likely named 'owner_id') that references the Person table's primary key.  SQL allows us to traverse the relationship expressed by that foreign key in either direction, so both relationships (Person->Pet and Pet->Person) are expressed in a single place (the foreign key column).  Both directions are represented together.  A foreign key constraint (which all RDBMSes support), on the column is doing nothing more than instructing the database to enforce these invariants.  This is all second nature, and we don't even think about it when we use a database to represent our domain model.

Now back to the in-memory representation.  We still need to enforce these invariants, but in memory we have to deal with references (pointers), and references only point in one direction.  That's why we have to have both the 'pets' property (a Person's references to Pets) and the 'owner' property (a Pet's reference to a Person), but in the database we only need one foreign key column (Pet.owner_id).  The relationship between Person and Pet objects is actually expressed in a pair of references.

The problem with this arrangement is that you, in effect, double represent your relationships.  Both invariants must remain true, and since each invariant is represented by it's own reference in the model, you have to synchronize changes to those references.   When you set the owner of a Pet, you must also add that Pet to the owner's "pets" collection.  When you remove a Pet from a Person's "pets" collection, you must also remove the Pet's owner reference.  If you don't keep these in sync, one of your invariants will be false, and that means your domain model is in an invalid/inconsistent state.

When your domain model is in an invalid state, your application falls apart.  Every assumption you make in your application is suddenly unreliable, because they're predicated on your business rules, and your business rules are expressed through your domain model.  An invalid domain model means your business rules were violated, and anything you do from this point forward will be suspect.  I'm going to say it again: this is really important.  If your domain model is in an invalid state, your application has failed.  Period.  End of story.  Your only recourse is to revert it back to it's last known consistent state and throw away all pending operations.

What about relationships that are not bi-directional?  For example, perhaps your model has PrivateEye and Subject types.  Clearly the PrivateEye needs to know about his Subject, but it'd be kind of silly if the Subject knew about the PrivateEye.  In this case the relationship only moves one way, so there is only one reference (from PrivateEye->Subject), and there is no invariant.  When we put this in the database, however, we have exactly the same structure as the bi-directional Person<->Pet relationship: a foreign key that can be traversed in two directions.  With the database representation of the model you can't express the concept of a one-directional relationship.  This is a powerful differentiator for in-memory models, since it gives you much finer control over the semantics of your model than a database could ever provide.

So where does this relate to ORM?  Just like everything else in your application, Hibernate depends on your invariants being true in order to persist your model to the database.  If they're not true, Hibernate can't hope to do it's job correctly.  A huge number of "problems" that people starting out with Hibernate face have nothing to do with Hibernate itself, but rather are caused by an invalid in-memory domain model.  Coming from the world of procedural, SQL-based persistence, you don't necessarily have to worry about an in-memory domain model's integrity, which means that you can write what amount to buggy applications where the bugs never manifest themselves.

Bottom line is that if you're using an in-memory domain model, you simply must ensure the invariants of that model remain true.  More specifically, you must set both sides of every bi-directional relationship.  If you don't, you're just asking for punishment, both from your software tooling and from users/clients of your application.

Want Multiple Persistence Contexts in CF9 ORM?

I do.  Because god knows front controllers and persistence layers don't have a one-to-one correspondence.  Turns out that through the magic of application switching you can do it, as long as you're careful with your sessions.  Not going to go into details right now, but this code works (and does what you'd expect):

<cfapplication name="ormtest1" />
<cfset companies = ormExecuteQuery('from Company') />
<cfinclude template="../app1/dsp_home.cfm" />

<cfset ormCloseSession() />

<cfapplication name="ormtest2" />
<cfset people = ormExecuteQuery('from Person') />
<cfinclude template="../app2/dsp_home.cfm" />

Here's the Application.cfc that this snippet runs under:

component {
 this.name = "ormtest3";
}

No ORM awareness at all.

The primary gotcha is that the other applications (ormtest1 and ormtest2) have to have been previously initialized (which you can do with a simple HTTP GET), but this allows you to mix multiple persistence contexts together (much like you can with normal Hibernate), and beat the stupid Application.cfc binding to some degree.  You still can only exist in a single context at any given moment, but you can switch back and forth any place you can close/reopen a session.

And for those of you who are thinking ahead, no, you can't manually manage your session objects instead of letting CF.  You can get a cross-application reference to the SessionFactory without much issue, and you can create a session, and you can even query for entities, but when you get to traversing lazy-loaded relationships it bombs out for some reason.  CF needs to have control over the active session or it doesn't work.  Rather sad, but at least running multi-persistence-context apps is possible, even if not concurrently.

Don't Forget Inverse="true"!

When you're mapping relationships with Hibernate, you have to reconcile the uni-directional nature of Java references with the bi-directional nature of RDBMS foreign keys.  Consider this simple database schema with a single relationship:

              +----------+
+--------+    | PET      |
+ PERSON |    +----------+
+--------+    | id       |
| id     |----| owner_id |
| name   |    | species  |
+--------+    +----------+

That works all well and good, and we can query across the relationship in either direction:

-- get pets
select pet.*
from pet
where ownerid = 456
-- get owner
select person.*
from person
where id = (
    select ownerId
    from pet
    where petId = 123
  )

When you get into the world of objects, however, the interplay between Person and Pet is a little different because we actually have two associations (Person.getPets() and Pet.getOwner()).  With the SQL model there are two tables, one key, and two queries.  WIth the object model there are two objects and two properties.  So when mapping these two entities (at least with CF9 ORM) you'll use code something like this:

component Person {
  property name="id" fieldType="id" generator="native";
  property name="name" type="string" unique="true";

  property name="pets" fieldType="one-to-many" cfc="Pet";
}

component Pet {
  property name="id" fieldType="id" generator="native";
  property name="species" type="string" unique="true";

  property name="owner" fieldType="many-to-one" cfc="Person" fkColumn="owner_id";
}

And then, if I were to have a cat, you might want to run some code like this:

p = new Person();
p.setName("Barney");
c = new Pet();
c.setSpecies("cat");
c.setOwner(p);
p.addPet(c);

In particular note that two references are created between the objects: c.setOwner(p) creates a  c -> p reference, and then p.addPet(c) creates a p -> c reference.  If you don't create both references, you're going to get weird behaviour because either I'll believe I own a cat while the cat believes it has no owner, or I'll believe I'm petless while the cat thinks it belongs to me.  Either of those situations is invalid, at least from the perspective of this contrived example.

That double reference causes an issue, however, when Hibernate comes along and tries to persist the object graph to the relational database structures, because both the Person and the Pet claim ownership over the in-DB relationship between them.  So you'll get an invalid series of SQL statements as Hibernate tries to persist the relationship twice, once from each end.

Fortunately, the solution is trivially simple: inverse="true".  By adding that attribute to one of the related properties you're telling Hibernate that it's the "inverse" of some other already-mapped property.  So when it comes to persistence operations, Hibernate should ignore the references because they're the inverse of some other already-existing references.  I know that girl porn stars are not all that interested in what I'm doing. I know it is a little bit embarrassing on the whole but I like it and I love it. Sometimes I like to touch myself and get off on it. I have a little bit of a problem with my own face. This is the other reason why it's vitally important to set up both references when you're creating objects: not only will you get the invalid state discussed above, depending on how you've set up inverse relationships you might not get the transitive persistence you expected.

So the corrected entity code from above looks like this:

component Person {
  property name="id" fieldType="id" generator="native";
  property name="name" type="string" unique="true";

  property name="pets" fieldType="one-to-many" cfc="Pet" inverse="true";
}

component Pet {
  property name="id" fieldType="id" generator="native";
  property name="species" type="string" unique="true";

  property name="owner" fieldType="many-to-one" cfc="Person" fkColumn="owner_id";
}

It's worth mentioning that if you don't know about inverse, the "correct" solution might appear to be only creating one reference.  Yes, that'll let you get your entities into the database, and then when Hibernate pulls them back out for you on the next request the bi-directionality of the relationship will be created.  But you'll run into problems with certain types of update operations, so it's not really a solution.  You need to use inverse="true".

This applies to all bi-directional relationships, not just one-to-many/many-to-one.  A two-way one-to-one or many-to-many carries the same requirments.  The symptoms are a little different, however, if you forget.  In particular, with many-to-many you won't get an error.  Instead, you'll just get duplicate rows in your link table.  This can lead to really weird duplicate membership problems in collections, and well a troubles with deleting and/or updating relationships, but again, they're not going to cause hard fails.

In general, however, real many-to-many relationships are rare.  In the majority of cases, the "middle" of that relationship really ought to be an entity in it's own right, splitting the many-to-many into two one-to-many relationships.  For example, people are often on many mailing lists , so you might think Person-MailingList would be a perfect fit for a many-to-many.  But really there is a Subscription object in the middle that links the Person to the MailingList and has info like createDate and stuff.  So we're back to a pair of one-to-many relationships.  I'm not saying that many-to-many doesn't exist in the real world, just that if you find one, you should step back for a second and see if you're actually missing an entity in the middle.

But in any case, the point is that when you're using ORM, you have to be really careful about your semantics, and inverse="true" (and it's not-always-needed compatriot mappedby="prop") is an easy one to overlook.

I'm WPTouch Enabled

Over lunch I installed the WPTouch plugin to provide a iPhone OS-like interface for those of you who want to read site on your phone but not with the full layout.  Mmobile devices will still see the normal layout by default; you must enable the new interface using the toggle in the footer.  Once toggled, subsequent visits will continue to use the mobile interface.  The plugin works by intercepting requests and completely subverting the configured theme and replacing it with a mobile-specific layout.

When I did the real layout for the site I considered mobile devices, picking a font size and column width that I thought reasonable to read in landscape mode on an iPhone-sized screen.  As well as meaning less work and providing a more consistent interface, PRE-formatted code renders without being sliced off to the right, which is a big win over the mobile interface.

No question, neither one is perfect, but the choice is now yours.

Why ORM Transactions are Crucial

Continuing the transactions thread, I wanted to talk about a couple specific reasons that transactions are vitally important in ORM applications.  Even more than in raw-SQL applications.  I consider them mandatory anywhere, but I would argue that if you feel compelled to skip them for some reason, you'd be better off leaving them out of your SQL applications than your ORM applications.

But before I dive into the specifics, I wanted to point out a rather fundamental issue I see with the CFML community: transactions are an afterthought.  I've been using CFML for over a decade and the laissez faire approach to transactions never ceases to amaze me.  Not individual developers, mind you, I'm talking about CFML developers as a group.  If you look at the Spring documentation, for example, the first topic under "Data Access" is transactions.  Not DAO support, not JDBC, not ORM.  Transactions.  And that is absolutely the correct order.  We as a community need to get this one sorted.  It'll result in better CFML applications, richer CFML toolkits (like native transaction support in ColdSpring and others), improve the public opinion of CFML developers, and just generally be better all around.

Now back to transactions and ORM…

Reason #1

Let's assume for the moment that you're processing the submission of an account creation form (name, username, password) using CF9 ORM.  Here's some code you might use (minus any architecture and validation):

a = new com.site.Account();
a.setName(attributes.name);
a.setUsername(attributes.username);
a.setPassword(attributes.password);
entitySave(a);

This is pretty straightforward: create an Account instance, set the properties, and all entitySave().  So where does the actual INSERT statement happen?

No one knows!

It's possible that it'll happen at the entitySave() call (depending on how you're doing unique identifiers).  It's possible it'll happen at the end of the request (depending on how "this.ormsettings.flushAtRequestEnd" is set in Application.cfc).  It's possible it will never happen at all and the submitted data will be lost.

Egads!  Really?

Yes, really.

If we simply wrap that code with a transaction block, all our problems go away:

transaction {
  a = new com.site.Account();
  a.setName(attributes.name);
  a.setUsername(attributes.username);
  a.setPassword(attributes.password);
  entitySave(a);
}

Now we can state with complete certainty that by the time the transaction block closes not only will the INSERT have executed, but the new record is assured to be safely persisted within the database.  Thank goodness!

It's not all roses though, because using transaction blocks like this can screw with your sessions (ORM sessions, not the session scope).  Adobe has fixed the issue, but the initial CF9 bits are broken.  So you really want to use Hibernate transactions instead of CFML ones to avoid the issue.  If you use ColdSpring, I have a TransactionAdvice specifically for addressing this problem.  If not, you just need to write the above code like this:

txn = ormGetSession().beginTransaction();
try {
  a = new com.site.Account();
  a.setName(attributes.name);
  a.setUsername(attributes.username);
  a.setPassword(attributes.password);
  entitySave(a);
  txn.commit();
} catch (e) {
  txn.rollback();
  throw e;
}

It's a couple extra lines of code, but it'll save your sanity – particularly on transactions that return a result which you want to manipulate later.  And really, you should just use ColdSpring to manage your service layer and use AOP to do it all magically.  :)

Reason #2

CF9 shipped with "this.ormsettings.flushAtRequestEnd = true" as the default, which means that side effects can trickle into your database without you being aware of them.  The first thing you should do in any CF9 ORM application is set flushAtRequestEnd to false.  In fact, stop reading and go do that right now for any existing applications you have as well.  I'll wait.

The primary reason you want this turned off is that you don't want changes being made to your database inadvertently.  If you're flushing at request end, innocently calling some setter somewhere (maybe as part of validation) will change your database.  Even worse, the change won't make it into the database until after the current request is complete, so you won't be aware of it until the next request at the earliest.  More likely, however, you won't notice in development and then your customer will call a week or two after production deployment and ask why all of the records have invalid values in some column or another.  Oops.

If you ignored my advice a couple paragraphs back, I beg of you, please go set flushAtRequestEnd to false.  Right now.

By disabling flushAtRequestEnd, changes to persistent entities will no longer make it to the database without your intervention.  Believe me when I say that this is the way it should be.  You, as the application developer, need to be in charge of when and what stuff is made persistent.  ORM is about removing the busy work of persistence, it's not about making you unaware of persistence.  There are two ways you can do it:

  1. ormFlush() – this will do a non-transactional flush of every modification in the current ORM session.  This is what flushAtRequestEnd invokes at the end of the request.  At least before you went and disabled it everywhere, right?  Right?
  2. transactions – allows wrapping of specific changes in a transactional context so the stuff you want get saved when you want.  Plus all the "normal" benefits.

Even if you're doing transactions manually, the work required on your part is pretty much identical for the two options, so can get all the benefits of transactionality for "free" by choosing that one.  And if you go with transactions, you can probably make things even easier by using AOP to apply them by default, basically freeing you from doing any work at all.

The bottom line is that transactions need to be a carefully considered part of any and every application.  And if you're using ORM (which is state based) instead of SQL (which is statement based), then it's even more important.

ColdFusion vs. CFML

<rant>

Every time TIOBE comes out with new listings people bitch about ColdFusion.  It bugs the shit out of me.  Here's the deal folks:

ColdFusion is a server product.  It is NOT a language.  CFML is a language.  You don't write ColdFusion, you write CFML and then you run your CFML on ColdFusion (or Railo, or OBD).  ColdFusion (or Railo, OBD, etc.) provide a framework for your CFML to run in, along with various bits of helpful functionality.  You might consider yourself a ColdFusion developer, because while ColdFusion isn't the language you use, it is the framework that you use.  You're a ColdFusion developer who writes CFML.

This is exactly like saying you're a JEE developer.  People don't write JEE code, they write Java/Groovy/etc. code, but they're still JEE developers.  JEE provides a pile of nifty features (Servlets, JNDI, JavaMail, XA, etc.) that you can leverage if you're running your code in a JEE server.  And different servers provide different sets of functionality (though all at least meeting the base spec).  So you write your Java (or whatever), and run it in a JEE server.  You're a JEE developer who writes Java.

This is also exactly like saying you're a Rails developer.  People don't write Rails code, they write Ruby code, but they're still Rails developers.  Railos provides a framework for your Ruby to run in, along with a pile of helpful functionality.  You're a Rails developer who writes Ruby.

I am a ColdFusion developer.  I am also Railo developer.  I'm also a JEE developer.

I write CFML, Groovy, Java, JavaScript, and a pile of other languages.

So if you want TIOBE to list ColdFusion on there, then they should also list JEE and Rails on there.  They don't, they won't, and they shouldn't.  Period.  End of story.

</rant>

I've intentionally not linked to the TIOBE listings.  They're easy to find if you care, but they're pretty irrelevant.  Bordering on retardedly stupid.

Transaction Demarcation

Transactions are important, and I wanted to talk about them a little, particularly as they relate to ORM.  If you're on the cf-orm-dev mailing list you've undoubtedly seen this topic come up over and over again.  And probably started muting conversations when my name shows up.  But before digging into the details, we need to clear up a couple misconceptions, starting with no ORM context at all…

First, transactions are controlled in the business layer, even though they are typically implemented in the database.  What do I mean by this?  When you begin/commit/rollback a transaction, you're actually issuing commands to your database server.  It does the work for you.  But the transaction demarcation is controlled by the business logic of the application.  Only your business logic knows what series of SQL statements comprise a unit of work.  With client-server apps (including web apps) there is often a one-to-one correspondence between requests and transactions, but not always.

Consider a simple account signup form like we've all filled out a million times.  It looks like that over there to the right.  There is the core account information, and then there's the checkbox for subscribing to the mailing list.  So what is the unit(s) of work?  There are [at least] two: creating an account and signing an account up for updates.  This means that submitting this form should trigger a request with two transactions (one for each unit of work).

In plain English, when processing this form the account creation should be successful regardless of whether the subscription is successful.  Far more important to have the account set up and ready for use, since you can just reprompt the user about the subscription if the subscribe action fails.

If you were to implement this with simple SQL statments, your code might look something like this (I've omitted all architecture, validation and security – you shouldn't do that):

transaction {
  executeQuery('
    insert into account
      (name, username, password,
       email
      )
    values
      ($params.name, $params.username, $passhash,
       $params.email
      )
  ');
  accountId = executeQuery('
    <magic get-account-id query>
  ');
}
transaction {
  executeQuery('
    insert into subscription
      (accountId, type
      )
    values
      ($accountId, 'inside scoop'
      )
  ');
}

This is a (the?) fundamental concept of transactions.  Let me state that again: transaction demarcation is business logic.  It belongs in your business layer.  Not in your persistence layer and certainly not in your presentation layer.  Doing this demarcation can be a gnarly problem, but you're already doing the hard part: identifying the units of work.  If you're using an object backend then those backend business methods are very likely your units of work.  Now it's just a matter of applying transactionality.

The easiest way that I've found to address transaction demarcation is through Aspect Oriented Programming (AOP).  That's a big scary term that simply means "running code when stuff happens, without having to explicitly call it."  For example, I want to run transaction demarcation code whenever a business method is invoked.  Not that complicated  (As an aide, if you're using ColdSpring, my TransactionAdvice CFC will help you.)

AOP gives you transparent transactionality which matches your business rules (assuming you have business methods that correspond to units of work) which is a huge win.  It makes transactionality simply disappear.  You don't have to think about it, you don't have to worry about it's various problems, everything just works.  Using the signup example (with a bit of architecture, though still no validation or security), your UI controller might look like this:

acctId = request.beanFactory.getBean("accountService).createAccount(
  attributes.name,
  attributes.username,
  attributes.password,
  attributes.email
);
if (attributes.signUp) {
  request.beanFactory.getBean("subscriptionService").subscribe(
    acctId,
    'inside scoop'
  );
}

The implementation of 'createAccount' might look like this:

function createAccount(name, username, password, email) {
  executeQuery('
    insert into account
      (name, username, password,
       email
      )
    values
      ($params.name, $params.username, $passhash,
       $params.email
      )
  ');
  var accountId = executeQuery('
    <magic get-account-id query>
  ');
  return accountId;
}

Note that there is no transaction demarcation anywhere.  Instead the beanFactory object is going to use AOP to ensure that every method invoked on accountService or subscriptionService will execute in a transactional context.  Transparent transactionality is achieved.

But don't think that this is an inflexible solution.  AOP provides ways to exclude methods, so if you have some method that shouldn't be transactional you can just exclude it and then do manual demarcation.  So it's really the best of both worlds: totally transparent when you don't care, but with the ability to take back complete control when you do care.

The bottom line is that if you care about the correctness of your applications and the integrity of your clients' data, then transactions are required.  But that doesn't have to be a burden on you as a developer, since transaction demarcation can be made almost entirely transparent.  So there is really no excuse for avoiding comprehensive transaction demarcation in every app you build.  The correct mindset should be that a non-transactional write to your persistent store is a bug, no questions asked.

Now that that's out of the way, I'll dive into some of the ORM-specific problems transactions present (and solve!) in my next post.