Ant For Server Configuration

I use Apache ant for a lot of things, almost none of which have anything
to do with building software. Simeon knows
much of what I do with it over the course of various discussions, and while
he was using it for something a week or two ago, he suggested that I blog
about some of my experience. Within 10 minutes. It didn't happen, of
course, but hopefully late is better than never.

ant, for those who don't know, is a Java-based build tool, somewhat in the
same vein as make. Major differences include that it's XML-based, runs on
Java, and has built-in commands rather than relying on the shell. Build tool
or not, where I use ant the most is in server configuration tasks. I'm going
to use BIND config files for my example, as they're relatively straightforward.
Apache config is my other major use; it makes it very easy to have a single
config template managed with version control, but still be able to build
actual configuration files for multiple servers in the cluster, all of which
are not exactly equal. But that's not what I'm talking about here.

DNS is pretty simple to deal with, but there is a LOT of repetition,
especially across a large number of domains that are all basically aliases
for a single application (which is what I've got). So rather than maintain
a couple hundred nearly identical zone files, I use ant to do all the dirty
work for me. Before I delve into the guts, here's a typical zone file:

$TTL 1h ; default TTL
@       IN SOA  ns1.piersystem.com. root.piersystem.com. (
                2005090901
                3h
                15m
                30d
                1h
        )

; NS records
@       IN NS   ns1.piersystem.com.
        IN NS   ns2.piersystem.com.

; Address records
@       IN A    216.57.200.38
www     IN A    216.57.200.38

; other stuff
@       IN TXT  "v=spf1 a mx ptr ip4:216.57.200.32/27 ip4:66.235.70.224/27 ~all"

I have about 90 copies of that zone, another 40 or so that are very close to
copies, and finally perhaps 10 that are pretty different. This is where ant
really shines, because it lets me templatize and parameterize the zone files
so that the entire set can be created from a very small amount of data.

How this works is via ant's wonderful filtering and property expansion
capabilities. Basically, when you copy a file from one place to another with
ant, you can also define filters to be performed as part of the copy. One of
those filters does property expansion, where properties are things like
${myPropName} and are defined in external properties files. More details on
that later. So those 90 cloned zone templates each contain just a single line:

${basic.zone.pier}

That expands via this definition:

basic.zone.pier=${ttl} \n\
${soa} \n\
\n\
; NS records \n\
${ns} \n\
\n\
; Address records \n\
@ IN A ${ip.pier} \n\
www IN A ${ip.pier} \n\
\n\
; other stuff \n\
${spf1} \n\

As you can see, that definition includes even more properties, which continue
to expand until you arrive at the zone file I showed above. So all the data for
every single zone file is contained in two properties files (one for structure,
and another for IP addresses). But that's just the 90 or so clones.

The next batch of about 40 almost-clones are all additive changes. Most
require defining a subdomain or three, or some records for external infrastructure
that we don't manage. So those zone templates look like this (this one is for
the uscgstormwatch.com zone):

${basic.zone.pier}

dennis IN CNAME www
emily IN CNAME www
katrina IN CNAME www

Not much to see there, just the same thing with a couple of extra records
defined afterwards. Now for the last 10 or so totally custom zones. Here's an
example (the audiencecentral.com
template):

${ttl}
${soa}

${ns}

intranet IN NS ns1.piersystem.com.
         IN NS ns2.piersystem.com.

; Address records
@ IN A ${ip.audiencecentral}
www IN A ${ip.audiencecentral}
testdrive IN A ${ip.audiencecentral}
shrike IN A ${ip.shrike}

; PIER sites
news IN A ${ip.pier}
sales IN A ${ip.pier}

; Other stuff
office IN A ${ip.office}

${mx.plands}

; SPF record
${spf1}

If you look back at the expansion of basic.zone.pier, you'll
see a lot of similarities. Almost all of the pieces are reused, and there are a
few new pieces mixed in as well. There are also some new IP addresses.

That's enough examples; let's get to the meat and potatoes of this whole thing,
the ant build file: build.xml. Here are the guts of it:

<target name="generate" depends="getSerial">
    <property file="ip.properties" />
    <property file="common.properties" />

    <copy todir="${build.dir}" overwrite="true">
        <fileset dir="${src.dir}">
            <include name="**/*.tmpl" />
        </fileset>
        <mapper type="glob" from="*.tmpl" to="*.dns" />
        <filterchain>
            <expandproperties/>
        </filterchain>
    </copy>

    <copy todir="${dest.dir}" overwrite="true">
        <fileset dir="${static.dir}">
            <include name="*" />
        </fileset>
    </copy>

    <move todir="${dest.dir}" overwrite="true">
        <fileset dir="${build.dir}">
            <include name="**/*.dns" />
        </fileset>
        <mapper type="flatten" />
    </move>
</target>

This defines a target (ant's name for a piece of work) named "generate" and
declares that it depends on the target "getSerial". getSerial, as you might
imagine, creates the serial number for the zone files and stores it in a
property so that it can be injected as part of the ${soa} expansion. If anyone
is interested in how that works, let me know; I'm going to skip it here because
it's complex, nasty, and doesn't really lend anything to this post.

The first thing the target does is include a couple of property files (which
I've mentioned before) that contain all the expansions, including the
basic.zone.pier one. Next it does a couple of copy operations and
then finishes up with a move operation.
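
(A quick aside on those property files: the ${ttl}, ${soa}, ${ns}, and ${spf1}
pieces live in common.properties alongside basic.zone.pier. The exact text below
is illustrative rather than copied from my real files, and ${serial} stands in
for the property that getSerial populates, but the definitions look roughly
like this:

ttl=$TTL 1h ; default TTL
soa=@ IN SOA ns1.piersystem.com. root.piersystem.com. ( \n\
${serial} \n\
3h \n\
15m \n\
30d \n\
1h \n\
)
ns=@ IN NS ns1.piersystem.com. \n\
@ IN NS ns2.piersystem.com.
spf1=@ IN TXT "v=spf1 a mx ptr ip4:216.57.200.32/27 ip4:66.235.70.224/27 ~all"

ip.properties is nothing but address assignments, ip.pier=216.57.200.38 and
friends, so repointing a whole cluster means touching one line and regenerating.)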

The first copy tag copies "something"
to my build dir (specified by the ${build.dir} property, which happens to point
at ./build). The fileset tag it contains specifies what that
something is: all files in the ${src.dir} directory (including subdirectories)
that end with .tmpl. Next, the mapper tag converts all file extensions from
.tmpl to .dns as part of the copy (since that's my extension of choice
for zone files). Finally, the innocent-looking filterchain and expandproperties
tags do all the magic of expanding all those properties in the files being
copied.

The second copy is much simpler, doing nothing more than a vanilla copy
of the files in ${static.dir} to ${dest.dir} (which points to
/var/named). Note that this is different from where the first
copy went, for reasons we'll see in a moment.

The last piece of magic happens in the move tag. It moves the newly created
.dns files from the build directory into the destination directory where the
static files just went, and it applies another mapper that flattens the directory
structure. ant only allows a single mapper per copy/move operation,
which is why I copy to a temp location and then move to the real place. As you
can probably guess, doing a flatten allows me to keep all my zone templates
organized in a neat hierarchy for easy management without having to deal with
pathing issues when it comes time to actually hand BIND the zone files.

Impetus for an OO Backend

After a long, multi-faceted discussion on CFCDev a few weeks ago,
one of the participants contacted me off-list wondering about a sample
app that illustrated some of the concepts I'd mentioned in the
discussion. I don't have one, and while I could make one up, the
implementation is the easy part; it's the reasoning behind doing
things a certain way that's important. So what I'm going
to endeavor upon is a quick overview of how I've come to build apps
the way I do. It will undoubtedly be long-winded, but I'll try
to be as brief as I can. ;) Hopefully at the end you'll see
three things:
first, that OO is hard to do 'right', since 'right' is both subjective and
ever-changing; second, that the 'right' way is always defined by need
and not by how well you use Design Pattern X; and third, that I don't
have any magical OO power, I've just spent the past three years
tuning a single app and building a large body of experience.

Application Layers

To put the cart before the horse, here's an image that roughly lays out
what I'm going for. At the top we have the Presentation and UI
Controller layers. The first is just HTML, the second is a
framework of your choice (I use FB3 primarily). The dark blue
section, however, is what I'm going to be talking about. And note
that this diagram is about how the tiers work, not necessarily how the
implementation is structured.

First some back-story. I took over an app that was utter spaghetti
code. About 80,000 lines of it. I spent a month converting
everything to Fusebox 3 (the current version at the time), and that
dropped to about 50,000 lines simply from code reuse. (ed.
note: that's a 37% decrease in code size; use a framework, people.) There was
no MVC in the app; it was all single fuseactions, so the only reuse happened
at the fuse level. This was all on CF 4.5, and when CFMX 6.1 came out (and
alleviated most of the bugs and inadequacies in CFMX 6.0's CFC implementation),
we upgraded, and the CFC-ization began in earnest.

My first objective was to stop having
to write so many damn queries, so automated entity persistence was top
of the pile. Fortunately, that's really easy to do, since entity
persistence operations are very closely allied to your database
schema. A few hours of work and I'd built a generator that
would read a table schema from the DB and generate a skeleton BO
(business object) and a fully implemented DAO (data access object) for
performing persistence operations for it. It'd also generate a boilerplate
factory/manager for the entity type (getNewUser, getUserById, createUser,
deleteUser, updateUser). With that, I no longer had to have any single-entity
queries in my fuses; instead I'd request a BO from the appropriate manager
(all of which were singletons in the application scope), do what I
needed, and then call createXXX or updateXXX on the manager with the
modified BO. That saved me enormous amounts of time, particularly
with changes to entity fields (like adding a country to user info),
because I could just regenerate the DAO, and all the persistence
operations were magically updated.
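
To make that concrete, here's roughly what a fuse ends up looking like under
that scheme (illustrative code, not lifted from the app; setCountry is just an
example mutator, and the manager methods are the generated ones listed above):

<!--- the manager is a singleton sitting in the application scope --->
<cfset userManager = application.userManager />

<!--- pull the BO, modify it, and hand it back to be persisted --->
<cfset user = userManager.getUserById(attributes.userId) />
<cfset user.setCountry(attributes.country) />
<cfset userManager.updateUser(user) />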

That saved a lot of work, but
it didn't help abstract the business logic out of the UI. My
fbx_Switch.cfm files were still littered with a mix of business logic
and UI processing. So the next step was to start creating service
objects to put the business logic in. This became doubly
important since, about this time, we started exposing certain
functionality over web services as well as the HTML UI, and that was
starting to lead to enough duplicated logic to trigger warning
bells. So, much of the business logic moved into the service
objects.

An important point to make here is that while managers, BOs, and DAOs are
all pretty much one-to-one-to-one, services are not. An example
would be a permission-based security model, where you have users,
groups, and permissions. That's three distinct manager/BO/DAO
sets, but you probably only need a single SecurityService that deals
with all of them. And you'll probably have a LoggingService that
doesn't care about security, but does care about some aspects of users.

As soon as service objects popped into existence, however, a major
encapsulation issue reared its head. A good issue, mind you, but one that
required a solution. Simply put, the services needed to be
able to talk to each other, but couldn't do that without going to the
application scope to get a reference. The same thing applied to manager
instances as well. The "right" solution, like so many other things, was
driven by time constraints, not good design (though it was good enough).
At the time, all managers (and now services) were instantiated into the
application scope directly (i.e. application.securityservice). That was
quickly revised to instantiate them into a managers struct and a services
struct respectively, copy the keys into the application scope, and then pass
the structs into each service (via a setManagers and a setServices method in
the AbstractService superclass). None of the existing app needed to change,
but all the services had references to everything else, and it took all
of ten minutes to put together.
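
Stripped to its essentials, the AbstractService half of that looks something
like this (a sketch, not the production code):

<cfcomponent displayname="AbstractService" output="false">

    <cffunction name="setManagers" access="public" returntype="void" output="false">
        <cfargument name="managers" type="struct" required="true" />
        <cfset variables.managers = arguments.managers />
    </cffunction>

    <cffunction name="setServices" access="public" returntype="void" output="false">
        <cfargument name="services" type="struct" required="true" />
        <cfset variables.services = arguments.services />
    </cffunction>

</cfcomponent>

Every concrete service extends AbstractService, so once the shared structs are
handed in at startup, any service can reach a sibling with something like
variables.services.loggingService, with no trips to the application scope.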

The right solution (from a design perspective) is to use a factory,
and then just pass that factory around. The struct solution is similar
in concept, though not nearly as encapsulated. Needless to say, the
lesson was learned, and factories are now properly used from the start,
rather than being an afterthought.

Ok, time for a breather/cigarette/shot of whiskey. Whew! And here
we go again…

The next problem was that the services often needed to do arbitrary queries
as part of their business logic. This is a two-pronged issue:
SELECT queries, and other queries. And with the SELECT queries,
there are queries that the UI will also need (which still resided in
qry_ files at this point), and others that are strictly needed by the
backend. Along came gateways to solve the first part of the problem.

The set of gateways falls somewhere between the set of services and the
set of BOs in makeup. For example, a UserGateway for users and a
SecurityGateway for groups and permissions. Gateways were also the first
objects to be created via an application-scope factory and to use lazy loading
for faster app startup. The gateway methods included all the SELECT
statements needed by both the UI and the services, though that may change at
some point in the future.
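
The lazy-loading factory itself is nothing exotic. A bare-bones version looks
about like this (component paths and names are illustrative, and a real one
wants a cflock around the instantiation):

<cfcomponent displayname="GatewayFactory" output="false">

    <cffunction name="init" access="public" returntype="GatewayFactory" output="false">
        <cfargument name="dsn" type="string" required="true" />
        <cfset variables.dsn = arguments.dsn />
        <cfset variables.instances = structNew() />
        <cfreturn this />
    </cffunction>

    <cffunction name="getGateway" access="public" returntype="any" output="false">
        <cfargument name="name" type="string" required="true" />
        <!--- only build a gateway the first time somebody asks for it --->
        <cfif NOT structKeyExists(variables.instances, arguments.name)>
            <cfset variables.instances[arguments.name] =
                createObject("component", "gateways." & arguments.name).init(variables.dsn) />
        </cfif>
        <cfreturn variables.instances[arguments.name] />
    </cffunction>

</cfcomponent>

The factory gets created once in the application scope; app startup stays fast
because nothing else is instantiated until something actually asks for it.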

The problem of the non-SELECT queries was solved by deciding to let the
services perform those queries directly against the DB. It's worth mentioning
that this was only for queries that weren't tied to a single entity; those are
always performed through the entity. Initially, I thought this was a rather
poor way of handling it, but upon further reflection, I've become pretty
comfortable with it. The other solution would be a dedicated DB modifier object
that the service delegates to. I'm not sure it's worth the complexity,
unless having all your SQL in SQL-specific objects is important for your app,
because you lose the ability to modify the business logic all in one place.

So now we've got business logic in the services, but we have BOs
(Business Objects, mind you) that are little more than DTOs
(Data Transfer Objects) for easing persistence operations. So entity-specific
business logic was moved into the entities from the services. An example
would be a 'post document' operation. Previously, the DocumentService
would pull the Document BO, set 'isPosted' to true, 'postDate' to now(), etc.,
persist it, and then go on with the other tasks (like distributing
notifications, or clearing the cached list of "recent updates" for the
site). With the new setup, the DocumentService simply calls postDocument()
on the Document BO, which takes care of all the document-related stuff in
one magic step. The service is still in charge of the other non-entity
operations, but all the entity operations are now part of the entity.
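
As a sketch of the shape of that change (postDocument, isPosted, and postDate
come from the description above; the accessor and manager method names are
just illustrative):

<!--- in Document.cfc: the entity owns its own state changes --->
<cffunction name="postDocument" access="public" returntype="void" output="false">
    <cfset setIsPosted(true) />
    <cfset setPostDate(now()) />
</cffunction>

<!--- in DocumentService.cfc: the service only orchestrates --->
<cffunction name="postDocument" access="public" returntype="void" output="false">
    <cfargument name="documentId" type="numeric" required="true" />
    <cfset var doc = variables.managers.documentManager.getDocumentById(arguments.documentId) />
    <cfset doc.postDocument() />
    <cfset variables.managers.documentManager.updateDocument(doc) />
    <!--- non-entity work stays here: notifications, clearing the cached "recent updates" list, etc. --->
</cffunction>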

So what do we have at the end of it all? A rather complex arrangement of
CFCs (all told, about 200) that makes working with the application much easier.

What other things would I like to see? For one, a real centralized application
object that contains the entire application and can be passed around instead of
the structs of services and managers. This is already partially implemented,
but it's not complete. I'd also like to see user security and logging
integrated in a more transparent way. Right now, both must be explicitly coded
for, which has led to errors in the past. I'd much rather have that applied
magically by some framework (probably via dynamic wrapping of the service
objects) so that it's guaranteed to be consistent across the board.

What about problems? Funny you should ask. ;) Probably the biggest problem
is dealing with DB transactions. CF only exposes the CFTRANSACTION tag, which,
like other tags, acts upon its body. More to the point, there isn't a way to say
"start a transaction, if one isn't already active." Certain business operations
are both standalone operations and part of larger operations, and can be invoked
either way. The solution we've employed is to have both transactional and
non-transactional versions of those methods (the transactional method being nothing
more than a CFTRANSACTION tag wrapping a call to the non-transactional version).
It works, but it's hardly elegant, particularly with methods that take a lot of
arguments.
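
In code, the pairing is as dumb as it sounds (names illustrative;
argumentCollection keeps the wrapper from having to re-declare every argument):

<!--- the real logic, safe to call from inside a larger transaction --->
<cffunction name="postDocument" access="public" returntype="void" output="false">
    <cfargument name="documentId" type="numeric" required="true" />
    <!--- ... business logic and persistence calls ... --->
</cffunction>

<!--- the standalone entry point: nothing but a transaction around the other method --->
<cffunction name="postDocumentInTransaction" access="public" returntype="void" output="false">
    <cfargument name="documentId" type="numeric" required="true" />
    <cftransaction>
        <cfset postDocument(argumentCollection=arguments) />
    </cftransaction>
</cffunction>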

One solution that I tried out on another app I threw together was to manage
transactions via a TransactionManager object that didn't use CFTRANSACTION at all,
but rather used CFQUERY to talk to the database directly. It worked fairly well,
but it depends on an implementation detail of CFMX: namely, that a given request
gets a single DB connection for the duration of the request, and that it is the
ONLY request that has access to that connection. Just for reference, BD
(BlueDragon) doesn't have this behaviour. It also gets complicated because you must
ensure that your transaction either gets committed or rolled back before the request
finishes, or you'll have some weird issues. However, it does allow you to have
the "start a transaction, if one isn't already active" functionality, which makes
things ENORMOUSLY easier. I'd really love to see this behaviour appear in future
versions of CF, but who knows.
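
For the curious, the shape of that TransactionManager is roughly this (a sketch
assuming MySQL syntax; it leans entirely on the per-request connection behaviour
described above):

<cfcomponent displayname="TransactionManager" output="false">

    <cffunction name="init" access="public" returntype="TransactionManager" output="false">
        <cfargument name="datasource" type="string" required="true" />
        <cfset variables.datasource = arguments.datasource />
        <cfset variables.active = false />
        <cfreturn this />
    </cffunction>

    <cffunction name="begin" access="public" returntype="boolean" output="false">
        <!--- "start a transaction, if one isn't already active" --->
        <cfif variables.active>
            <cfreturn false />
        </cfif>
        <cfquery datasource="#variables.datasource#">START TRANSACTION</cfquery>
        <cfset variables.active = true />
        <cfreturn true />
    </cffunction>

    <cffunction name="commit" access="public" returntype="void" output="false">
        <cfquery datasource="#variables.datasource#">COMMIT</cfquery>
        <cfset variables.active = false />
    </cffunction>

    <cffunction name="rollback" access="public" returntype="void" output="false">
        <cfquery datasource="#variables.datasource#">ROLLBACK</cfquery>
        <cfset variables.active = false />
    </cffunction>

</cfcomponent>

Because begin() reports whether it actually opened a transaction, only the
outermost caller commits or rolls back, and (per the caveat above) one of the
two absolutely must happen before the request ends.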

Another problem is that the sheer weight of a complex system can make otherwise
simple tasks fairly daunting. In that case, as long as encapsulation isn't broken,
we simplify. There are a few subsystems that consist of nothing but a service object
that contains everything. To come clean, the 'permission' entity from my security
examples above is one. There isn't a permission manager, BO, or DAO, just the methods
in the SecurityService. These are also a major source of the non-SELECT queries that
the service objects need to perform. But like everything else, it's all about picking
the right compromises in each situation. If we ever need a full OO implementation
of permissions, injecting it will be very simple, since the service methods won't
change at all; they'll just stop doing everything themselves and start delegating to
the new backing objects. And that's the real power of encapsulation.

Just for reference, that flat, service-only architecture is usually where I start
with everything because it's quick to develop, and easily facilitates growth down the
road. One good (and publicly known) example is Ray's BlogCFC. It's all in there as
one massive file and that's exactly as it should be, particularly since installation
simplicity is important.

So, after much typing, I'm calling it quits. Hopefully I've met the three
objectives I laid out at the beginning, and haven't caused anyone to shoot smoke out
their ears or pass out on their keyboard. At some point, I may actually show some
code, but I'd like to at least pretend that anyone who gets example code will have
understood the reasons behind why it is the way it is.

cvs2svn Rules!

I've been moving all my stuff from CVS to Subversion over the past
few months.  Some of the stuff (like server config files) I've
just been moving the top revision across since the history is of little
concern.  The larger projects, however, I've migrated the whole
history with the fantastic cvs2svn
tool.  It somehow reads your CVS directory, figures out what each
changeset should be (presumably by finding nearly identical timestamps
with the same commit message), and then imports it all into Subversion
just as if you'd made the same commits as you did to CVS. 
Needless to say, this is fantastic, since it basically lets you switch
from CVS to Subversion with almost no cost.  But how well does it
work?

Tonight, after a couple months of wussing out, I finally
sacked it up and moved my company's main app across and it went
flawlessly.  It's not a huge app by any stretch (2500 files or
so), but it had several years of history with a lot of branches and
tags in there.  Much to my relief, it all came across perfectly.

I
ran into two minor problems with the export, undoubtedly due to newbie
screwups I made, since both were from very early in the history. 
First, I had a tag that was both a tag and a branch; easily solved with
the --force-branch="branch_name" option to cvs2svn.  Second, I had
an invalid keyword expansion command, and that was solved by using the
--use-cvs option so it used CVS rather than RCS's checkout command.

After
those tweaks, and about two hours of spinning, I had a shiny new
Subversion repository.  Disconnected the projects in Eclipse,
moved my CVS working directories out of the way, checked the same
branches out from Subversion, copied my .project file (for Eclipse)
over, and reconnected the projects in Eclipse (to Subversion this
time), and off I went.

If you're considering moving from CVS to
Subversion, but haven't because you don't want to lose all that
history, definitely check out this tool.

How Nested Sets Work

I got an email question today about how nested sets work from a
developer who had just started using my TreeManager component.  I figured that
was a good topic for a blog post, so here it is.

Nested sets operate based on two fields, rpos and lpos (right
position and left position).  They're calculated by doing a depth-first
traversal of the tree; that is, numbering each node's left and
right sides as you get to them.  So here's a sample tree:

        food
       /    \
   meat      fruit
    |       /     \
  beef   apple    pear

A depth-first traversal will basically start to the left of 'food',
run around the entire tree until it gets to the right side of
'food', and number each node it comes to.  Every node will get two
numbers, one for each side.  The numbering will look like this:

              1.food.12
             /         \
     2.meat.5           6.fruit.11
        |              /          \
    3.beef.4      7.apple.8     9.pear.10

And in tabular form:

node    lpos   rpos
food       1     12
meat       2      5
beef       3      4
fruit      6     11
apple      7      8
pear       9     10

A couple of things to notice.  First, the root node always has
L = 1 and R = n * 2, where n is the number of nodes.  Leaf nodes
(those with no children) always have R = L + 1, and non-leaf nodes always have
an odd difference between R and L (that is, (R - L) % 2 == 1, leaving an even
number of interim numbers).  The real magic, of course, is that any given
node's subtree falls entirely between that node's own two numbers, which is why
they're called nested sets.
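
That nesting is also what makes retrieval so cheap: pulling an entire subtree
is a single range query.  For example (table and column names other than
lpos/rpos are made up for illustration), grabbing the 'fruit' subtree looks
like this:

<cfquery name="subtree" datasource="#request.dsn#">
    SELECT   child.name, child.lpos, child.rpos
    FROM     tree_node parent
             INNER JOIN tree_node child
                 ON child.lpos BETWEEN parent.lpos AND parent.rpos
    WHERE    parent.name = 'fruit'
    ORDER BY child.lpos
</cfquery>

That returns fruit, apple, and pear already in depth-first order, with no
recursion and no repeated queries, which is also where the intrinsic sibling
ordering mentioned below comes from.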

Nested sets
have a number of advantages over adjacency lists (using a parentID),
though they have some disadvantages as well.  Advantages include
the blazing speed of pulling out hierarchies, intrinsic ordering of
sibling nodes (meat and fruit, or apple and pear, in my examples), and
the fact that it's impossible to orphan a node by deleting a node in
the middle of the tree, just to name a few.  Disadvantages include
complexity (though TreeManager alleviates a lot of that) and expensive
structure changes.  However, the tradeoff is usually in favor of
nested sets over adjacency lists, since recall is almost always more
important than updating.
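
To see why structure changes are the expensive part, here's roughly what adding
a 'pork' node under 'meat' costs (same made-up table as above; TreeManager does
the equivalent for you, inside a transaction): every position to the right of
the insertion point has to shift over by two.

<!--- open a 2-wide gap at meat's rpos (5 in the table above)... --->
<cfquery datasource="#request.dsn#">
    UPDATE tree_node SET rpos = rpos + 2 WHERE rpos >= 5
</cfquery>
<cfquery datasource="#request.dsn#">
    UPDATE tree_node SET lpos = lpos + 2 WHERE lpos >= 5
</cfquery>

<!--- ...and drop the new node into the gap --->
<cfquery datasource="#request.dsn#">
    INSERT INTO tree_node (name, lpos, rpos) VALUES ('pork', 5, 6)
</cfquery>

Three statements, and the two UPDATEs touch a large chunk of the table, which
is exactly why nested sets favor reads over writes.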

Neuromancer Extension

I've used Neuromancer
(from Rob Rohan)
on several projects, and I finally decided to implement form posts, in
addition to the raw posts that it already supports.  As it
should be with a well-designed library, adding the new functionality
was a snap.  I forwarded the changes to Rob, and I'd expect them
to be in the next Neuromancer release, which will be coming out
approximately whenever he feels like it.  He just got married,
though, so he's got more important things to do than deal with code;
don't hold your breath.  If anyone is interested in
the mods in the meantime, I'd be happy to pass them along directly.

Weird MySQL INSERT Quirk

I wrote an INSERT query today (for the first time in months) and
missed a comma between two of my values.  I also missed one of the
columns in the column list, so while there was an extra value,
two of the values weren't separated by a comma.  Something like this (though with CFQUERYPARAM, of course):

INSERT INTO mytable
    (id, name)
VALUES
    (#id#, #name# #email#)

Now here's where it gets weird: MySQL happily ran the statement without error, concatenated the two values together, and inserted the result into the appropriate column (all of them were varchar).  Weird.  (As far as I can tell, MySQL treats adjacent quoted string literals as a single concatenated literal, so the parser saw nothing wrong with it.)

Not sure on whom the blame should reside (possibly me?), but I wanted to give
a heads up.  And to say that generated SQL is better than hand-writing it.  ;)

Welcome Emery Isaac!

On Friday morning, my wife gave birth to our second child, Emery Isaac Boisvert:
8 pounds, 1 ounce, 20.5 inches long, and with thick black hair. 
Mom and baby are doing well, and came home from the hospital this
afternoon.

Connector/J 3 Gotcha

As is pretty common knowledge, CFMX 6, CFMX 7, and BD all ship with
JDBC drivers for MySQL 3.x.  There have been several posts (here's
one from Steven Erat)
about installing MySQL Connector/J 3.x, which is the driver for MySQL
4.x.  In particular, it supports the new authentication scheme
that was introduced in the 4 series.

I've done this on several
servers with great success, but I ran into a little problem a couple of
days ago and thought I'd share.  MySQL stores 0000-00-00 00:00:00
as the default value for NOT NULL DATETIME columns.  With the
default MySQL drivers, that gets converted to November 30th, 0002, but
with the updated drivers it throws a date format exception.  I
can't say for sure whether it happens every time, but it happens at least
some of the time.

With
a properly designed app and schema, this shouldn't be a problem (since
those values should be NULL, not the "zero" date), but if you've got
garbage lurking, it's definitely worth a careful look if you're
considering the updated drivers.

Movable Type Spam Killer

Last week sometime, fed up with the unending comment spam (thousands per
week), I had an epiphany.  Spammers aren't slowed down by changing
the action of the comment submission form, so it's obvious that they're
parsing that out of the markup for the form.  By simply removing
that from the form, theoretically the comment spam should stop.  Could it really be that simple?

Sure
enough, it certainly seems to have worked.  On submission of the
form, the action is populated with JavaScript (view an entry and check
the source), so the form works exactly as before, but the actual
destination URL isn't available in the source without a pretty clever
parser (or a human).

Since I made the change, I haven't gotten a
single spam comment.  Zip.  Zero.  Nada.  And I'm
not talking zero that have gotten through MT-BlackList; I'm talking
zero, period.

I know that few who read this are using Movable Type, but if
anyone is and is having trouble with spam, I highly recommend this
technique.  Just make sure you also rename mt-comments.cgi, update your mt.cfg file, and rebuild, so you're not still posting to the default script (which spammers don't have to parse anything to find).

CFMX At Last

I finally got my copy of CF7 Standard a couple days ago, and have
just started the process of rebuilding with CF rather than JSP. 
There's a lot of stuff to rebuild, not all of it trivial, but I'm
hoping to be mostly done this week.

Hardly even worth posting, I
know, but it'd been over two weeks since my last post, and I'm pretty
fired up to finally have a CF license after trying to get my hands
on one for many, many months.