Netflix Prize Within 0.5%

I was looking at the leaderboard for the Netflix Prize while waiting for a huge DB transform to complete, and saw that as of last week, BellKor/BigChaos are within half a percentage point of the required 10% improvement to win.  Yay those guys!

Adventures in Monkey Code

I did a little experiment this weekend.  I needed (well, wanted) to build a really simple little photo viewer application.  Create a gallery, add some photos, view the photos as a slideshow.  Really basic.  The catch is that I built it using no framework at all, aside from Application.cfm.  Note the 'm'.  And no IDE: everything was done with text-mode Emacs and the MySQL command line client.

Each "page" of the app was actually a real page (a CFM).  Data access at the top, then a CFOUTPUT block below for the "view" pages, and just simple processing with a CFLOCATION for the action pages.  I built the forms so the same page would accept both create and edit requests, switching on the presence of an ID in the request parameters, so I got a little reuse that way, but nothing else.

For initial development, this worked swimmingly.  Each page was totally separate: pick a page, build it, move on to the next.  In a couple of hours I was done, and most of that was spent on getting all the JS interactions tuned for inline editing, sorting, the slideshow, etc.  The app was simple enough that I didn't even do much copy and paste.  Maybe five or six blocks total, and two of those were form skeletons so I didn't have to start from scratch (i.e. not candidates for "real" reuse).

So then I wanted to add zip upload to streamline a bit.  Make a zip, upload it, and have individual items created for all the contained jpegs.  Sounds simple, right?  Except that all the logic was inside savePhoto.cfm.  I briefly considered setting up the attributes scope for each photo and then CFINCLUDEing that file for each jpeg in the zip, but it wouldn't have worked with the CFFILE ACTION="UPLOAD" in there, so that was out.  Time to refactor.

Still wanting to avoid any sort of framework-ish-ness, I opted for a UDF library instead of a CFC.  Mostly to resist the temptation to slap ColdSpring on it so I could stop thinking about those fu$@*ng CFTRANSACTION tags.  I split the logic from savePhoto.cfm into four UDFs, sliced appropriately for either "uploading" with CFFILE or sourcing from the local filesystem.  This actually worked pretty well, and I didn't have to duplicate any logic anywhere to support zip upload, create new photo, and replace existing photo.
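The shape of that split can be sketched roughly as follows, in Python rather than CFML purely for illustration.  Every name here is hypothetical (these are not the actual four UDFs), and a plain list stands in for the database.  The point is only that one shared function does the real work, and each entry point (form upload, local file, zip) just feeds it bytes.

```python
import io
import zipfile


def save_photo(store, gallery_id, filename, data):
    """Shared core: record one photo, wherever the bytes came from.

    `store` is a plain list standing in for the database (an assumption).
    Returns a hypothetical photo id (the index in the list).
    """
    store.append({"gallery": gallery_id, "name": filename, "size": len(data)})
    return len(store) - 1


def save_photos_from_zip(store, gallery_id, zip_bytes):
    """Create one photo per JPEG in an uploaded zip, reusing save_photo."""
    ids = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            if not info.filename.lower().endswith((".jpg", ".jpeg")):
                continue  # ignore anything that isn't a JPEG
            data = archive.read(info)
            ids.append(save_photo(store, gallery_id, info.filename, data))
    return ids
```

A single-file upload would call save_photo directly with the uploaded bytes, so nothing gets duplicated between the two paths.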

So here I am at the end of my journey.  I actually had a pretty good time not thinking about the infrastructure and just diving right into the code.  No messing with getting ColdSpring and FB3Lite into the project, setting up mappings for them, etc.  That said, I do NOT recommend the shortcut to anyone unless they have a similar experimental interest.

As of right now, I'm probably slightly ahead in terms of time expended compared to doing it the FB3Lite-on-ColdSpring way, but only slightly.  And I have manually managed transactions, no good way to apply security, layouts managed with Application.cfm and OnRequestEnd.cfm, no ability to reuse view code other than more UDFs, and a fair amount of duplicated SQL and error handling.  In short, a mess.

I'd imagine that the next enhancement made will push me over the brink to reduced productivity from the initial decision to monkey-code it, and it's downhill from there.  Fortunately the app is small enough that doing a wholesale conversion won't be too much work (and yes, that was one of the considerations for undertaking the experiment).   So that'll probably happen at the same time for the cost of an hour or so.

All in all, a good experiment, I'd say.  Completely confirmed my beliefs that spending half an hour setting up a good project framework is worth every second, and the "lost" time is made up within the first few hours of development.  And this was a very small project with a single developer; the rewards only get more significant as projects grow in size.

1 Items Found [sic]

If you're showing a textual label such as "N items found", please, pretty please, put the logic in there to hide the "s" when N is equal to one.  Same goes for changing "children" to "child", etc.  It's not hard, and it makes your software look retarded (or super enterprise-y) when you don't do it.  And no, i18n doesn't count as a reason to skip it: use ChoiceFormat (or the equivalent in your language) or get a better library.
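To show how little logic is involved, here's a minimal Python sketch.  The `pluralize` helper and its parameters are made up for this example, and it only covers English-style singular/plural; for real i18n you'd still reach for something like ChoiceFormat.

```python
def pluralize(n, singular, plural=None):
    """Return a count label such as '1 item' or '3 items'.

    `plural` covers irregular forms ('child' -> 'children'); when it is
    omitted, an 's' is simply appended to the singular form.
    """
    if plural is None:
        plural = singular + "s"
    noun = singular if n == 1 else plural
    return f"{n} {noun}"


print(pluralize(1, "item") + " found")
print(pluralize(3, "item") + " found")
print(pluralize(2, "child", "children"))
```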

Yay VPNs!

I've been having a hell of a time with the office VPN client this week.  It refuses to install on my laptop (a stock ThinkPad) for some reason.  The "Deterministic Network Enhancer" can't add a plugin.  Whatever that means.  Tried a few different versions of the client, all to no avail.  This prompted my walk of a couple days ago, because the machine wouldn't accept SQL Server either.  Today I gave up, set up a CentOS box, built the Linux version of the VPN client, and it worked like a champ.  Why this is so hard, I don't know, but when the "easy" solution involves installing GCC and compiling some vendor source it seems something is amiss.  Not that I mind compiling source, but it seems like a suboptimal solution.

My Empty Inbox

This evening I ran across yet another person trying to do the "keep your inbox empty" thing, and one who thinks the task is somewhere between difficult and impractical.

I have two inboxes: Gmail for personal, Outlook for work.  They're both always empty, and I don't find it difficult at all.  As email comes in I reply if needed, and then immediately archive.  If it needs following up later, I'll add a notification to my calendar to make sure I revisit it, ideally with all relevant bits in the calendar event so I needn't go find the email again.  This seems to work very well, and isn't intrusive to my schedule.

I want to hear your stories of pain with non-empty inboxes.  What makes it difficult?  Is there something your email client could do better to help out?  Is it fundamentally a time management problem expressed in the form of an inbox, or is it something inherently email-specific?  Do you have the same problem with voicemail and/or text messages?

Weird MySQL Behaviour

Last night I added a new log field to PotD, and since I did it live on my prod instance, I wrapped it with a bunch of error handling so that if anything went wrong it wouldn't affect users; it just wouldn't log the new data.  (No, this is not my standard operating procedure – PotD is special in this sense.)  Because of that silent failure behaviour, I wanted to verify that the data was being recorded, so I ran this query:

select isPrioritized, count(*) from log_email;

Now, you can probably see exactly what's wrong with it, but it was late, and I didn't.  What's weirder, MySQL didn't throw an error; it just returned this recordset:

+---------------+----------+
| isPrioritized | count(*) |
+---------------+----------+
|          NULL |    15346 |
+---------------+----------+

So I spent a while trying to figure out what was wrong with my code, and it sure seemed correct.  Eventually gave up and went to bed, figuring that a few more hours without logging the data wouldn't hurt that much.  Upon getting up this morning I quickly realized what the issue was. The right query looks like this:

select isPrioritized, count(*) from log_email group by isPrioritized;

which returns this:

+---------------+----------+
| isPrioritized | count(*) |
+---------------+----------+
|          NULL |    15331 |
|             0 |        3 |
|             1 |       12 |
+---------------+----------+

I'm pretty sure the original query should error, since it selects a non-aggregated column alongside an aggregate function without a GROUP BY clause, but who knows.  Something to watch out for.
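For what it's worth, the silent single-row result is easy to reproduce.  MySQL only rejects this kind of query when the ONLY_FULL_GROUP_BY SQL mode is enabled, and SQLite is similarly permissive, so the sketch below uses Python's built-in sqlite3 module instead of a MySQL server.  The rows are made up for illustration (this is not the real PotD log), and the value SQLite picks for the bare column isn't necessarily the one MySQL would pick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table log_email (isPrioritized integer)")

# Made-up data: 5 rows with no value recorded, 2 zeros, 3 ones.
rows = [(None,)] * 5 + [(0,)] * 2 + [(1,)] * 3
conn.executemany("insert into log_email (isPrioritized) values (?)", rows)

# The late-night query: no GROUP BY, so every row collapses into one
# result row and the bare column takes its value from some single row.
ungrouped = conn.execute(
    "select isPrioritized, count(*) from log_email"
).fetchall()

# The intended query: one result row per distinct isPrioritized value.
grouped = conn.execute(
    "select isPrioritized, count(*) from log_email "
    "group by isPrioritized order by isPrioritized"
).fetchall()

print(ungrouped)
print(grouped)
```

The ungrouped result is always a single row carrying the total count, which is exactly why it looked like nothing was being logged.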

WordPress 2.7, Take 2

I'm pleased to say that Ozh has updated his Admin Menu plugin with a fix for the issue I blogged about a couple days ago.  So as of a few minutes ago all the blogs I host have been upgraded to 2.7.  Yay.

Fast Music

As a musician, is it strictly necessary to blaze through every piece of music you play?  I'm listening to Christmas music on Pandora, and probably two-thirds of the tunes, particularly non-orchestral pieces, are played at about 125% of the speed they should be.  It doesn't make you a better player just because you can flail your fingers and/or lips at a ridiculous pace and remain on rhythm.  Perhaps more technically competent, but if that was the goal, we'd only listen to sequenced music and no one would ever play it for real.

Yay for Trans-Siberian Orchestra and Mannheim Steamroller – two notable exceptions.   For the rest of you:  slow down.  Creating music is a gift.  Enjoy it.  We will too.

WordPress 2.7

Just finished upgrading my blog to WordPress 2.7.  Unfortunately I found what appears to be a bug in the truly excellent Ozh Admin Menu plugin.  It doesn't seem to handle extra menu items being added to the main menu correctly; it only partially renders the menu bar when such an option is active.  Not sure exactly what to do about that, but it does prevent rolling out the upgrade to all my users.  Can't very well have them go to upload a gallery and have 90% of their menu disappear, so my blog will remain the only 2.7 blog for the moment.

Other than that, I'm quite pleased.  Categories are back on the sidebar when you're writing a new post, the design is a lot cleaner, and aside from that one issue with Ozh and another minor scoping issue in my Picasa plugin, everything was transparent to upgrade.

Tag Hierarchies

About four and a half years ago I wrote a little event tracking app that accepts a timestamp and a list of tags, and then provides a pile of ways to report on the data.  Think Twitter, except a couple years earlier, and designed for consumption by software, not people, at least at the individual event level.  The app's been working magically since then, and now that there are a few people using it I realized I needed to support tag hierarchies.

The specific use case in question is aggregate reporting.  Say you track what you eat, and then you want to get a report of how often you eat vegetables.  To this point you've had two choices: retag all the events with 'celery', 'corn', 'carrots', etc. with an extra 'vegetable' tag, or write your report to OR together all the different veggies (remembering to update it every time you eat a new one for the first time).

On the flip side, if you just have a hierarchy of tags, you can drop all those tags underneath the 'vegetable' tag and then report on 'vegetable' directly, which has the implicit meaning of "itself and all its descendants".  This is obviously much more desirable.

However, overlaying a hierarchy is not without problems.  Tags are inherently free-form, and this app is no exception.  As such, the unique key needs to be the tag name by itself, not the combination of the tag name and its parent.  So the solution I adopted is to expose tags in a flat structure, excepting in the actual hierarchy editor (for which I used the fantastic ExtJS Tree Control), and a slight tweak to the querying language.

Previously, you searched for events using a "tag:celery" style query.  With no hierarchy, that matched exactly the celery tag.  With a hierarchy, the semantics have changed slightly to the celery tag or any of its descendants.  If you want the old behaviour, you'd use "tag:=celery".
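The two query forms can be sketched like this in Python.  This is an illustration of the semantics only, not the app's actual parser: the function names and the `CHILDREN` mapping are hypothetical, and the hierarchy is held in a plain dict rather than the real storage.

```python
# Hypothetical hierarchy: parent tag -> list of direct child tags.
CHILDREN = {"vegetable": ["celery", "corn", "carrots"]}


def descendants(tag, children):
    """Collect every tag below `tag`, at any depth."""
    found = []
    stack = list(children.get(tag, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children.get(current, []))
    return found


def tags_for_query(query, children):
    """Return the set of tag names matched by a 'tag:...' query.

    'tag:name'  matches the tag and all of its descendants.
    'tag:=name' matches exactly that tag (the pre-hierarchy behaviour).
    """
    prefix, _, value = query.partition(":")
    if prefix != "tag" or value in ("", "="):
        raise ValueError(f"unsupported query: {query!r}")
    if value.startswith("="):
        return {value[1:]}
    return {value, *descendants(value, children)}


print(sorted(tags_for_query("tag:vegetable", CHILDREN)))
print(sorted(tags_for_query("tag:=vegetable", CHILDREN)))
```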

Behind the scenes, I'm using nested sets for storage which makes those descendant queries lightning fast, though I ran into some interesting issues because there is only one storage table for all tags across all users, and each user's tag tree is potentially multi-rooted.  Neither is inherently difficult to deal with, but both were new problems and required a bit of reworking of my treemanager component.
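For anyone who hasn't met nested sets, here's a minimal sketch of why the descendant lookup is cheap.  It uses Python's sqlite3 module and an invented schema: the `tag` table, its `user_id`, `lft` and `rgt` columns, and the choice to number each user's trees independently are all assumptions for illustration, not the app's real layout.  Each tag stores a left and right bound, and a tag's descendants are simply the rows whose left bound falls inside its own range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tag (user_id integer, name text, lft integer, rgt integer, "
    "unique (user_id, name))"
)
conn.executemany(
    "insert into tag (user_id, name, lft, rgt) values (?, ?, ?, ?)",
    [
        # User 1 has two roots: 'vegetable' (1-8) and 'fruit' (9-12).
        (1, "vegetable", 1, 8),
        (1, "celery", 2, 3),
        (1, "corn", 4, 5),
        (1, "carrots", 6, 7),
        (1, "fruit", 9, 12),
        (1, "apple", 10, 11),
        # User 2 shares the table but has a separate numbering space.
        (2, "vegetable", 1, 4),
        (2, "kale", 2, 3),
    ],
)


def tag_and_descendants(conn, user_id, name):
    """Return the named tag plus all its descendants with one range join."""
    rows = conn.execute(
        "select child.name from tag as parent "
        "join tag as child on child.user_id = parent.user_id "
        "and child.lft between parent.lft and parent.rgt "
        "where parent.user_id = ? and parent.name = ? "
        "order by child.lft",
        (user_id, name),
    ).fetchall()
    return [row[0] for row in rows]


print(tag_and_descendants(conn, 1, "vegetable"))
print(tag_and_descendants(conn, 2, "vegetable"))
```

Scoping the join by user is what keeps one user's ranges from matching another's, and multiple roots just become adjacent, non-overlapping ranges.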

All in all, the process was really painless, and I'm quite pleased at how transparent the overlay of hierarchy ended up being to the general functioning of the system.  In particular, ExtJS was a dream to work with.  It's fast, easy to use, easy to develop with, paired quite nicely with jQuery (what drives the rest of the app), and ended up requiring less than 70 lines of code to create the control, do lazy loading of children, add new nodes to the tree, reorder and rename nodes, and do all the backend calls to update the DB as needed.