Build-Time Aggregation of JS/CSS Assets

Ben Nadel posted this morning about compiling multiple linked files (JS/CSS) into a single file, and he does it at runtime. I commented about doing it at build time instead, and a couple of people wanted more detail, so here's a brief explanation.

The first part is a properties file (which can be read by both Ant and CF (or whatever)). Here's an example (named agg.js.properties):

# the type of file being aggregated (used to do minification)
type         = js
# the URL path the files are relative to.
urlBasePath  = /marketing/js/
# the list of filenames to aggregate.  The first line (with the equals
# sign) should be a filename and a backslash (line continuation); all
# other lines should be a comma, a filename, and a backslash, except
# the last, which omits the backslash.  Indentation is irrelevant.
filenames    = date.js\
  ,jquery-latest.js\
  ,ui.datepicker.js\
  ,ui.mouse.js\
  ,ui.slider.js\
  ,ui.draggable.js\
  ,jquery.dimensions.js\
  ,jquery.easing.1.2.js\
  ,jquery-easing-compatibility.1.2.js\
  ,coda-slider.1.1.1.js\
  ,jquery.tooltip.min.js\
  ,jScrollPane.min.js\
  ,jquery.metadata.js\
  ,prototype.classes.js\
  ,reporting.js\
  ,jquery.ajaxQueue-min.js\
  ,script.js

This sets up everything needed for the aggregation. Within our project, we have this file (named agg.js.cfm) as a peer of the properties file:

<cfscript>
// derive the properties file's path from this template's path
filename = replace(getCurrentTemplatePath(), ".cfm", ".properties");
fis = createObject("java", "java.io.FileInputStream").init(filename);
bis = createObject("java", "java.io.BufferedInputStream").init(fis);
props = createObject("java", "java.util.Properties").init();
props.load(bis);
bis.close(); // release the file handle
urlBasePath = props.getProperty("urlBasePath");
type = props.getProperty("type");
filenames = listToArray(props.getProperty("filenames"));
// write one tag per asset, so development serves the individual source files
for (i = 1; i LTE arrayLen(filenames); i = i + 1) {
	if (type EQ "css") {
		writeOutput('<link rel="stylesheet" href="#urlBasePath##filenames[i]#" type="text/css" />');
	} else { // js
		writeOutput('<script src="#urlBasePath##filenames[i]#" type="text/javascript"></script>');
	}
	writeOutput(chr(10));
}
</cfscript>

It reads the properties file and writes out either LINK or SCRIPT tags, as appropriate, for the individual assets. This facilitates easy debugging in development, because nothing is modified from its source. The file is included into the HEAD of our layout templates to get everything on the page.
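For illustration, a layout template's HEAD might pull it in like this (the path here is assumed from the Ant example below):

<head>
  <title>Some Page</title>
  <cfinclude template="/marketing/templates/agg.js.cfm" />
</head>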

The real magic happens with Ant, which we use for our deployments. Within the build file, we have a call to the aggregateAssets target for each properties file:

<antcall target="aggregateAssets">
  <param name="propfile" value="${output}/wwwroot/marketing/templates/agg.js.properties" />
  <param name="rootdir" value="${output}/wwwroot/marketing/js" />
</antcall>

The params specify the properties file and the root directory. Note that the rootdir param corresponds with the urlBasePath in the properties file. The target itself looks like this:

<target name="aggregateAssets">
  <!-- read the aggregation properties -->
  <property file="${propfile}" prefix="agg" />

  <!-- get the root -->
  <propertyregex property="agg.root"
    input="${propfile}"
    regexp="^(.*)\.properties$"
    select="\1" />

  <!-- split the root into file and path sections -->
  <propertyregex property="agg.fileroot"
    input="${agg.root}"
    regexp="^.*/([^/]+)$"
    select="\1" />
  <propertyregex property="agg.pathroot"
    input="${agg.root}"
    regexp="^(.*/)[^/]+$"
    select="\1" />

  <!-- set up the output file stuff -->
  <property name="agg.outfile" value="${rootdir}/${agg.fileroot}" />
  <property name="agg.cfmfile" value="${agg.root}.cfm" />
  <property name="minsuffix" value=".yuimin" />

  <!-- run everything through the YUI Compressor -->
  <for list="${agg.filenames}" param="filename">
    <sequential>
      <echo message="compressing @{filename} to @{filename}${minsuffix} (in ${rootdir})" />
      <java classname="com.yahoo.platform.yui.compressor.YUICompressor"
        failonerror="true"
        output="${rootdir}/@{filename}${minsuffix}"
        append="true"
        logError="true"
        fork="true">
        <arg value="--type"/>
        <arg value="${agg.type}"/>
        <arg value="--nomunge"/>
        <arg file="${rootdir}/@{filename}" />
        <classpath>
          <pathelement path="${java.class.path}"/>
        </classpath>
      </java>
    </sequential>
  </for>

  <!-- aggregate all the compressed files together -->
  <echo file="${agg.outfile}" message="// built by Ant using YUI Compressor" />
  <for list="${agg.filenames}" param="filename">
    <sequential>
      <concat destfile="${agg.outfile}" append="true">
        <header trimleading="true">
          // @{filename}
        </header>
        <filelist dir="${rootdir}" files="@{filename}${minsuffix}" />
      </concat>
    </sequential>
  </for>

  <!-- delete all the compressed files -->
  <delete>
    <fileset dir="${rootdir}" includes="*${minsuffix}" />
  </delete>

  <!-- write the CFM file to pull in the compressed and aggregated file -->
  <if>
    <equals arg1="${agg.type}" arg2="css" />
    <then>
      <echo file="${agg.cfmfile}"><![CDATA[<link rel="stylesheet" href="${agg.urlBasePath}${agg.fileroot}" type="text/css" />]]></echo>
    </then>
    <else>
      <echo file="${agg.cfmfile}"><![CDATA[<script src="${agg.urlBasePath}${agg.fileroot}" type="text/javascript"></script>]]></echo>
    </else>
  </if>
</target>

First, it reads the properties file, runs each listed asset through the YUI Compressor, and then aggregates the results. Finally, it overwrites agg.js.cfm (from above) with a version that contains a single LINK/SCRIPT element pointing at the aggregated file. The end result is a single aggregated, compressed asset in production for speed, and separate uncompressed assets in development for easy debugging.
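For the example properties file above, the rewritten agg.js.cfm ends up containing just this one line, pointing at the aggregated file:

<script src="/marketing/js/agg.js" type="text/javascript"></script>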

Edit: Do note that you'll need both the ant-contrib package and the YUI Compressor JARs to be installed into Ant for this to work.
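If ant-contrib isn't already registered in your build file, a taskdef along these lines does it (the JAR path is yours to fill in); note that the antlib.xml resource is the one that defines the <for> task used above:

<taskdef resource="net/sf/antcontrib/antlib.xml">
  <classpath>
    <pathelement location="/path/to/ant-contrib-1.0b3.jar" />
  </classpath>
</taskdef>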

S3 is Sweet (One App Down)

This weekend I ported my big filesystem-based app to S3, and it went like a dream. It's an image-management application, with all the actual images stored on disk. In addition to the standard import/edit/delete, the app provides automatic on-the-fly thumbnail generation, along with primitive editing capabilities (crop, resize, rotate, etc.). With images on local disk, that's all really easy: read them in, do whatever, write them back out. I figured using S3 would make things both more cumbersome and less performant. Both suspicions turned out to be unwarranted.

Building on the 's3Url' UDF that I published last week, I whipped up a little CFC to manage file storage on S3 with a very simple API. It has s3Url, putFileOnS3, getFileFromS3, s3FileExists, and deleteS3File methods, which all do about what you'd expect. You can grab the code here: amazons3.cfc.txt (make sure you remove the ".txt" extension) or visit the project page. It uses the simple HTTP-based interface, so after the authentication is handled, it's all very simple and fast. I haven't looked at the SOAP interface – why bother complicating a simple task?
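As a rough sketch of usage (the init signature and argument order here are assumptions, so check the project page for the real API):

<cfscript>
// hypothetical usage sketch; method signatures assumed, not verified
s3 = createObject("component", "amazons3").init(aws_key, aws_secret);
if (NOT s3.s3FileExists("s3.barneyb.com", "images/photo.jpg")) {
	s3.putFileOnS3(expandPath("./photo.jpg"), "s3.barneyb.com", "images/photo.jpg");
}
link = s3.s3Url("s3.barneyb.com", "images/photo.jpg");
</cfscript>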

With that CFC (and an application-specific wrapper to take care of some path-related transforms), porting the whole app took about two hours. I also realized after I was mostly done that the CF image tools accept URLs as well as files, so I switched my image reads to just use URLs instead of pulling the file local and reading it from disk.
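For instance, reading and tweaking an image straight off S3 (reusing the hypothetical s3 instance from the sketch above) can be a one-liner per operation:

<cfscript>
// read directly from a signed S3 URL instead of copying the file local first
img = imageRead(s3.s3Url("s3.barneyb.com", "images/photo.jpg"));
imageScaleToFit(img, 150, 150); // e.g. on-the-fly thumbnail generation
</cfscript>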

As for moving all the actual content, S3Sync was a champ, moving about 4.5GB of data from my Cari server to S3 in a few hours, including gracefully handling a couple errors raised by S3 (which a retry – performed automatically – solved), and a stop/restart in the middle. Total cost: about 65 cents.

Next is porting the blogs, including all the Picasa-based galleries. Unfortunately, that means writing PHP, but with how easy the CF stuff was, I don't think it'll be too much effort.

My Amazon Toolkit (Thus Far)

I'm early in the move to Amazon, of course, but already some specific tools are indispensable.  I'm sure the list will grow, but here's where I'm at right now:

  • S3Sync – A simple rsync-like command line tool (called 's3sync') for syncing stuff from a computer to S3 or the reverse.  Also includes the 's3cmd' tool that roughly implements the web service API (list your buckets, put a file, etc.).  This is the cornerstone of the plan for moving all my data files from my current server and backups to S3.  Once the migration is complete, s3cmd will probably be the tool of choice for manipulating S3 programmatically.  Written in Ruby (requires 1.8.4+); my CentOS 4 box couldn't find a new enough RPM, so I had to compile from source (which was totally painless).
  • S3 Firefox Organizer (S3Fox) – a client for S3 following the standard FTP client paradigms.  It has its own proprietary definition of folders, but they're unobtrusive.  Since I'm getting stuff into S3 mostly with s3sync, I'm mostly using this for read-only oversight.
  • EC2 UI – a client for managing your EC2 "stuff" from Firefox.  While not FTP-like at all, it shares a lot of the same UI as S3Fox for setting up accounts and the like.

Dummy Queries in ColdFusion 8.0.1

Brian Rinaldi posted on his blog about dummy queries in CF 8.0.1, and it struck me as a weird solution. So here's a drop-in replacement that I think works in a more reasonable fashion and doesn't have any dependency on an existing DSN.

<cffunction name="dummyQuery2" access="public" output="false" returntype="query">
  <cfargument name="queryData" type="struct" required="true" />
  <cfset var i = 0 />
  <cfset var columnName = "" />
  <cfset var myQuery = queryNew(structKeyList(queryData)) />
  <cfset var queryLength = arrayLen(arguments.queryData[listFirst(structKeyList(arguments.queryData))]) />
  <cfloop from="1" to="#queryLength#" index="i">
    <cfset queryAddRow(myQuery) />
    <cfloop collection="#arguments.queryData#" item="columnName">
      <cfset querySetCell(myQuery, columnName, queryData[columnName][i]) />
    </cfloop>
  </cfloop>
  <cfquery dbtype="query" name="myQuery">
    select *
    from [myQuery]
  </cfquery>
  <cfreturn myQuery />
</cffunction>

As you can see, the structure is almost identical, but it doesn't use a database; it just builds the query in memory. The "no-op" QofQ at the end is to ensure there is actual query metadata, not just the raw records, which Brian listed as one of his prerequisites. If you don't care about that, it can be removed with no ill effects.

One interesting benefit of this approach is that the rows come out in the same order as they go in – with Brian's DB-based one, that's not guaranteed because there is no ORDER BY clause on the query. Running his example on my box (using MSSQL 2005), I got rows sorted by first name. With the in-memory building, the rows are explicitly kept in order throughout.
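For reference, a call looks something like this (a made-up two-row example; bracket-notation assignment preserves the column names' case):

<cfset data = structNew() />
<cfset data["firstName"] = listToArray("Brian,Barney") />
<cfset data["lastName"] = listToArray("Rinaldi,Boisvert") />
<cfset people = dummyQuery2(data) />
<cfdump var="#people#" />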

Amazon S3 URL Builder for ColdFusion

First task for my Amazon move is getting data assets (non-code-managed files) over to S3. I have a variety of types of data assets that need to move and have references updated, most of which require authentication. To make that easier, I wrote a little UDF to take care of building URLs with authentication credentials embedded.

<cffunction name="s3Url" output="false" returntype="string">
  <cfargument name="awsKey" type="string" required="true" />
  <cfargument name="awsSecret" type="string" required="true" />
  <cfargument name="bucket" type="string" required="true" />
  <cfargument name="objectKey" type="string" required="true" />
  <cfargument name="requestType" type="string" default="vhost"
    hint="Must be one of 'regular', 'ssl', 'vhost', or 'cname'.  'Vhost' and 'cname' are only valid if your bucket name conforms to the S3 virtual host conventions, and cname requires a CNAME record configured in your DNS." />
  <cfargument name="timeout" type="numeric" default="900"
    hint="The number of seconds the URL is good for.  Defaults to 900 (15 minutes)." />
  <cfscript>
    var expires = "";
    var stringToSign = "";
    var algo = "HmacSHA1";
    var signingKey = "";
    var mac = "";
    var signature = "";
    var destUrl = "";

    expires = int(getTickCount() / 1000) + timeout;
    stringToSign = "GET" & chr(10)
      & chr(10)
      & chr(10)
      & expires & chr(10)
      & "/#bucket#/#objectKey#";
    signingKey = createObject("java", "javax.crypto.spec.SecretKeySpec").init(awsSecret.getBytes(), algo);
    mac = createObject("java", "javax.crypto.Mac").getInstance(algo);
    mac.init(signingKey);
    signature = toBase64(mac.doFinal(stringToSign.getBytes()));
    if (requestType EQ "ssl" OR requestType EQ "regular") {
      destUrl = "http" & iif(requestType EQ "ssl", de("s"), de("")) & "://s3.amazonaws.com/#bucket#/#objectKey#?AWSAccessKeyId=#awsKey#&Signature=#urlEncodedFormat(signature)#&Expires=#expires#";
    } else if (requestType EQ "cname") {
      destUrl = "http://#bucket#/#objectKey#?AWSAccessKeyId=#awsKey#&Signature=#urlEncodedFormat(signature)#&Expires=#expires#";
    } else { // vhost
      destUrl = "http://#bucket#.s3.amazonaws.com/#objectKey#?AWSAccessKeyId=#awsKey#&Signature=#urlEncodedFormat(signature)#&Expires=#expires#";
    }

    return destUrl;
  </cfscript>
</cffunction>

To use it, do something like this:

s3Url(aws_key, aws_secret, "s3.barneyb.com", "test.txt", 'cname');

That will generate a request for the file "test.txt" in the "s3.barneyb.com" bucket, using a CNAME-style URL. Obviously you'll have to know my AWS key and secret for it to work, and I'm not telling, but substitute your own values. You can use regular (bucket name in the request path), vhost (bucket name as an S3 subdomain), cname (a vanity CNAME pointing at S3), or ssl (regular over HTTPS) for the fifth (requestType) parameter to control the style of URL generated.

Edit: here's a link to the project page.

Moving to the Amazon

I'm in the process of switching my hosting from a dedicated box at cari.net over to Amazon EC2 and S3. Based on my estimates, the costs will be slightly higher per month ($60/mo right now, $75-80/mo post move), but the benefits are significant:

  • Using S3 for all my backups and data storage will definitely give me some peace of mind that I've been lacking.
  • The virtualized nature of the servers means doing upgrades is totally safe: launch a new copy of the box, do the upgrade, and if everything's golden, switch the IP to the new box. Cost is $0.10/hr which is close enough to zero to not matter.
  • I get a processor "upgrade" from my Celeron at Cari to a similarly clocked Xeon equivalent. The latter is paravirtualized, of course, but it should still help since most of my apps are CPU-bound. I also get some more RAM, but that's less important.
  • Last, but not least, Cari has had a lot of network issues in the year I've hosted there while Amazon hasn't.

First task is to move storage over to S3, and update the applications that currently access stuff off the filesystem (like autogeneration of thumbnails).

New Cyclic Data Structures Utility

Back in October I posted a fledgling cycle-safe CFDUMP replacement.  Today, I had need for that same anti-cycle processing in serializeJson, so I abstracted the processing out into a CFC that handles both breaking cycles (for serialization) as well as restoring them (for deserialization).  By running a cyclic data structure through the breakCycles method, you can use CFDUMP, serializeJson, or whatever other context-free recursive algorithm you want on it without fear of infinite looping.  If you later turn that data structure back into an in-memory structure, you can use the restoreCycles method to recreate the cyclic references that breakCycles removed.

You can download the CFC (as a text file) here: cyclicutils.cfc.txt.  If you have the example from the cycle-safe CFDUMP somewhere, save the CFC in the same directory and tack this code on to the end of the test case:

<cfset cu = createObject("component", "cyclicutils") />
<cfset b = cu.breakCycles(b) />
<cfdump var="#b#" label="b" />
<cfset b = cu.restoreCycles(b) />
<u:dump var="#b#" />

You'll see the cyclic structure dumped with the cycle-safe CFDUMP as before, then the cycles are broken and it's dumped with the standard CFDUMP, and then the cycles are restored and it's dumped with the cycle-safe CFDUMP again.

Otherwise, just create yourself a cyclic structure and pass it to breakCycles.
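For example, a minimal cyclic structure (two structs pointing at each other) can be built like so:

<cfset parent = structNew() />
<cfset child = structNew() />
<cfset parent.child = child />
<!--- this back-reference creates the cycle --->
<cfset child.parent = parent />
<cfset safe = cu.breakCycles(parent) />
<!--- now safe to serialize without infinite looping --->
<cfset json = serializeJson(safe) />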

CAPTCHA is the Devil

I know, I know. I've said this before. Trying to have a conversation on another blog and the CAPTCHA … oh my god. If you insist on using CAPTCHA (for some unknown reason), follow some rules:

  1. only letters and numbers, and never l, o, I, O, 1, or 0
  2. don't make the comparison case sensitive
  3. make sure all the characters are fully in the view window
  4. don't use session scope (use a hash; see the sketch after this list)
  5. don't make the same person enter more than one CAPTCHA if they're otherwise identifiable
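A hash-based check might look something like this minimal sketch (the secret and field names are made up); because nothing is stored server-side, it sidesteps session scope entirely:

<cfscript>
// render time: draw "challenge" into the image, and emit its salted
// hash as a hidden form field instead of storing it in session
secret = "some-server-side-secret";
challenge = "x7k4m";
hiddenFieldValue = hash(lCase(challenge) & secret);
// submit time: recompute from what the user typed and compare
// (lCase keeps the comparison case insensitive, per rule 2)
isValid = hash(lCase(form.captchaInput) & secret) EQ form.captchaHash;
</cfscript>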

If you're rational, install anti-spam software (Akismet works great), throw some JS in there for robots to fail to deal with, and use per-contributor moderation. Please, let's not make the blogosphere into a hostile world of mistrust when it doesn't have to be.

Read-Only and Read-Write SVN Repositories

Just got a comment on one of my posts from a while back about public SVN access wondering how to get it configured.  The basic idea is to have a single repository with anonymous read-only access, and have the same repository allow read-write access to authenticated users.  Further, you want to configure that on a per-directory basis (with inheritance, of course), so you can have different areas require different principals, and allow some sections to require authentication even for read access.

So without further ado, here's the magic configuration bits.

<Location /svn/barneyb>
    DAV             svn
    SVNPath         /path/to/svnroot/barneyb

    AuthType        Basic
    AuthName        "Subversion/Trac"
    AuthUserFile    /path/to/apache/conf/htpasswd

    AuthzSVNAccessFile  /path/to/apache/conf/authz.conf

    Satisfy     any
    Require     valid-user
</Location>

In this case I'm just using Basic auth with an htpasswd file for authentication.  The magic line is the "AuthzSVNAccessFile" line, which defines the file to use for authorization.  Here's a snippet:

[/]
barneyb = rw

[/bicycle_dashboard]
* = r
barneyb = rw

The first section says that for the root of the repository (/), only barneyb (me) is allowed access, and I'm allowed to read and write.  The second section says that for the /bicycle_dashboard path, I'm still allowed to read and write, but anyone is allowed to read.

The gotcha is that explicitly specified directories do not inherit from their parents.  At each specified level, you must define the full auth spec.  Full details on the authorization file can be found in the Subversion Book.  That link is for the nightly, so if you've got an old version of Subversion, you might want to go grab an older version of the book as well.  The general Apache docs can be found here.
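And to require authentication even for read access (one of the goals above), just grant nothing to * for that path; the path here is hypothetical:

[/private-project]
barneyb = rw

Since anonymous users have no read access on that path, Apache (via the Satisfy/Require pair above) will demand credentials before serving anything under it.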

Barney and the Holy Grail

Yes, I know it's Lent, but I'm not talking about that Grail – I'm talking about Grails, the web framework for Groovy (the dynamic layer for Java).

I've been looking for a way to ditch CFML for a while now, but nothing has really hit the mark. I keep hoping that Adobe is going to release CF8.1 with Flex 3 and allow all your server-side scripting to be in ActionScript 3, but I'm not holding my breath. The CF platform is pretty nice (excepting the massive bloat), but the CFML language itself just blows ass. So I shop…

I liked SpringMVC/Spring/Hibernate when I was using it, but it's definitely enterprise-y. I've looked at Rails, but I really dislike Ruby's syntax, and ActiveRecord leaves something to be desired (like a query language). Django and Pylons (for Python) seem slick, but Python's whitespace-is-semantic paradigm is a big turnoff. My CFRhino "project" actually has some promise, I think, but it's not something I want to maintain for real application development, and it's JavaScript, which means no real classes or classloading (among other things).

Enter Grails. Like its counterparts in other languages, it's all about speed of development, but unlike the others, it's all Java. It's built on Hibernate, Spring, and SpringMVC. It compiles to bytecode. It seamlessly integrates with existing Java apps and tools (Quartz, Acegi, DWR, etc.). The paradigms it leverages were incredibly easy to jump into (granted, I've used its backing tools before), and within a couple hours of seeing both Grails and Groovy for the first time, I had a quasi-functional blog platform.

The past week of developing a first "real" app has borne out my initial impressions to the nth degree. Groovy is a really slick layer that augments Java in just the right ways, without getting in the way. Closures are a big deal, vastly superior to simple functions or Java's inner classes. You also get dynamic methods, iterators, native regular expressions, multi-line strings, ranges, a ternary operator (c'mon CF, it's not the 60's), the "elvis" operator (the ternary operator with the middle clause omitted and the first clause used as its value), and a null-safe dot operator.

But Grails is where it's at.

class User {
  String username
  String password
  String email

  static constraints = {
    username(blank: false, unique: true, size: 4..16)
    password(blank: false, password: true, size: 4..32)
    email(email: true, blank: false)
  }
}

That whopping 11 lines of code is sufficient to create a domain entity with three fields, create validation for the fields, create a Hibernate mapping, and set up all the SpringMVC goodies for binding, form generation/population, error messages, etc.

Request processing, you ask? You have a collection of controllers (think circuits, if you know Fusebox), each of which is a collection of actions (i.e. named closures) corresponding to requests (think fuseactions). So the URL "/login/doLogout" would map to the "doLogout" closure of the "login" controller. Each action closure can redirect, render directly, or return an arbitrary data structure to be passed to a view. By default, the view is a file named to match the action, in a folder to match the controller (you can override, of course), which is written in Groovy Server Pages (GSP). Think JSP, except without the heavy dose of suck. For example, you can define taglibs using Groovy, and then use them in both your views and your controllers, as either tags or as functions.
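To make that concrete, here's a hypothetical controller sketch (all names invented for illustration):

class LoginController {
  // the URL /login/doLogout maps to this closure
  def doLogout = {
    session.user = null
    redirect(action: 'index')  // an action can redirect...
  }
  def index = {
    // ...or return a model for its view (grails-app/views/login/index.gsp)
    [title: 'Please log in']
  }
}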

Grails (thanks to Spring) also provides all-encompassing dependency injection for not only your services and controllers, but for your domain objects as well, which is fantastic. That was one of the larger problems I had with using Spring and Hibernate directly, and one that I never was able to solve in a very satisfactory way. It also provides controller filters which are exceptionally handy (enforcing logins, adding model data for your layouts, etc.).

And the last big piece is SiteMesh, which is the [optional] layout toolkit. You can certainly use GSP for everything; simply include your header/footer as needed and go. But SiteMesh provides basic page wrapping functionality to ease the task, along with a very powerful content dissection/assembly framework (think Fusebox's contentvariables, except managed by the view as they should be).

Finally, yes, there is pretty comprehensive scaffolding support. It doesn't blow my skirt up as much as the other stuff though, but that's probably more a reflection on the type of apps I'm usually building rather than the general utility of the feature.

In case you can't tell, I'm impressed. The Grails guys have done a fantastic job at taking some incredibly flexible and powerful tools and making them ridiculously easy to use.