Using Java Regular Expressions In CFML

While doing some regex stuff today I discovered that both ColdFusion and Railo use an external regular expression engine for the REReplace function instead of the one built into the core JRE (the java.util.regex.* classes).  I don't think I've ever had a reason to care until now, but today I was stuck.  Fortunately, since CFML strings are java.lang.String instances, you can use the replaceFirst and replaceAll methods on them.  You can also use the Pattern/Matcher classes directly, of course, but the String methods are easier and usually sufficient.

Why might you care?  Lookaround.  Say you want to replace all the single 'o' characters in the string "oh, I love cookies" with zeros.  Here's how you'd do it:

"oh, I love cookies".replaceAll("(?<!o)o(?!o)", "0") // 0h, I l0ve cookies

What the hell is that mess, you ask?  A negative lookbehind, (?<!o), and a negative lookahead, (?!o).  Both parts are ignored until the 'o' in the middle is matched.  Then the lookbehind says "only if you're not preceded by an 'o'", and the lookahead says "only if you're not followed by an 'o'."  This is a contrived example, but it illustrates the point.
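If you'd rather go through the Pattern/Matcher classes than the String methods, the same replacement looks roughly like this in plain Java.  This is only a sketch; the class and method names are placeholders I made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
  // (?<!o) = not preceded by 'o'; (?!o) = not followed by 'o'
  private static final Pattern SINGLE_O = Pattern.compile("(?<!o)o(?!o)");

  // Replaces every 'o' that has no 'o' on either side with a zero
  static String zeroSingleOs(String input) {
    Matcher m = SINGLE_O.matcher(input);
    return m.replaceAll("0");
  }

  public static void main(String[] args) {
    System.out.println(zeroSingleOs("oh, I love cookies")); // 0h, I l0ve cookies
  }
}
```

Compiling the Pattern once and reusing it is the main reason to bother with this form; for a one-off, the String method is less typing.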

What's the downside?  You can't do case translation with \u, \l, \U, \L and \E like you can with the CFML-native engine.  That's a really handy feature, and hopefully it'll make it into core Java at some point, but for right now it's not there.  Other than that, no real downside.
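One way to fake the missing case translation is to drive the Matcher by hand and change the case of each match yourself.  Here's a minimal sketch of that idea in plain Java; it is a workaround rather than a built-in feature, and the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseTranslationDemo {
  // Upper-cases the first lowercase letter at each word boundary,
  // roughly what a \u in the replacement string would give you
  static String capitalizeWords(String input) {
    Matcher m = Pattern.compile("\\b([a-z])").matcher(input);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      String upper = m.group(1).toUpperCase(Locale.ROOT);
      // quoteReplacement guards against '$' and '\' in the replacement text
      m.appendReplacement(sb, Matcher.quoteReplacement(upper));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(capitalizeWords("oh, i love cookies")); // Oh, I Love Cookies
  }
}
```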

The actual stumbling block I ran into was for an April Fool's day joke: a filter that would replace content in HTML documents, but leave "important" stuff alone.  The filter takes a list of replaces to make on the content, and prefixes each regular expression with this string:

(?![^<]*</(?i)(?:textarea|script|style)(?i)\W)(?<=(?:>|^|\\G)[^<]*?)(?<!&(?:[a-zA-Z][a-zA-Z0-9]{0,25}|#[0-9]{0,25}))

before running it against the content.  That uses a negative lookahead, a positive lookbehind, and a negative lookbehind to ensure that the expression only replaces stuff that isn't nested within TEXTAREA, SCRIPT, and STYLE tags, isn't part of any HTML tag, and isn't part of an HTML Entity.  It also uses three non-capturing groups (the ones that start with "(?:"), to ensure that the prefixing of the expression with this extra stuff doesn't screw up backreference indexing.

With this filter in place, you can supply some arbitrary replaces to content and have them made on the fly, without actually breaking anything.  For example, if you wanted to replace all 'e' with '3', 'o' with '0', and 'barney' with 'The Supreme Commander', you'd configure it like this:

/e/3/i
/o/0/i
/barney/The Supreme Commander/i

Without the protection from the above prefix, your HEAD tag would become a H3AD tag, and your BODY tag would become B0DY.  Not ideal.  But with the protection, only content gets replaced, not any of the markup, so everything will still render correctly.
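A blunter way to keep HEAD from becoming H3AD is to split the document on tags and only run the replace on the text in between.  To be clear, this is a different and much simpler technique than the lookaround prefix, sketched in plain Java with made-up names.  It does not protect the contents of TEXTAREA, SCRIPT, or STYLE elements, and it does not protect HTML entities, both of which the prefix handles.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentOnlyReplace {
  // Anything that looks like a tag; whatever sits between matches is text
  private static final Pattern TAG = Pattern.compile("<[^>]*>");

  // Applies a case-insensitive regex replace to text outside of tags only
  static String replaceInText(String html, String regex, String replacement) {
    Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    StringBuilder out = new StringBuilder();
    Matcher tags = TAG.matcher(html);
    int pos = 0;
    while (tags.find()) {
      // the text before this tag gets the replacement...
      out.append(p.matcher(html.substring(pos, tags.start())).replaceAll(replacement));
      // ...and the tag itself is copied through untouched
      out.append(tags.group());
      pos = tags.end();
    }
    // trailing text after the last tag
    out.append(p.matcher(html.substring(pos)).replaceAll(replacement));
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(replaceInText("<head><title>hello</title></head>", "e", "3"));
    // <head><title>h3llo</title></head>
  }
}
```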

Completely pointless?  You bet.  An interesting experiment? That too.  In the end, I ended up shelling out to Groovy (via CFGroovy) and used the Pattern class directly along with some other Java APIs to get my work done.  Still a technique to keep in the toolbox, though.

CFGroovy is Self Executing

Tonight I finished porting the internals of CFGroovy from CFML to Groovy.  Yes, the CFGroovy core is now implemented in Groovy.  The remaining CFML code is for managing the public API (which is a CFML API and therefore must remain CFML), and for bootstrapping the Groovy core.

This architecture provides a number of benefits, primarily a huge reduction in the amount of crazy CFML-based Java interactions.  If you ever get to thinking that doing reflection with CFML wouldn't be too bad, you're wrong.  It's like pulling teeth with scissors.  That is not a typo or an inadvertent mixed metaphor.  The internal code is now far shorter and more readable, though there is still some nasty CFML in there.  Fortunately, I was able to get bootstrapping done with only no-arg constructors, so no more need for type-based constructor selection in CFML, thank god.

Moving the core down into Groovy also moves one of my longer-term goals a bit closer to reasonableness.  I really want to create a persistence layer entirely in Groovy, manage it with an IoC container, and use it as a parent BeanFactory for a service layer (implemented with CFCs).  I tried a couple hacks to get this working with the 1.0 engine, and while both of them mostly worked, neither one worked all the way or was even remotely elegant.  Elegance isn't always possible, of course, but the lack of it is usually a red flag.  So I backed off until I had a better platform to approach it from.  But like I said, that goal is still a ways off.

Why You Should Care About Groovy

If you know anything about me, you probably know that I'm a big fan of Groovy.  But why?  I've never really addressed that question head on, I don't think, so I'm going to do that here (prompted by a comment from David McGuigan on my CFGroovy 1.0 post).

First, Groovy is a dynamic language for the JVM, in much the same vein as CFML, Jython or JRuby.  By "dynamic" I mean that things get figured out at runtime.  Contrast this with a static language like Java, where everything is wired together at compile time (aside from reflection).  For example, in order to reference a variable in Java, the variable has to be declared in the source and available at compile time.  With a dynamic language, the variable only has to be there when you reference it.  It doesn't have to exist before then.

Groovy is also an "essential" language (again like CFML, Python, Ruby) in that "ceremonious" constructs are minimized.  In Java, you have to do a lot of boilerplate/ceremonious coding (handling checked exceptions, types, manual decoration, etc.).  With an essential language that is minimized as much as possible.  Here's a simple example of reading a file with Groovy:

text = new File("/path/to/file.txt").text
println(text)

and in CFML:

<cfset text = fileRead("/path/to/file.txt") />
<cfoutput>#text#</cfoutput>

and here it is in Java:

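import java.io.File;
import java.util.Scanner;

String text = null;
Scanner scanner = new Scanner(new File("/path/to/file.txt"));
try {
  // \Z matches at end of input (before any final line terminator),
  // so next() returns the whole file as a single token
  text = scanner.useDelimiter("\\Z").next();
} finally {
  scanner.close();
}
System.out.println(text);

(My original Java example, from before I knew about Scanner, was this:)

import java.io.BufferedReader;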
import java.io.FileReader;
import java.io.IOException;

public class JavaTest {
  public static void main(String[] args) {
    BufferedReader r = null;
    try {
      r = new BufferedReader(new FileReader("/path/to/file.txt"));
      StringBuffer sb = new StringBuffer();
      while (r.ready()) {
        sb.append(r.readLine()).append("\n");
      }
      String text = sb.toString();
      System.out.println(text);
    } catch (IOException ioe) {
      // well crap
      System.err.println("got an IOException: " + ioe);
      if (r != null) {
        try {
          r.close();
        } catch (IOException ioe2) {
          // oh well
        }
      }
    }
  }
}

Ick.

With the Scanner class (which I was not aware of until a commenter mentioned it) the Java example isn't terribly worse than the Groovy and CFML examples, so I retract my previous "ick" statement.  The Java is certainly less direct, however.
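For what it's worth, Java 7 and later close the gap a bit more with the java.nio.file.Files class.  A minimal sketch, assuming the file is UTF-8 encoded; the class and method names are mine.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReadFileDemo {
  // Reads the whole file into a String, assuming UTF-8 content
  static String readText(Path path) throws IOException {
    return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    System.out.println(readText(Paths.get("/path/to/file.txt")));
  }
}
```

You still have to deal with the checked IOException and pick a character set, so it remains less direct than the Groovy and CFML versions.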

So we want to use Groovy or CFML instead of Java for this sort of thing, but why Groovy in particular?  Unlike CFML (or Jython, JRuby, etc.) Groovy is designed as something of a language extension to Java, rather than a totally separate language in its own right.  To put that another way, Groovy builds on not just the JVM, but also on the Java language itself to a large degree.  It makes many things enormously simpler compared to Java, but using a familiar syntax.  In fact, nearly 100% of Java syntax is also valid Groovy.

That buys us several really nice features.  First, Groovy is easy to learn if you've done any Java.  Second, it means that the language constructs map back to Java constructs almost directly, which means accessing Java code from Groovy is a snap.  Finally, it means you can use Groovy to create "Java" constructs, and since Groovy is dynamic, you can do it on the fly.  If you've ever tried to leverage Java libraries from CFML, you know the pains of createObject, native arrays, and null.  Here's an example of creating a URL[] (a Java array of URLs) from a comma-delimited string of filenames in CFML:

<cfset urlClazz = createObject("java", "java.net.URL").init("http://barneyb.com/").getClass() />
<cfset Array = createObject("java", "java.lang.reflect.Array") />
<cfset path = "/path/to/lib.jar,/path/to/otherlib.jar" />
<cfset urlArray = Array.newInstance(urlClazz, listLen(path)) />
<cfset i = 0 />
<cfloop list="#path#" index="item">
  <cfset Array.set(urlArray, i, createObject("java", "java.io.File").init(item).toURL()) />
  <cfset i = i + 1 />
</cfloop>

and in Groovy:

path = "/path/to/lib.jar,/path/to/otherlib.jar".tokenize(",")
urlArray = new URL[path.size()]
path.eachWithIndex { it, i ->
  urlArray[i] = new File(it).toURL()
}

As you can see, Groovy's "closeness" to Java makes dealing with Java constructs enormously easier.  Of course, if you're not using Java libraries, this is of minimal benefit.  The Groovy example also illustrates a closure and the eachWithIndex iterator.  This is where Groovy really shines.  You can write incredibly concise code without sacrificing readability.
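For reference, here's roughly what the same URL[] construction looks like in plain Java, which shows how directly the Groovy version maps onto it.  This is a sketch with placeholder class and method names.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

class UrlArrayDemo {
  // Turns a comma-delimited string of file paths into a URL[]
  static URL[] toUrls(String commaDelimitedPaths) throws MalformedURLException {
    String[] paths = commaDelimitedPaths.split(",");
    URL[] urls = new URL[paths.length];
    for (int i = 0; i < paths.length; i++) {
      // go through toURI() because File.toURL() is deprecated in current JDKs
      urls[i] = new File(paths[i]).toURI().toURL();
    }
    return urls;
  }

  public static void main(String[] args) throws MalformedURLException {
    for (URL u : toUrls("/path/to/lib.jar,/path/to/otherlib.jar")) {
      System.out.println(u);
    }
  }
}
```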

Groovy also allows you to create classes on the fly in your scripts, something that CFML has no way of providing.  Need a Comparator?  Or perhaps a Runnable?  You can create those inline, no separate files, no compilation, nothing.  As an example, let's say I have a List of Maps (or an array of structs in CFML) and I want to sort them by the "letter" Map/struct key.  Here's some CFML to do it:

<!--- myArray is an array of structs, each with a "letter" key --->
<cfset keys = "" />
<cfloop from="1" to="#arrayLen(myArray)#" index="i">
  <cfset keys = listAppend(keys, myArray[i].letter & "~" & i) />
</cfloop>
<cfset newArray = [] />
<cfloop list="#listSort(keys, 'textNoCase')#" index="key">
  <cfset arrayAppend(newArray, myArray[listLast(key, "~")]) />
</cfloop>
<cfset myArray = newArray />

CFML has no way of sorting an array of complex objects, so we have to fake it by building a collection of simple objects (a list of Strings, in this case), sorting that, and then reversing it back to the complex objects.  This also has a very undesirable (though non-obvious) side effect: it changes the myArray reference, not just the myArray object.  With CFML's pass-by-value semantic for arrays it's not as big a deal as it might otherwise be, but still a nasty consequence that can yield some really insidious bugs.   You can beat it by replacing the last line with this:

<cfset myArray.clear() />
<cfset myArray.addAll(newArray) />

This keeps the myArray reference intact, but requires a bit of knowledge about the Java Collections framework.
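The reference-versus-object distinction is easier to see in plain Java, where lists are always shared by reference.  A small sketch with hypothetical names: the first method rebinds the variable (like the last line of the CFML above), the second swaps the contents in place (like the clear/addAll fix).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ReferenceDemo {
  // Rebinds the variable; returns whether a second reference sees sorted data
  static boolean reassign() {
    List<String> myArray = new ArrayList<>(Arrays.asList("c", "a", "b"));
    List<String> alias = myArray;            // second reference to the same list
    List<String> newArray = new ArrayList<>(myArray);
    Collections.sort(newArray);
    myArray = newArray;                      // only this variable changes
    return alias.equals(Arrays.asList("a", "b", "c"));
  }

  // Replaces the contents; the second reference sees the sorted data
  static boolean replaceContents() {
    List<String> myArray = new ArrayList<>(Arrays.asList("c", "a", "b"));
    List<String> alias = myArray;
    List<String> newArray = new ArrayList<>(myArray);
    Collections.sort(newArray);
    myArray.clear();
    myArray.addAll(newArray);
    return alias.equals(Arrays.asList("a", "b", "c"));
  }

  public static void main(String[] args) {
    System.out.println(reassign());        // false
    System.out.println(replaceContents()); // true
  }
}
```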

Here's the same example in Groovy:

// myArray is a List of Maps, each with a "letter" key
Collections.sort(myArray, {o1, o2 ->
  o1.letter.compareTo(o2.letter)
} as Comparator)

Here I'm using a closure (the part in curly braces) as a Comparator instance and using the Collections.sort method.  Without a comparator, there's no way to use this method, which is why the CFML has to do the whole thing manually.  The example requires a little knowledge of the Java Collections framework, but the readability and maintainability of the code is enormously better, so it'll cover the cost of learning very quickly.  And as you saw, in order to get the CFML code to actually do what you want (sort the array, not create a new array in sorted order) you end up having to know about the Java Collections framework anyway.
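To see what the closure is standing in for, here's a sketch of the same sort in plain Java with an explicit anonymous Comparator.  The class and method names are placeholders, and I'm assuming the "letter" values are Strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class SortByLetterDemo {
  // Sorts the list in place by each map's "letter" entry
  static void sortByLetter(List<Map<String, String>> rows) {
    Collections.sort(rows, new Comparator<Map<String, String>>() {
      @Override
      public int compare(Map<String, String> o1, Map<String, String> o2) {
        return o1.get("letter").compareTo(o2.get("letter"));
      }
    });
  }

  public static void main(String[] args) {
    List<Map<String, String>> rows = new ArrayList<>();
    rows.add(Collections.singletonMap("letter", "c"));
    rows.add(Collections.singletonMap("letter", "a"));
    rows.add(Collections.singletonMap("letter", "b"));
    sortByLetter(rows);
    System.out.println(rows); // [{letter=a}, {letter=b}, {letter=c}]
  }
}
```

The sort happens in place, so anything else holding a reference to the list sees the new order.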

Finally, Groovy also provides a rich metaprogramming environment.  I'm not going to show any code, but simply put, metaprogramming is a way of having code affect the program while it's running.  To put that another way, programming affects data, metaprogramming affects the program itself.  For example, you can add new methods to objects (or whole classes of objects) at runtime.  This is the acme for dynamic languages: being so dynamic the code itself is mutable at runtime.

What about downsides?

First, Groovy is a new language to learn; no getting around that.  You have to learn new syntax and new idioms, and keep them sorted from other languages' syntax and idioms.

Second, if you have a significant investment in another language, leveraging Groovy requires integration features.  With Java it's simple; the Groovy compiler provides that.  I struggled with Jython integration, though I'll admit I didn't spend much time on it.  For CFML, I've invested significant effort in building CFGroovy to address this issue, but it still has limitations.

Third, Groovy's focus on dynamic execution and metaprogramming has performance implications for certain types of operations.  If you're writing a mathematics library, for example, Groovy is a horrible choice.  As such, unless your development needs are constrained to high level concepts, Groovy is probably not the best choice for a one-size-fits-all language.  Groovy is not unique in this way, of course.  CFML, Jython and JRuby all seem to have lower performance penalties, but at the expense of "dirtier" Java integration and less comprehensive dynamic capabilities.  Different language / different focus.

In conclusion, Groovy is not a panacea.  It's nothing more than a potential tool to have in your developer toolbox.  However, when integrated into an existing JVM-based environment, or as a foundation language for new development, it can be an incredibly powerful tool.

New CFGroovy Demo App

This afternoon I threw together a little blog demo app for CFGroovy.  It's really simple, but it illustrates some more advanced usage.  In particular:

  • The app uses ColdSpring to wire everything together and obtain transaction management with AOP, instead of having to code your transactions manually.
  • Entity relationships (as well as composition) with both direct and transient querying.  The two means of comment ordering are of particular interest.
  • More complex program flow using a mix of CFML and Groovy.
  • FB3Lite for the front controller with a bit of neat view-layer stuff

The app is still definitely demo-ware, not anything even close to production worthy, but it's a lot closer to the real world than the existing demos.

Building the app also illustrated a minor issue with the core CFGroovy runtime: the 'params' binding could be altered within Groovy code, but the changes don't reflect back in the CFML context, because it's a synthesized structure.  Because of this, I changed it to be immutable (using Collections.unmodifiableMap) so you'll get a fast failure if you attempt to modify it from within Groovy.  This is theoretically a backwards-incompatible change if you used the 'params' binding to pass data between multiple Groovy scripts in the same request.  However, I'm going to ignore that potential use case.  : )
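The fast-failure behavior comes straight from the JDK, and it's easy to see in isolation.  Here's a minimal plain-Java sketch of it; this is not the actual CFGroovy code, and the class and method names are made up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ImmutableParamsDemo {
  // Wraps a map in a read-only view; reads pass through, writes throw
  static Map<String, Object> readOnlyView(Map<String, Object> source) {
    return Collections.unmodifiableMap(source);
  }

  // Returns true if the map refuses a put() with an exception
  static boolean rejectsWrites(Map<String, Object> params) {
    try {
      params.put("name", "changed");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Map<String, Object> source = new HashMap<>();
    source.put("name", "barney");
    Map<String, Object> params = readOnlyView(source);
    System.out.println(params.get("name"));    // barney
    System.out.println(rejectsWrites(params)); // true
  }
}
```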

Running CFGroovy in a Hibernate-Aware Environment

In my last post I mentioned the issue with using CFGroovy's Hibernate when Hibernate is already loaded by the app server, such as the case with JBoss 4.2+ and a certain unreleased CFML runtime (cough … Centaur … cough).  The gist of it is that Hibernate appears to create some static references to itself that circumvent the RootLoader that CFGroovy uses to isolate its Hibernate environment.  These static references get used because they already exist, so the classloader doesn't bother to load the classes from the RootLoader.  As such, Class comparisons fail within Hibernate's class hierarchy, because Class equality depends on both the Class name and the ClassLoader used to load it.

Fortunately, this problem is mostly a "getting started" issue.  I can't think of many places where you'd want to use CFGroovy's Hibernate and some other Hibernate environment in the same webapp (a partial port of an app being the one I can think of).  As such, it's very unlikely you actually need multiple copies of Hibernate in your webapp classpath.  And that means that you can get CFGroovy's Hibernate support to work simply by removing some unneeded JAR files.

For the unreleased CFML runtime that bundles Hibernate, simply removing /WEB-INF/cfusion/lib/hibernate3.jar will allow CFGroovy's Hibernate support to work.  This means you can't use that runtime's Hibernate-backed features, but if you're using CFGroovy's Hibernate support, you probably don't need it.

For JBoss, Patrick Santora did some digging and had this to say:

Removed a few jars from within the jboss node I have cfgroovy running in "{node}\lib":
  antlr.jar
  hibernate3.jar
  hibernate-annotations.jar
  hibernate-entitymanager.jar

Removed a few jars from within the cfgroovy hibernate_lib node:
  cglib-nodep-2.1_3.jar
  commons-collections.jar
  commons-httpclient.jar
  commons-logging.jar
  dom4j-1.6.1.jar
  ejb3-persistence.jar

Start the jboss node and done.

The removed jboss jar's were in the cfgroovy folder and vice versa so I
just needed to do a comparison. Removing those conflicts fixed the
problem.

I am running on jboss 4.3.2 and Railo 3 war (community edition license)

I don't have a JBoss environment to confirm this on, but he said he's now up and running.

CFGroovy 1.x Features

It was a big weekend for CFGroovy.  In addition to the 1.0 release, I started doing some work for the 1.x series (which is available in the trunk).  There are several significant changes to the engine:

  • Groovy has been upgraded to 1.6.0 from the previous 1.5.6 version.
  • You can specify a custom Groovy JAR for your CFGroovy.cfc instance to use if you don't like the default one.
  • You no longer have to manually copy the Groovy JAR into /WEB-INF/lib as part of installing CFGroovy, the engine will bootstrap the JAR itself.
  • Portions of the engine are now implemented in Groovy instead of CFML.
  • Bootstrapping a new runtime is noticeably slower, so make sure you're managing your runtime and not recreating every request.

The demo app has also gotten some love.  It now does a better job with detecting whether it thinks it can run on a given environment, and is a bit smarter about helping the user get up and running.  I've also created a slightly more complex Hibernate example.  I wanted to package a "real" app as a demo, but I couldn't get something from the real world that was both simple and complex enough.

On the negative side, I did a bunch of digging, and it appears that having Hibernate already in the server's classpath prevents CFGroovy from correctly bootstrapping its internal version.  The problem seems to reside in the way Hibernate is implemented internally, so not something I can fix.  Note that this only affects Hibernate; the core CFGroovy framework will happily run your Groovy code regardless of Hibernate conflicts.

This issue explains why CFGroovy fails on many versions of JBoss, as well as CFML runtimes that bundle Hibernate (cough … Centaur … cough).  The "solution" is to remove the Hibernate JARs from the webapp's classpath, though this obviously means the webapp can't use Hibernate except through CFGroovy.  However, I'd expect that the number of "real" applications using Hibernate through multiple channels is virtually zero, so it's more of a PITA than a showstopper.  I plan to do a bit more research on this, but I'm not optimistic about finding a solution.

CFGroovy 1.0!

After a lengthy burn in period, I've officially released CFGroovy 1.0.  It is identical to CFGroovy 1.0RC3, so if you're running that version there is no need to upgrade.  You can get the 1.0 engine binaries, 1.0 demo binaries (including the engine), view the 1.0 tag in Subversion (engine or demo), or visit the project page for more detailed info.

CFGroovy, for those unfamiliar with it, is a Groovy integration package for Railo and ColdFusion allowing you to leverage Groovy from within CFML applications.  It also provides a plugin for enabling Hibernate support, allowing you to use the 800-pound gorilla of ORM tools in a fully dynamic environment (no compilation, no container restarts).

The Cookie Militia

Saw this on pictureisunrelated.com and had to share:

[image: wtf_pics-cookie-militia]

Open Pandora

Based on several suggestions after my last post, I gave OpenPandora another try.  So far it's been pretty stable and handles the new player layout correctly.  Nice to have my keyboard shortcuts back, though it does occasionally throw up "do you want to debug this error" dialogs (from Visual Studio, I'm guessing?).  But working well enough to keep at it for now.

Pandora Bookmarklet

Ever since Pandora prevented accessing the miniplayer directly a few months ago, I've been using a little bookmarklet to fake it.  Set http://pandora.com/ as my IE homepage (about the only thing I use IE for) and put the bookmarklet in my links bar to click immediately after launch.  Net result: a perfectly sized window, hiding all the non-player crap.

javascript:window.resizeTo(641,416);window.scrollTo(58,130);