HTTP is an API

Ray Camden posted an interesting article over on InsideRIA about expanding short urls using jQuery and ColdFusion.  After reading the article, I thought he was overcomplicating things somewhat by relying on the url shortening services' APIs to do the lookups.  Yes, that's what APIs are for, but for this case, HTTP happens to be a perfectly sufficient API, it's consistent across services, it requires no special interface code, and it's crazy simple.  I commented to that effect, and then decided I ought to put my  money where my mouth is.

To that end, I wrote a small demo app that uses several different bits of tech to get the job done (render an ugly page with auto-expanding URLs).  It uses JSON/P to load some tweets containing shortened URLs from the Twitter Search API, writes them to the DOM with jQuery after wrapping the URLs with A tags (the part Ray did manually), and then wires a jQuery mouseover listener to trigger expansion (virtually identical to Ray's).  Of course, the server side is what I was actually interested in, so here it is:

<cfhttp method="get"
  url="#attributes.expand#"
  redirect="false"
  result="cfhttp" />
<cfoutput>#cfhttp.responseHeader["Location"]#</cfoutput>

Pretty simple, eh?  Both a rapier and a howitzer can kill a man; picking the right one is important.

Why does this work?  Because every service is just doing a HTTP 301 with a Location header from the short URL to the full URL.  Some of them expose APIs for creating and querying the short URLs, but for this task, we don't actually care, we only care about the Location header.  The only special processing require is for preview.tinyurl.com links; I simply convert them to normal tinyurl.com links so they do the "right" thing.

As always, there is full source (all 55 lines of it, in one CFM template) displayed beneath the app itself should you wish to see all the details.  The JavaScript stuff is far more complex than needed for a demo app, but I wanted to do a little experimentation on the client side as well as the server-side stuff.

Spring 2 "scope" goodness for ColdSpring

In Spring 1.2 (and ColdSpring, which emulates it), you have the "singleton" attribute, which was a boolean flag for whether a bean is a singleton (the default) or a prototype (instantiated afresh for every getBean call).  If you've used Spring 2.0+, you've probably come across the "scope" attribute, which supersedes the "singleton" attribute, and allows singleton, prototype, request, and session lifecycles.

In most cases, singleton and prototype are all you need, but it's occasionally useful to scope a bean to a request or a session.  But what does that mean, exactly?

With a prototype bean, you get a new instance back every time you call getBean.  With a singleton bean, you get the same instance back every time you call getBean for the life of the BeanFactory.  With a request bean, you get the same instance back every time you call getBean for the life of the request.  You can probably guess what happens for a session bean.

Since ColdSpring emulates the Spring 1.2 way of doing things, you don't have access to request- or session-scoped beans, unless you manually load them into the appropriate scope and always reference them from there, instead of the BeanFactory.  That's a mess, so I wrote a simple wrapper bean called BeanScopeCache to provide this functionality in bean form.

To use it, you define your target bean as you normally do (ensuring you set singleton="false"):

<bean id="requestConfig_target"
    class="com.barneyb.cache.requestconfig"
    singleton="false">
  <constructor-arg name="contentcache"><ref bean="contentCache" /></constructor-arg>
  <constructor-arg name="publicUrl"><value>${publicUrl}</value></constructor-arg>
</bean>

and then create a BeanScopeCache as such:

<bean id="requestConfig" class="com.mentor.util.BeanScopeCache">
  <constructor-arg name="targetBeanName">
    <value>requestConfig_target</value>
  </constructor-arg>
  <constructor-arg name="scope">
    <value>request</value>
  </constructor-arg>
</bean>

In this case, my target bean is named "contentCacheRequestConfigTarget" and through BeanScopeCache it'll be tied to the request scope.  Note that you can't use a nested bean, because BeanScopeCache needs a bean name not an injected instance.  Here's the source:

<cfcomponent output="false" extends="coldspring.beans.factory.FactoryBean">

  <cfset SCOPE_KEY = "__com_mentor_util_bean_scope_factory_cache_key__" />

  <cffunction name="init" access="public" output="false" returntype="BeanScopeCache">
    <cfargument name="targetBeanName" type="string" required="true" />
    <cfargument name="scope" type="string" required="true" />
    <cfargument name="bindToBeanFactory" type="string" default="false"
      hint="Whether to bind instances to this bean factory.  Ignored for singletons." />
    <cfset variables.targetBeanName = targetBeanName />
    <cfset variables.scope = scope />
    <cfset variables.bindToBeanFactory = bindToBeanFactory />
    <cfreturn this />
  </cffunction>

  <cffunction name="setBeanFactory" access="public" output="false" returntype="void">
    <cfargument name="beanFactory" type="coldspring.beans.BeanFactory" required="true" />
    <cfset variables.beanFactory = beanFactory />
    <cfset INSTANCE_ID = createObject("java", "java.lang.System").identityHashCode(beanFactory) />
  </cffunction>

  <cffunction name="isSingleton" access="public" output="false" returntype="boolean">
    <cfreturn scope EQ "singleton" />
  </cffunction>

  <cffunction name="getObject" access="public" output="false" returntype="any">
    <cfset var key = SCOPE_KEY & targetBeanName />
    <cfset var container = "" />
    <cfif bindToBeanFactory>
      <cfset key &= INSTANCE_ID />
    </cfif>
    <cfswitch expression="#scope#">
      <cfcase value="prototype">
        <cfset container = {} />
      </cfcase>
      <cfcase value="request">
        <cfset container = request />
      </cfcase>
      <cfcase value="session">
        <cfset container = session />
      </cfcase>
      <cfcase value="singleton">
        <cfset container = variables />
      </cfcase>
      <cfdefaultcase>
        <cfthrow type="IllegalArgumentException"
          message="The '#scope#' scope is not supported." />
      </cfdefaultcase>
    </cfswitch>
    <cfif NOT structKeyExists(container, key)>
      <cfset container[key] = beanFactory.getBean(targetBeanName) />
    </cfif>
    <cfreturn container[key] />
  </cffunction>

</cfcomponent>

Project Euler Test Harness

Project Euler is a collection of mathematics/computer science problems, as you probably already know.  I've solved almost 50 of them so far, and I've developd a collection of utilities to make my job easier.  Some of them (factorization routines, prime generators, etc.) I'm not going to share as they are fundamental to solving certain problems.  However, I will share my test harness, as it's probably generally useful, and it contains no "secrets".

Why build a custom harness?  Well, I originally considered using JUnit (or whatever) as a simple runner, but since there isn't anything to test, it didn't feel right.  It also doesn't provide a good way to get timing information for the runs.  And since I was originally using strictly Groovy, I didn't want to impose any more "ceremony" than I needed to.  I've since started doing some problems in Java simply for the performance (as well as converting some of those "secret" routines), and have generalized my harness.

First, here are the file templates for Groovy:

import util.Runner

new Runner() {
  // put some code here, eh?
}

and for Java:

import util.Runner;
import util.java.Solver;

public class BruteForce {
  public static void main(String[] args) {
    new Runner(new Solver() {
      public Object solve() {
        // put some code here, eh?
        return null;
      }
    });
  }
}

As expected the Java version requires quite a bit more ceremony, even though the semantics of the two templates are identical.  The Solver interface (used for Java) is as follows:

package util.java;

public interface Solver {
   public Object solve();
}

Finally, the Runner class, which is implemented in Groovy.  As you can see from the templates, it accepts either a Solver instance or a Groovy closure, and treats the two in exactly the same way, excepting that it picks between Solver.solve() and Closure.call() based on the type.

package util

import util.java.Solver

class Runner {

  private static ThreadLocal timer = new InheritableThreadLocal()
  private static int timerCount = 0

  def Runner(solver) {
    this(true, solver)
  }

  def Runner(doIt, solver) {
    if (! doIt) {
      return
    }
    startTimer()
    // this will send an update every five or so seconds
    volatile def updateThread
    updateThread = Thread.startDaemon {
      while (true) {
        Thread.sleep(5000)
        if (updateThread == null) {
          break // after the sleep, before the log
        }
        log("still executing...")
      }
    }
    try {
      log("starting...")
      def result;
      if (solver instanceof Solver) {    // the closure/Solver switch
        result = solver.solve()          //  |
      } else { // better be a Closure    //  |
        result = solver.call()           //  |
      }                                  //  V
      stopTimer()
      if (result instanceof String) {
        try {
          result = new BigInteger(result)
        } catch (e) {
          // oh well
        }
      }
      if (result instanceof Integer || result instanceof Long || result instanceof BigInteger) {
        log("result: $result")
      } else {
        def s = "" + result
        if (s.length() > 100) {
          s = s[0..<97] + "..."
        }
        log("result was not numeric: $s", System.err)
      }
    } finally {
      updateThread = null
    }
  }

  def elapsed() {
    get().elapsed
  }

  def formatElapsed(e) {
    def seconds = e.intdiv(1000)
    def millis = e % 1000
    def s = ""
    if (seconds >= 60) {
      s += seconds.intdiv(60) + ":"
      seconds = seconds % 60
    }
    if (s.length() > 0 && seconds < 10) {
      s += "0$seconds"
    } else {
      s += seconds
    }
    if (millis < 10) {
      s += ".00$millis"
    } else if (millis < 100) {
      s += ".0$millis"
    } else {
      s += ".$millis"
    }
    s
  }

  def log(Object msg, out=System.out) {
    def t = get()
    out.println "[${t.index}: ${formatElapsed(t.elapsed)}] $msg"
  }

  private startTimer() {
    Runner.timer.set(new TimerData(++timerCount))
  }

  private stopTimer() {
    get().stop()
  }

  private get() {
    def t = Runner.timer.get()
    if (! t) {
      throw new IllegalStateException("No timer has been started...")
    }
    t
  }
}

class TimerData {
  def index
  def startDate
  def stopDate

  def TimerData(threadIndex) {
    index = threadIndex
    startDate = new Date()
  }

  def stop() {
    stopDate = new Date()
  }

  def getElapsed() {
    if (running) {
      return new Date().time - startDate.time
    } else {
      return stopDate.time - startDate.time
    }
  }

  boolean isRunning() {
    stopDate == null
  }
}

There's a lot of stuff going on there, but primarily it's setting up a timing framework, a background thread to tell you it's still running every 5 seconds, and doing some magic with the returned solution.  As you can see, the constructor takes an optional first parameter for whether to execute.  Useful if you have multiple Runners in a single script/class, but only want to run some of them. https://maxinternet.biz/best-gas-grills-under-300

It wasn't my intent, but building this out was quite educational in the way the Java/Groovy interplay works.  Write Groovy for whatever you can, and write Java for the bits that need it.  As you can see from the templates, Java requires a lot more work, but it has benefits.  I was quite pleased to be able to use my Groovy harness (which leverages all kinds of Groovy goodness) for my Java solutions.  I originally figured I'd have to port it to Java to use it with both languages, but not the case.

A Groovy Letdown

I'm a huge fan of Groovy, but it's major downside drove over me like a bus this evening.  What's that, you say?  Performance, of course.  Runtime "stuff" necessarily happens at runtime, and therefore affects performance.  Does this matter in most cases?  No, absolutely not.  In general it's far cheaper to buy hardware than to employ developers, so ease of development and maintenance almost always trumps raw performance.  This is why we use high level languages as opposed to writing everything in assembler.

As you probably expect, I was working on Project Euler.  Problem 39, to be specific.  The algorithm is simple, and if you pay attention, you don't even have to do the full scan to get the answer.  However, I couldn't get within the minute boundary with my algorithm.  I did a whole bunch of optimizations to ensure I was doing absolutely as little work as possible (including stopping the search at the known answer!) and still couldn't even come close to the minute deadline.

On a whim, I took my brutish, unoptimized version (because the syntax was less Groovy-ish) and dropped it into a Java class.  I had to add a pile of semicolons and variable types, insert a couple imports, and convert a GString, but the rest was untouched.  Full scan (from 1-1000) in 172 milliseconds.  Needless to say I was disappointed.

Am I going to abandon Groovy for Java?  Hell no.  But definitely going to consider reimplementing some of my hard-core subroutines in Java.  Just another case of knowing your tools so you can pick the right one for the job.  And just to be clear, I have no expectation that this sort of performance difference is in any way indicative of Groovy's general performance.  In fact, I find Groovy to be ridiculously fast, in general, because I'm typically doing far higher-level operations (like using Hibernate).

One interesting point was that copy and pasting the Java version into a Groovy file and running it (with all the semicolons and explicit variable typings) didn't affect the Groovy performance much.  That surprised me.  Apparently the Groovy compiler treats 'int' as 'Integer' or something similar.  I've never looked into it very deeply, but worth mentioning I thought.

Also interesting is that CFSCRIPT on CF 8.0.1 turns in times that are slightly faster than Groovy running through CFGroovy on the same page.  I would have expected the reverse since CF uses java.lang.Double internally, but it seems the lack of unboxing/reboxing makes up for the floating point calculations.

My Dinner

Like most things, cooking is usually better done simply.

  • 5 rashers of thick bacon
  • 1 large sweet onion
  • 1/2 green pepper
  • 2 stalks celery
  • 1 still-warm baguette
  • 1 box rice pilaf (yeah, I know)

Put on some music (Rachmaninoff's Piano Concerto #2 and Rhapsody on a Theme of Paganini for me), and pour a glass of wine (but not the stuff I had – oof).

Cook the bacon slowly until just shy of done, and certainly not crisp.  Stash in warm oven on some paper towels on the plate you're going to eat with.  Start rice.  Drain all but a few tablespoons of the bacon drippings and add the onion, sliced radially and reduce the heat a smidge.  Season.  Make sure you scrape the bottom of the pan as the onion cooks to get all the bacon debris off the pan and onto the onions (i.e., deglaze with the onion).  Looking for a bit more than a sweat, but well short of saute.

When the onions start getting soft, add the green pepper (jullianned) and the celery (cut on the bias) and reduce the heat another smidge.  Season.  While that cooks, slice your baguette.  The rice and veggies should be done about the same time.  Get your nice warm plate and bacon (which should now be perfectly done) from the oven, toss the paper towels and dish the veggies and rice up next to the bacon.  Grab your bread, ensure your glass is topped off, and enjoy.

Elapsed time from arriving home to sitting down to eat: ~25 minutes.  If your cooking for more than one (which I was not), you'll want to increase the ingredient amounts.  Yes, I eat a lot.  No, there isn't any dairy.

Hiking in the Gorge

Sunday is usually the day I get up and do laundry, because the only other time I have to do it is while the kids are sleeping.  They probably don't care, but laundry at bedtime bugs the hell out of me, so I try not to do it.  Today, however, was going to be different.  I got up and decided I was going to go get a book to read, and then come home and have breakfast and do laundry.  If you knew me, you may have thought my middle name was "Emery", but it's actually "Daring."

Off I went to the bookstore.  Drove right through Beaverton.  Turn on Burnside and drove right through Portland.  Stopped to help a guy push his truck out of the road, and then kept driving right through Troutdale to where Stark turns into "The Historic Hwy".  I seemed to have missed the bookstore, but no matter because I soon arived at a sign that says "narrow windy road next 15 miles".  I was a bit sad I wasn't on my bike, but I like driving too, so not too disappointed.   I naively assumed that the second word meant "the road twists and turns a lot", and while that was true, it also meant "the wind shear is insane on this road."  Between that and all the rocks and stuff on the roads, I was really glad to have four wheels by the time I got to the bottom.

About a quarter of the way down from the heights I came to this strange little round building.   Don't know what it was, or why it's there, but if you've driven up the Columbia from Portland, you've seen it.  It's out on the tip of this bluff, and the view is to die for.  Of course, it was blowing and raining so hard it was difficult to see much detail, but whatever.

At the bottom of the hill I came to a little parking lot with a sign that said Latourelle Falls, and I figured I'd go look.  So off I set in the wind and rain with my motorcycle jacket, a baseball cap, some nice light colored pants, and my exceptionally stylish ripped skateboarder shoes.  You'd have thought I was going to a bookstore or something.  The trail up and around was magnificent, and while well kept, obviously not well travelled.  The falls were quite impressive, and happened upon a couple sharing a kiss at the bottom.  On the way down (the other side), I found the Dr. Seuss tree, and then it was off along the History Hwy again.

Next wide spot (that wasn't someone's house) was Bridal Veil Falls.  Much shorter trail there, and the falls were shorter, but just as impressive in their own way.  Larger water volume, and some nice rock formations for the water to play with.

The last spot I drove to was Wahkeena Falls (perhaps 3 miles from the first).  The map there indicated that after climbing up you could circle around to come down on Multnomah Falls from above, and that seemed like a good plan to avoid driving any more.  This hike was the best.  Took about three and a half hours, I'd wager (compared with perhaps an hour and half for the other two), and elevation gain like you wouldn't believe.  Aside from the two main falls (Wahkeena and Multnomah), there were countless little falls along both creeks, along with some really cool rock formations.  I also found a lot of snow, which surprised me since it hasn't been that cold of late, but between elevation and tree cover to dense for the sun to penetrate there was quite a lot on the ground in places.

For the day I figured about 5 hours of hiking, probably 8-10 miles total, and what felt like about the same in elevation change.  Not true, of course, but hard to judge.  I'd guess the high point was probably 1500-1800 feet above the fall bases (2.5-3 times Multnomah Falls' height).  Someone probably knows.  Like one of those people who procure a map and hiking boots and pack some water and snacks before wondering off into the woods for a few hours.  ;)

A glorious day, to be sure, even with the wind and rain.  Well prepared I was not, but my "default" garb of my winter motorcycle jacket, a nice wool baseball cap, and some lightly insulated gloves served admirably to keep me dry and comfortable.  Even my skater kicks did well enough to keep my feet dry and happy, despite providing virtually zero traction when it got slippery.  My knees had more than their fill of going downhill at the end, but I survived.  I'll be walking gingerly down the stairs for a few days, but hopefully no more.

But it wasn't over yet.  By this time, as you might imagine, I was a bit hungry and thirsty since I was still waiting to have breakfast while I did my laundry.  There was a water fountain at the bottom of Multnomah falls, which addressed the latter, but not the former.  So I did was any self-respecting person would do: called a couple friends I hadn't seen in a while and went to Chang's where we gorged ourselves  And then to Krispy Kreme, because it's there.  And now I'm home, and I'm doing laundry while the kids (and soon to be me) sleep.

Apache HTTPD's mod_proxy_balancer

Last year we bought a Barracuda web firewall/load balancer appliance at work and have had nothing but problems with it.  Even the support guys from Barracude were unable to solve our dropped connection problems.  Needless to say, rather disappointing for a $15K device.

We're also in the process of moving off of Akamai to another CDN provider that doesn't provide all the non-CDN stuff that Akamai does (edge gzip, domain splitting, etc.).

Instead of both Barracude and Akamai we're using Apache HTTPD to do load balancing (with mod_proxy_balancer and mod_proxy_ajp), domain splitting (with mod_proxy_http), and SSL offloading (with mod_ssl).  We're going live tomorrow, but to this point everything has been remarkably smooth.  Our first load balancing set up was a little bit flawed because we were using HTTP proxying instead of AJP proxying which caused some issues with CFC-generated WSDL (because the ports showed the backend server instead of the public server).  AJP solves that problem.

I'd never used mod_proxy_balancer before, and I'm pretty impressed.  The documentation could use a little work, but the functionality is great.  There are no frills anywhere: the admin UI is a couple tables and a form (no CSS), monitoring is another HTML table, and the configuration is a couple simple directives.

At my last job we did everything with ldirectord, which is a layer 4 balancer for linux.  It worked great, but we needed a layer 7 balancer in this case for sticky sessions and such.  Further, since all our production servers are currently Windows (ick), but we're in the processes of deploying redundant linux load balancers, we wanted something that we could use on both platforms.

So if you're looking for simple layer 7 HTTP load balancing, highly recommend taking a look at Apache.  And, of course, you gain access to all the other Apache goodness, mod_rewrite in particular.  You have to be willing to edit some text files (no wizard interface), but they're reasonably straightforward, and the flexibility is fantastic.

Public Project Euler Dashboards

Project Euler doesn't support any kind of public stats for users, which is kind of lame I thought.  I posted about it on the Project forum and got no response, so I built my own: http://barneyb.com/projecteuler/.  The code is a total hack, faking a login to grab the protected profile page and then doing some string ripping, an xmlParse, and a few xmlSearches on the returned HTML to extract the relevant data.  However, it works quite well, and with how stable their site's backing code seems to be, I'm not terribly concerned.

The first version just showed my stats (it had my tokens hard coded).  The current version has the ability to create a dashboard for any user via the new link at the bottom of the page.  You have to supply either your credentials or your browser cookies so the server can get your data, but there isn't any server-side storage of anything.  The URL token that identifies the dashboard as yours also carries the tokens needed to fetch data from Project Euler.  If needed, they're extracted at runtime and used.

Whether anyone will ever use it, who knows, but it was an interesting set of problems to deal with.  Also be good to clean up the URLs a bit, but didn't figure it was worth it at this point.

Google Bar Charts

I just learned today from Joshua that Google Charts added an option to automatically compute bar widths and gaps for bar (or column) charts.  Doing it manually is always a total PITA, and I've intentionally chosen line chart where a bar chart was a better fit for that precise reason.  Now, however, you just set the 'chbh' parameter to 'a' and you're done.  Here's an example:

http://chart.apis.google.com/chart?
cht=bvg&chbh=a&chd=s:vttusty&chs=500x300
&chxt=x,y&chxl=0:|Sun|Mon|Tue|Wed|Thu|Fri|Sat|1:|0|2|4|6|8|10|12

For the curious among you, that's the average number of hours I spend in bed for each day of the week.  Along with sleep, that typically includes some sudoku in the evening and checking my email in the morning.

Prime Factorization

In my ongoing rampage through the Project Euler problems I've needed a prime factorization routine several times, and figured I'd share it as a complement to the sieve I shared last week.  Implemented in Groovy, of course.  Typing is a little strange on first glimpse because Groovy uses Integer by default which only supports values up to 2.1 billion.  So there are a number of explicit BigInteger types/casts.

def factor(BigInteger num) {
  def factors = []
  while (num % 2 == 0) {
    factors.push(2)
    num = (BigInteger)(num / 2)
  }
  BigInteger end = floor(sqrt(num))
  for (def currFactor = 3; num > 1 && currFactor <= end; currFactor += 2) {
    if (num % currFactor == 0) {
      while (num % currFactor == 0) {
        factors.push(currFactor)
        num = (BigInteger)(num /currFactor)
      }
      end = (BigInteger) floor(sqrt(num))
    }
  }
  if (num != 1 || factors.size() == 0) {
    factors.push(num)
  }
  factors
}

The algorithm is pretty simple: loop through all odd numbers and see if they're a factor of the target number.  There's a special case at the top for two (the only even prime), which then lets the loop skip all the even numbers (effectively cutting it's work in half).  There's also a special case at the bottom for ones.  If you attempt to factor one or a prime number, you'll get that number back as the only item in the list, otherwise you'll get only prime factors (and no, one is not prime).  Note that the list can contain duplicates; the method returns all prime factors, not just distinct ones.

Why does this work?  Because by the time the loop gets to each number all of it's factors have been processed, so it can't be both a factor of the target number and composite (non-prime).  So if it's a factor, it has to be prime as well.

For those of you who follow me on Twitter (@barneyb), this is not the crappy algorithm I first implemented.  This is the good one.  My first attempt was an "inversion" of the sieve algorithm which resulted in huge amounts of unneeded work and memory consumption because of the data structures involved.  The second algorithm is virtually identical, just the data structures are different (enormously simpler).  I wasn't really that far off with the first attempt, I just didn't do a good job of distilling the problem down to it's essence, so I was doing a lot of extra work (three-orders-of-magnitude extra).