Caching with mod_rewrite? What? I'll admit it's a slightly misleading title; the cache is actually a disk cache, but mod_rewrite is where the magic happens. Bear with me for a moment…
Most content on the web is fairly static. Some of it changes every few minutes, some changes every few hours, some changes a few times a month, and the vast majority of it changes approximately never. However, a large percentage of it is generated dynamically, on every request. Maybe it's news articles, maybe it's thumbnails for images/PDFs/videos, maybe it's RSS feeds, but identical content is dynamically generated over and over again. Huge waste of resources.
On the flip side, you can use pre-generation to build stuff ahead of time so you can serve everything statically. However, that can be ridiculously expensive as well. For example, my blog has several hundred (if not thousands) distinct feeds available on it. The main one (listing posts), one per category (posts), one per author (posts), the main comment feed (listing comments), and one per post (comments). Each of those is available in RSS 2.0, Atom 0.3, and RSS 0.92 formats. Pregenerating those all the time is silly, because the vast majority of them will never be accessed, let alone frequently.
Ideally, we'd be able to generate these resources dynamically, on demand, but then keep the output around to serve back statically for subsequent requests. This saves us the expense of pregenerating lots of stuff that will never be accessed, but gives us the speed of static access after the first request.
Duh, Barney, what's your point?
My point is that this isn't just the obvious solution conceptually; it's also ridiculously trivial to implement. It'll take longer to read this post than to set it up. As such, there's no excuse for being resource constrained on non-user-specific resources, even though this seems to be a really common complaint.
Here's a more concrete example. Say I host photo galleries, allowing people to upload their full-size images, and I provide several views of the galleries with appropriate thumbnails. Those pages are littered with things like this:
<img src="/gen_tn.cfm?id=12345&width=100&height=100" />
This is great, because I can create arbitrarily sized thumbnails without having to go back and regenerate them for all existing photos. That's handy when I create a new layout and realize I want 125×125 thumbnails instead of 100×100, and then want to use 250×250 for the 'featured' section. But I'm generating the thumbnails dynamically every request, which is a waste. And adding caching in gen_tn.cfm is the wrong answer. : )
First, let's change the URLs in the pages to look like this:
<img src="/tn/p12345-100x100.jpg" />
Same information as before, just packaged differently. Then I'll use the following RewriteRule to (internally) turn it back into the original request to gen_tn.cfm (effectively a no-op):
RewriteRule ^/tn/p([0-9]+)-([0-9]+)x([0-9]+)\.jpg$ /gen_tn.cfm?id=$1&width=$2&height=$3 [PT,L]
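If you want to check what that pattern does and doesn't capture, here's a rough Python equivalent of the rule. It's only an illustration of the mapping; Apache does the real work, and the function name rewrite_thumbnail_url is made up for this sketch.

```python
import re

# Same pattern as the RewriteRule: photo ID, width, and height, all digits.
TN_PATTERN = re.compile(r"^/tn/p([0-9]+)-([0-9]+)x([0-9]+)\.jpg$")


def rewrite_thumbnail_url(path):
    """Return the internal gen_tn.cfm URL for a thumbnail path,
    or None if the rule would not apply to it."""
    match = TN_PATTERN.match(path)
    if match is None:
        return None
    photo_id, width, height = match.groups()
    return f"/gen_tn.cfm?id={photo_id}&width={width}&height={height}"


print(rewrite_thumbnail_url("/tn/p12345-100x100.jpg"))
# -> /gen_tn.cfm?id=12345&width=100&height=100
print(rewrite_thumbnail_url("/tn/p12345-100x.jpg"))
# -> None (no height, so the pattern doesn't match)
```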
Lipstick on a pig, you might say, and you'd be almost right. We now have normal-looking URLs for our thumbnails (lipstick), but they're still dynamically generated every request (on a pig). This abstraction, however, is incredibly powerful. Let's add a RewriteCond in front of that rule real quick:
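RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-s
RewriteRule ^/tn/p([0-9]+)-([0-9]+)x([0-9]+)\.jpg$ /gen_tn.cfm?id=$1&width=$2&height=$3 [PT,L]
That says to only apply the RewriteRule if the requested file doesn't exist or is zero length ('-s' tests for a regular file with non-zero length, and the '!' negates it). The path is spelled out as %{DOCUMENT_ROOT}%{REQUEST_URI} because these rules sit in the server or virtual host config (hence the leading slash in the pattern), and at that point %{REQUEST_FILENAME} hasn't been mapped to the filesystem yet and just holds the URL path. Next step is to create the 'tn' directory in your web root and ensure it's writable by your application server. You can probably see where I'm going with this…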
The final step is to tweak gen_tn.cfm slightly. Currently, it creates the thumbnail and serves it back to the client. We need to change it so that before serving it back, it writes it to disk in that new 'tn' directory, using the appropriate filename. Once that's done, send it to the client as usual. The next time the thumbnail is requested, Apache will hit the RewriteRule, but the RewriteCond will not match (because the file exists and has length). As such, it won't be rewritten to gen_tn.cfm, and will instead be served statically, straight from disk, bypassing the application server completely.
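The "write it to disk, then serve it" step looks roughly like this. I've sketched it in Python rather than CFML so it's easy to run anywhere; cache_and_return and its generate argument are hypothetical names, not anything from gen_tn.cfm. The write-to-a-temp-file-then-rename dance is my own assumption about how you'd want to do it, so that Apache never finds a half-written thumbnail sitting in 'tn'.

```python
import os
import tempfile


def cache_and_return(cache_dir, filename, generate):
    """Generate a resource, store it as cache_dir/filename, and return its bytes.

    `generate` is a hypothetical zero-argument callable that returns the
    thumbnail (or feed, or page) as bytes.
    """
    data = generate()
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temp file in the same directory, then rename it into place.
    # The rename is atomic, so a concurrent request sees either no file
    # or the complete file, never a partial one.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
        # mkstemp creates the file owner-only; loosen it so the web server
        # (possibly running as a different user) can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, os.path.join(cache_dir, filename))
    except BaseException:
        # Don't leave stray temp files behind if anything fails.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return data


# Demo with fake image data in a throwaway directory.
demo_dir = tempfile.mkdtemp()
body = cache_and_return(demo_dir, "p12345-100x100.jpg", lambda: b"fake jpeg bytes")
print(len(body), os.listdir(demo_dir))
```

In the real thing, the filename comes from the same three values the RewriteRule passed in (id, width, height), and the returned bytes go back to the client exactly as before.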
With those couple of simple changes, you suddenly have a ridiculously effective caching mechanism in place.
What about changes to the source, though? You realize one of your photos (#12345) was miscropped, so you fix it and upload a new version, but you want your thumbnails to be regenerated too. Fortunately, flushing the cache is as simple as deleting all files in 'tn' that match '*p12345*.jpg'.
Same thing goes for deletions. If you decide you just want to remove photo #12345 completely and want to remove the thumbnails too, run the same deletion of '*p12345*.jpg' from the 'tn' directory. Or if you stop using 100 pixel thumbnails (like when I switched to 125×125 a few paragraphs ago), you can just delete '*100x100*.jpg'.
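If you'd rather flush from code than from a shell, here's a minimal sketch of both kinds of flush in Python. The function names are invented for illustration. I've anchored the patterns a little more tightly than the wildcards above, so that flushing photo 12345 doesn't also throw away the thumbnails for photo 123456; over-flushing is harmless with a cache like this, just wasteful.

```python
import glob
import os


def flush(cache_dir, pattern):
    """Delete cached files in cache_dir that match a glob pattern.

    Returns the sorted list of filenames that were removed.
    """
    removed = []
    # glob.escape protects against wildcard characters in the directory name.
    for path in glob.glob(os.path.join(glob.escape(cache_dir), pattern)):
        os.remove(path)
        removed.append(os.path.basename(path))
    return sorted(removed)


def flush_photo(cache_dir, photo_id):
    # The trailing '-' keeps p12345 from also matching p123456.
    return flush(cache_dir, f"p{photo_id}-*.jpg")


def flush_size(cache_dir, width, height):
    # The leading '-' keeps 100x100 from also matching 1100x100.
    return flush(cache_dir, f"*-{width}x{height}.jpg")
```

Call flush_photo('tn', 12345) after re-uploading or deleting a photo, and flush_size('tn', 100, 100) when a thumbnail size is retired. Anything flushed by mistake simply gets regenerated on its next request.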
Because you're using the filenames as an index of sorts, it means you have to name your files carefully. The filename needs to contain not only everything to uniquely specify the file (photo ID, width, and height in this example), but also everything that you might want to use for clearing the cache. For example, if you need the ability to clear based on gallery ID you'd need to change the URL to '/tn/g123-p12345-125x125.jpg' or something. In this case the gallery ID isn't needed for unique specification, only for flush selection.
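Here's a small sketch of that naming idea, with the gallery ID as an optional prefix. The helper names are hypothetical, and the RewriteRule pattern would need a matching prefix added to it as well, which I haven't shown.

```python
import fnmatch


def thumbnail_name(photo_id, width, height, gallery_id=None):
    """Build a cache filename. The gallery prefix isn't needed to identify
    the thumbnail; it exists only so a whole gallery can be flushed at once."""
    name = f"p{photo_id}-{width}x{height}.jpg"
    if gallery_id is not None:
        name = f"g{gallery_id}-{name}"
    return name


def gallery_flush_pattern(gallery_id):
    # The trailing '-' keeps g123 from also matching g1234.
    return f"g{gallery_id}-*.jpg"


names = [
    thumbnail_name(12345, 125, 125, gallery_id=123),
    thumbnail_name(777, 125, 125, gallery_id=1234),
]
print(fnmatch.filter(names, gallery_flush_pattern(123)))
# -> ['g123-p12345-125x125.jpg']
```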
The net of this is that you can hit that sweet spot: avoiding any extra work generating resources that aren't accessed, and never generating the same resource more than once. Obviously the first request to a resource has to wait for generation, so this technique isn't suitable for all use cases, but it covers a huge swath of them. It's especially well suited to situations where you have a large number of resources and either see relatively light usage across them or need the ability to change the derived resources' specifications (e.g. new thumbnail dimensions or new XML feed formats), or both.
As you'd imagine, PotD (NSFW, OMM) uses this technique extensively for several classes of thumbnails as well as RSS feeds. It also does some pre-generation where the first-request delay is unacceptable. I also used this to great effect at my previous employer's for front-end caching of CMS-generated HTML pages. We handled hundreds of millions of pages per day on a pair of single-P4 servers with 1GB of RAM each, with an average cache life of between two and four hours.
One significant gotcha is that you only get full-request caching with this technique. I.e. you can't cache portions of a request's response, because it's either fully dynamic (the first request) or fully static (subsequent requests). For example, most blogs have a "remember me" feature so you don't have to enter your information each time you want to comment. In order to beat this, you need some sort of two-phase generation where the cache happens between the phases, and that means you have to have your application running "above" the cache. Ajax can be used as the second phase, but that's a disaster waiting to happen, if you ask me.