There was a post on CF-Talk regarding specifics of locking, and I thought I'd create a summary (though I'm not Ben Forta, who was requested by name), along with some ideas for making the job simpler.
In CF5 and earlier, CFLOCK was required both for shared memory access and for preventing race conditions. With CFMX, CFLOCK is only required for race conditions, because the underlying Java runtime takes care of the shared memory access issues. Great, but what is a race condition?
Race Condition: A situation where two different requests are manipulating a single resource, and it's possible for the two requests to step on each other's toes.
The easiest example is something like these two queries (which deduct an item from the inventory of a given product):
<cfquery datasource="#request.dsn#" name="get">
    SELECT inventory
    FROM product
    WHERE productID = 1
</cfquery>
<cfquery datasource="#request.dsn#">
    UPDATE product
    SET inventory = #get.inventory# - 1,
        inventoryUpdate = now()
    WHERE productID = 1
</cfquery>
If two separate requests start this process perfectly in parallel they'll both get the same result from the first query (say 4), and then when they run the second query, they'll both update the inventory to 3. This is clearly NOT the correct result (it should be 2).
We can solve this problem several ways, but we'll use a CFLOCK statement to do it:
<cflock name="inventoryupdate" type="exclusive" timeout="10">
    <cfquery datasource="#request.dsn#" name="get">
        SELECT inventory
        FROM product
        WHERE productID = 1
    </cfquery>
    <cfquery datasource="#request.dsn#">
        UPDATE product
        SET inventory = #get.inventory# - 1,
            inventoryUpdate = now()
        WHERE productID = 1
    </cfquery>
</cflock>
What we've done is single-thread access to these two queries. Running through the example again, the first request would enter the CFLOCK, and the second would wait for the first to complete. The first would select 4, update to 3, and then exit the CFLOCK. Then the second request would enter the CFLOCK, select 3, update to 2, and exit. Problem solved.
There are better ways to solve this particular problem (relative updates and transactions spring to mind), but this entry is about CFLOCK. What's important is the general type of thing that's happening: a multi-step process concerning a resource that is shared across requests, where later steps depend on results of earlier steps, or the steps must happen all-or-nothing.
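For comparison, here is a minimal sketch of the relative update mentioned above. It reuses the `product` table and `request.dsn` from the earlier example, and assumes (as that example does) that the database provides a `now()` function. The read and the write happen inside one SQL statement, so there is no gap between them for a second request to slip into.

```cfml
<!--- Sketch only: the subtraction is done by the database in a single
      statement, so no CFLOCK is wrapped around it. --->
<cfquery datasource="#request.dsn#">
    UPDATE product
    SET inventory = inventory - 1,
        inventoryUpdate = now()
    WHERE productID = 1
</cfquery>
```

Two parallel requests running this against an inventory of 4 should leave it at 2, because each statement subtracts from whatever value is current when it executes.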
Where else might we find these kinds of problems?
Well, all CF variables in shared scopes (server, application, session, client) are resources that are shared across requests. This includes instance variables within CFCs that are stored in one of these scopes. A corollary of this last item is that local variables in CFC methods that are not declared with the var keyword are instance variables. This can result in mysterious bugs that only ever crop up under load, so it's VERY important to use the var keyword properly.
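To make the var issue concrete, here is a small sketch of a CFC with two versions of the same method. The component and the method names (`sumUnsafe`, `sumSafe`) are made up for illustration. If an instance of this CFC lived in the application scope, two simultaneous calls to `sumUnsafe` could overwrite each other's `total` and `i`.

```cfml
<cfcomponent>
    <!--- Unsafe: "total" and "i" are not declared with var, so they land in
          the CFC's variables scope and are shared by every request that
          uses this instance. --->
    <cffunction name="sumUnsafe" returntype="numeric">
        <cfargument name="values" type="array" required="true" />
        <cfset total = 0 />
        <cfloop from="1" to="#arrayLen(arguments.values)#" index="i">
            <cfset total = total + arguments.values[i] />
        </cfloop>
        <cfreturn total />
    </cffunction>

    <!--- Safe: the var declarations (which must come first in the function
          body) make both variables local to a single invocation. --->
    <cffunction name="sumSafe" returntype="numeric">
        <cfargument name="values" type="array" required="true" />
        <cfset var total = 0 />
        <cfset var i = 0 />
        <cfloop from="1" to="#arrayLen(arguments.values)#" index="i">
            <cfset total = total + arguments.values[i] />
        </cfloop>
        <cfreturn total />
    </cffunction>
</cfcomponent>
```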
We also find it in database access, as demonstrated above, though it's usually better to use CFTRANSACTION to solve those problems, since database-level transactions are very likely going to be a lot more efficient.
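As a rough sketch, the two inventory queries from the first example could be wrapped in a transaction like this. The `isolation="serializable"` setting is an assumption on my part: whether a transaction actually prevents the lost update depends on the database and the isolation level in use, and some databases will abort one of the two competing transactions with an error that the application has to catch and retry.

```cfml
<!--- Sketch only: behavior depends on the database's isolation support. --->
<cftransaction isolation="serializable">
    <cfquery datasource="#request.dsn#" name="get">
        SELECT inventory
        FROM product
        WHERE productID = 1
    </cfquery>
    <cfquery datasource="#request.dsn#">
        UPDATE product
        SET inventory = #get.inventory# - 1,
            inventoryUpdate = now()
        WHERE productID = 1
    </cfquery>
</cftransaction>
```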
Finally, we see it in other external resources. The most common are files on the filesystem, though external objects (Java, COM) are another. Many objects are internally synchronized, so you needn't worry about locking, but not all. Make sure you check the documentation of your specific object. Notably, most of the Java Collection Classes are NOT synchronized, though there are static methods in the Collections class to wrap them in synchronized versions of themselves.
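As an illustration, wrapping a Java map from CFML might look like the sketch below. `application.cache` is a hypothetical variable name, and the assignment is assumed to happen somewhere that is already safe from concurrent initialization. Note that the wrapper only makes each individual call thread-safe.

```cfml
<!--- Create an ordinary (unsynchronized) Java map. --->
<cfset plainMap = createObject("java", "java.util.LinkedHashMap").init() />

<!--- java.util.Collections exposes static helpers; no init() call is needed
      to invoke static methods on a Java class from CFML. --->
<cfset collections = createObject("java", "java.util.Collections") />

<!--- Keep only the synchronized wrapper; never hand out plainMap itself. --->
<cfset application.cache = collections.synchronizedMap(plainMap) />
```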
Well, we know we don't have to CFLOCK all access to shared scopes (that went the way of the dodo in CFMX), but when do we have to lock? We again look at the type of operation we need. Clearly reading and writing single variables doesn't qualify, but reading and writing multiple variables does.
So what does this really mean? If you ever write a variable in a shared scope, and any code that depends on it also depends on any other shared value, you must lock all access to the shared variable, both read and write. Ouch. That's a lot of locking, because every variable has to be written, or it wouldn't exist, so that means you have to lock everything except stand-alone variables.
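Here is a sketch of what that rule looks like in practice, using two hypothetical application variables (`application.inventory` and `application.inventoryUpdate`) that are assumed to be initialized already and must always change together. The writer takes an exclusive lock and every reader that needs both values takes a readonly lock.

```cfml
<!--- Writer: both values change together under an exclusive lock. --->
<cflock scope="application" type="exclusive" timeout="10">
    <cfset application.inventory = application.inventory - 1 />
    <cfset application.inventoryUpdate = now() />
</cflock>

<!--- Reader: the readonly lock ensures the two values copied out belong to
      the same update, never one old and one new. --->
<cflock scope="application" type="readonly" timeout="10">
    <cfset currentInventory = application.inventory />
    <cfset lastUpdate = application.inventoryUpdate />
</cflock>
```

Copying the values into request-local variables inside the lock keeps the locked section short; the rest of the page can then use the copies freely.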
But fret not, because CFLOCK isn't the only way to lock variable access. You can use some tricks to avoid having to use CFLOCK all over the place. The best one is for application variables that get initialized once, and never change. Since there is only one write event, we can break their lifecycle in two: the write phase, and the read phase. All we need to do is ensure that no request will EVER get to the read phase before the write phase is complete, and that no request will EVER perform the write phase if it has already been performed. If we do that, then we never need to use CFLOCK on application variable reads. The question, of course, is how do we do that? Here's the way I prefer (in Application.cfm, or the root settings file):
1.  <cflock scope="application" type="readonly" timeout="10">
2.    <cfset isAppWritten = structKeyExists(application, "appWritten") />
3.  </cflock>
4.  <cfif NOT isAppWritten>
5.    <cflock scope="application" type="exclusive" timeout="10">
6.      <cfif NOT structKeyExists(application, "appWritten")>
7.        <!--- set your app variables --->
8.        <cfset application.appWritten = true />
9.      </cfif>
10.   </cflock>
11. </cfif>
Why does this work? First we test if we're through the write phase (lines 1-4). If we are, great, otherwise we have to attempt to perform it ourselves. Assuming it's not complete, we then get a lock on the initialization code (line 5). Once we get the lock (potentially waiting for other requests to release it), then we again check if the write phase is complete (line 6). We need the second check, because it's possible that while we were waiting for the lock, another request might have finished. If it's still not done, then we perform the write phase and exit the lock (lines 7-11).
There is a slight fudge going on for efficiency. The outer CFIF is unneeded, because the inner one will work by itself (though the reverse is NOT true). However, getting exclusive locks is expensive (and kills scalability), so we want to avoid it where possible, especially since this code will be executed by EVERY request. The outer CFIF is ensuring that no request will have to get the lock unless it comes in before the first request finishes the write phase, which basically translates to never.
"But what about CFC instance variables?", you're probably saying. "They're application variables too, and they're definitely going to get manipulated, or they'd just be normal application-scope variables." Time for another 'trick', though this one is far less sneaky: we don't have to lock application variables only with a scope="application" CFLOCK.
Instead, inside our CFC, we'll lock all access to instance variables using a named lock. Then the non-CFC application code can still reference the application-scope instances without acquiring a lock, but we retain our ability to prevent race conditions. I prefer to use a UUID for my locking, which is set in the init() method of the CFC into an instance variable. That UUID is then used to lock all instance variable access using a named CFLOCK in exactly the same way as we'd used scoped CFLOCK for "normal" variables.
<cffunction name="init">
    <!--- per-instance lock name, used by every locking method of this instance --->
    <cfset variables.my.uuid = createUUID() />
    <!--- set inventory variables --->
</cffunction>
<cffunction name="getInventory">
    <cfreturn variables.my.inventory />
</cffunction>
<cffunction name="setInventory">
    <cfargument name="inventory" />
    <!--- the two writes belong together, so they share one exclusive lock --->
    <cflock name="#variables.my.uuid#" type="exclusive" timeout="10">
        <cfset variables.my.inventory = arguments.inventory />
        <cfset variables.my.inventoryUpdate = now() />
    </cflock>
</cffunction>
There are two caveats:
- CFCs don't have real constructors, meaning that it's possible to call the init() method multiple times (bad) or to call other methods before calling init() (even worse). What does this mean? You need to take a couple of precautions. First, all methods should fail if init() hasn't been called. Most CFCs are like this anyway, because they depend on initialization parameters (like a DSN). Second, calls to the init() method must be externally locked. Fortunately, since we're creating and initializing all our application-scope CFCs within the locking framework discussed above, that's already taken care of as well. Just be careful of non-application-scope CFCs (like session-scope).
- This type of locking only keeps the CFC's internals in sync. It is still susceptible to the exact same problem we ran into with the first example using two queries (coincidentally, performing the exact same operation). So if in our application code (outside the CFC) we call getInventory() and follow with a setInventory() that uses the value, we still have to lock it on our end, just like the first example.
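The second caveat might look like this in calling code. `application.product` is a hypothetical name for an application-scope instance of the CFC above, and the lock name simply reuses the one from the first example. Every piece of code that performs this read-then-write sequence has to use the same lock name for it to have any effect.

```cfml
<!--- Sketch only: the CFC's internal lock protects each call on its own;
      this outer lock protects the get-then-set sequence as a whole. --->
<cflock name="inventoryupdate" type="exclusive" timeout="10">
    <cfset current = application.product.getInventory() />
    <cfset application.product.setInventory(current - 1) />
</cflock>
```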
For external resources, locking is a bit trickier. Files on the local file system are easy: always use a named lock on the canonical absolute pathname. Files on a remote filesystem shared between servers are problematic, because there's no way to use CFLOCK across multiple servers. You'd have to use some kind of semaphore file, and then lock access to that, and it turns into a mess very quickly. External objects can usually be locked the same way as files, using a named lock on their class name, if they're local. Remote shared objects should have built-in synchronization.
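For the local file case, a minimal sketch could look like the following. The file name `counter.txt` and the variable names are invented for the example. The canonical path is resolved through `java.io.File` so that every request derives the same lock name for the same file, however the path was originally written.

```cfml
<!--- Resolve the canonical absolute path to use as the lock name. --->
<cfset counterFile = createObject("java", "java.io.File").init(expandPath("counter.txt")) />
<cfset counterPath = counterFile.getCanonicalPath() />

<!--- The read and the write form one multi-step operation on a shared
      resource, so both sit inside the same exclusive lock. --->
<cflock name="#counterPath#" type="exclusive" timeout="10">
    <cfset count = 0 />
    <cfif fileExists(counterPath)>
        <cffile action="read" file="#counterPath#" variable="contents" />
        <cfset count = val(trim(contents)) />
    </cfif>
    <cfset count = count + 1 />
    <cffile action="write" file="#counterPath#" output="#count#" />
</cflock>
```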
It was pointed out after I wrote this that locking is not required if you don't care if there are slight errors in application logic that could result from race conditions. Memory corruption will never result from a lack of locking in CFMX (unlike previous versions), only logic errors.
thanks for your post, i realized a mistake of mine about synchronized objects… I am using a linkedhashmap in a shared cfc field. I needed to use the Collections static method to return a thread safe map.
Synchronized objects don't avoid race conditions if a condition can exist across multiple operations (e.g. a contains() and then a subsequent add()); they only protect against single-operation race conditions. So basically, getting a thread-safe object isn't going to help you solve the same problems that CF locks will solve. And more importantly, if you use CF locks properly, there will be no need for a synchronized object, because CF will be single-threading all access to the object. For non-single-threaded objects (i.e. ones you access outside CFLOCK blocks) that are accessible to multiple requests, you should always use a synchronized version as you describe. That's the reason that CF uses Vector over ArrayList for CF arrays, and Hashtable over HashMap for CF structs.
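A sketch of that multi-operation case, using a map rather than a set: `application.seenItems` is assumed to be a Java map created elsewhere, and `itemKey` and the lock name are made up for the example. Even if the map is a synchronized wrapper, the check and the insert are two separate calls, so the pair still needs a CF lock.

```cfml
<cfset itemKey = "product-1" />

<!--- Without this lock, two requests could both see "not present" and both
      insert, even though each individual call is thread-safe. --->
<cflock name="seenItemsLock" type="exclusive" timeout="10">
    <cfif NOT application.seenItems.containsKey(itemKey)>
        <cfset application.seenItems.put(itemKey, now()) />
    </cfif>
</cflock>
```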
I only read the beginning of your article, can't continue because I am shaking my head so much. Your first example is odd to say the least. The normal CF way of ensuring that the queries run as one unit would be to wrap them with a cftransaction. There are far better examples of the use of cflock… why not start with the common example of locking write to Application or session variables??
This statement is misleading as well: "a multi-step process concerning a resource that is shared across requests, where later steps depend on results of earlier steps, or the steps must happen all-or-nothing."
Um, no, not really. I need to lock my writes to shared scope variables not because there are multiple *steps* involved, but because there could be multiple *threads*.
You are really going to confuse newbies with this post.
Cynthia,
I agree (to some extent) with your first statement that the CFQUERY-based example is "odd". I do explicitly list both transactions and relative updates as solutions (the latter being preferable) to the problem. However, the reason for using queries is that the order of events is a lot more clear cut than . That statement is both a read and a write in a single expression – hardly clear that there are two distinct and fundamentally different operations happening.
Regarding your second point, you're wrong. A race condition can ONLY exist where there are multiple steps. If the "process" is an atomic single action, you CAN'T have a race condition. You can easily replace "requests" in my statement with "threads" if you prefer that language, but the "multi-step" requirement is still in force. This is the reason that a relative update on a database is safe to perform without a lock or a transaction: it's a single atomic operation so a race condition is impossible.
>Regarding your second point, you're wrong. A race condition can ONLY
>exist where there are multiple steps.
You're missing the point. People conceive of writing to an Application or session variable as one "step". Their confusion will be reinforced by the first example you used, where clearly there are two steps. Do query one, then do query two.
I remember having a hard time understanding why I needed to lock when I was first starting out. I corresponded with Ben Forta and he cleared everything up, and I can tell you that the way you are explaining things is most definitely going to leave newbies (and probably some others as well, sorry to say) with the wrong ideas.
Cynthia,
I would be more than happy to post an alternate walkthrough you provide, and/or link to one on your site if that would allay your fears of newbie confusion.
The first example of course is ehh, but it does demo two processes that take time to execute, and could potentially be cross threaded. The 3rd example is where it's at for me.
I think I've come to find you can't store a cfc in the application scope if it needs to be thread safe. I did find I could store a cfc that has not been init(), and then init it to the request scope. All you would gain here is that the cfc is loaded and waiting for use in a non-shared scope.
I'm having the issue that this particular cfc at work requires a dsn, dbo, and was stored in the application scope. Problem was that if another request with a different dsn/dbo came through, and the application name was the same, they'd cross thread.
All in all I wanted to post somewhere that I don't think you can safely store a cfc that counts on its variables remaining the same for a full request.
TIP!!!! ColdFusion 8 has cfThread with attribute action="sleep", this made it very easy for me to loop from="1" to="100" and sleep for 200 milliseconds on every loop. So it made it much much much easier for me to run other machines/browsers to attempt a cross thread.
Damn decent post BarneyB. I've got a "Cynthia" type at my job too, mine is an impatient woman who wouldn't finish reading an article either, and quick to bash an article with comments such as "keep my hands from shaking".
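The loop described in this tip might look roughly like the sketch below (ColdFusion 8 or later only). `application.counter` is a hypothetical variable, and the code is deliberately left unlocked: the sleep between the read and the write widens the race window, so hitting the page from two browsers at once should make lost updates easy to reproduce.

```cfml
<cfparam name="application.counter" default="0" />

<cfloop from="1" to="100" index="i">
    <!--- read the shared value... --->
    <cfset current = application.counter />
    <!--- ...pause 200 ms so another request can interleave... --->
    <cfthread action="sleep" duration="200" />
    <!--- ...then write back a value that may already be stale --->
    <cfset application.counter = current + 1 />
</cfloop>

<cfoutput>Counter is now #application.counter#</cfoutput>
```

Wrapping the three statements inside the loop in an exclusive CFLOCK and repeating the test is a quick way to confirm the lock is doing its job.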
Acker,
If you have a CFC in application scope, then its instance variables are in application scope as well. So it's no different than storing what is in the instance variables directly in the application scope. If you have per-request data, that needs to be passed into methods of the application-scope CFC, rather than being part of the CFC's internal state. I don't know about your specific case, but if you just made the dsn/dbo arguments that are passed into the CFC, the problem should go away.
BarneyB, you're correct. At my job they are trying to keep the CFC in the application scope, but then they mail-blasted that they needed to move the CFC into the request scope to make it thread safe.
I was so focused on the thread safe part, that I didn't consider the simplistic fact that it's in a shared scope.
To fix the problem I came up with the idea of separate application names, per dsn name.
B, could you please elaborate what you meant by "but if you just made the dsn/dbo arguments that are passed into the CFC, the problem should go away" … I think something's missing in your statement.
Acker,
Here's an example:
[cffunction name="getSomething"]
[cfargument name="dsn" /]
[cfargument name="dbo" /]
[cfset var result = "" /]
[cfquery datasource="#arguments.dsn#" name="result"]
select *
from #arguments.dbo#.mytable
[/cfquery]
[cfreturn result /]
[/cffunction]
Since the dsn and dbo are passed into each individual method invocation, you alleviate the threading issues. Method arguments and function-local variables (declared with 'var') are always single-threaded, because a method invocation is single threaded. It's only the instance state of a shared scope CFC that is shared.
cheers,
barneyb
I gotcha … Yeah we would have to find/replace all calls to the CFC methods to always supply dsn/dbo … for now the CFC lies in the request scope, and is re-initiated with every request =(
I say my company needs to make an application name per project.
-Acker
Acker,
Application-per-project would definitely work, and is probably the least invasive solution. Another potential would be to use the request-scope CFC as a facade for an application-scope CFC. So you change the CFC to accept dsn/dbo parameters and put it in the application scope, but you retain the request-scoped CFC which simply delegates to the application-scope CFC. So assuming the implementation I gave above was how the application-scope CFC was implemented, the request-scope CFC (which you're already using, and which wouldn't have its API change), would look like this:
[cffunction name="getSomething"]
[cfreturn application.getSomething(request.dsn, request.dbo) /]
[/cffunction]
You'd have a single instance of the facade CFC that gets copied into the request scope for each request, and a single instance of the "real" CFC that stays in the application scope. Then all you need to do is ensure request.dsn and request.dbo are defined and you're done.
B,
I thought and suggested that same idea. Mine was more like: Let's keep the CFC in the application scope for loading speed, but INIT() it to the request scope at each request. Basically saying the same thing you're saying, but your method is more logical (although you're suggesting to use a shared variable within a cfc, which is advised against, but doable).
Cool Cool BarneyB, best single threaded blog about cross threading in shared scopes I've ever seen.
-Acker