Note: the deadlock detector mentioned in this blog post is now available for download from www.lockexplorer.com.
The build system has always run a set of ‘black box’ server tests for each server example as part of the build. These start up the example server in question, run a set of connections against it and shut the server down. This proves that the example servers actually work and has also proved a very useful technique for flushing out hard-to-reproduce bugs, generally race conditions. Since the build system runs so often, and on so many different build machines, the code gets run enough, and on varied enough hardware, to flush out these problems.
I’ve been aiming to get some form of deadlock testing into this build system for a while. The original deadlock detector tool was too slow to run on the build servers, so I developed a cut-down version and tuned the injection and monitoring engine. That gave me a detector that COULD run as part of the build system…
This last piece of work has involved integrating that detector into the test scripts so that it gets run and reports failures in a way that flags the build as failed if it detects any lock inversions, that is, any pair of locks that different code paths acquire in opposite orders. Of course it also detects full-blown deadlocks, but the lock inversion detection is much more powerful: the code under test need never actually be coaxed into deadlocking; it just has to show the possibility of deadlocking for the tests to fail.
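To make the idea concrete, here is a minimal sketch of how lock inversion detection can work in principle. It is not the detector described in this post; the class name `LockOrderGraph`, the `OnAcquire` function and the use of strings to identify locks are all assumptions made for illustration. It records a "held before" edge each time a lock is acquired while others are held, and reports an inversion when a new acquisition would close a cycle in that graph.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical lock-order recorder, for illustration only.
class LockOrderGraph
{
   public :

      // Call when a thread that currently holds the locks in 'held'
      // acquires 'lock'. Returns true if this acquisition order conflicts
      // with an order seen earlier, i.e. a lock inversion.
      bool OnAcquire(
         const std::vector<std::string> &held,
         const std::string &lock)
      {
         assert(!lock.empty());

         bool inversion = false;

         for (const auto &h : held)
         {
            if (h == lock)
            {
               continue;   // recursive acquisition, not an ordering
            }

            // If 'h' can already be reached from 'lock' then some earlier
            // path took 'lock' before 'h'; taking 'h' before 'lock' now
            // closes a cycle.
            if (Reachable(lock, h))
            {
               inversion = true;
            }

            m_edges[h].insert(lock);   // record "h held before lock"
         }

         return inversion;
      }

   private :

      // Depth-first search over the recorded "held before" edges.
      bool Reachable(
         const std::string &from,
         const std::string &to) const
      {
         std::set<std::string> visited;

         std::vector<std::string> stack{from};

         while (!stack.empty())
         {
            const std::string node = stack.back();

            stack.pop_back();

            if (node == to)
            {
               return true;
            }

            if (!visited.insert(node).second)
            {
               continue;   // already explored
            }

            const auto it = m_edges.find(node);

            if (it != m_edges.end())
            {
               for (const auto &next : it->second)
               {
                  stack.push_back(next);
               }
            }
         }

         return false;
      }

      std::map<std::string, std::set<std::string>> m_edges;
};
```

Note that the two conflicting acquisitions can happen at completely different times, even on the same thread, and the inversion is still reported. That is why a test run that never actually deadlocks can still fail the build.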
I’ve now run the new build system for the 70+ example servers and I’ve found one lock inversion in three places in the framework. The inversion was due to a pattern that I repeated for the OpenSSL, SChannel and SSPI connectors, and it was easily fixed by removing the lock that caused the inversion and using one lock rather than two. This clearly shows what I’ve known for a long time: object-oriented programming, and encapsulation in particular, is often at odds with concurrency. The obvious way to make an object thread safe is to give it a lock that it can use to protect itself; it’s often then more complex to ensure that these objects don’t deadlock as they call into each other whilst holding locks that aren’t evident at the call site. At present I consider it a post-design optimisation to break encapsulation and, potentially, share locks across related objects. Having the lock inversion detector as part of the build makes it obvious when it’s not just an optimisation but a requirement.
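As an illustration of sharing a lock across related objects, here is a minimal sketch. It is not the framework’s code; the `Connector` and `Socket` classes, their member functions and the choice of `std::recursive_mutex` are all assumptions. If each object owned a private lock, `Write()` would take the connector’s lock and then the socket’s, whilst `OnRead()` would take them in the opposite order, which is exactly a lock inversion. With a single lock supplied from outside there is only one lock to order.

```cpp
#include <cassert>
#include <mutex>
#include <string>

class Socket;

// Hypothetical pair of related objects that call into each other.
// Both are handed the same lock rather than each owning one.
class Connector
{
   public :

      explicit Connector(std::recursive_mutex &lock) : m_lock(lock) {}

      void Attach(Socket &socket) { m_pSocket = &socket; }

      void Write(const std::string &data);   // calls "down" into the socket

      void OnData(const std::string &data)   // called "up" from the socket
      {
         std::lock_guard<std::recursive_mutex> guard(m_lock);

         m_received += data;
      }

      std::string Received() const
      {
         std::lock_guard<std::recursive_mutex> guard(m_lock);

         return m_received;
      }

   private :

      std::recursive_mutex &m_lock;
      Socket *m_pSocket = nullptr;
      std::string m_received;
};

class Socket
{
   public :

      Socket(std::recursive_mutex &lock, Connector &connector)
         :  m_lock(lock), m_connector(connector) {}

      void Send(const std::string &data)
      {
         std::lock_guard<std::recursive_mutex> guard(m_lock);

         m_sent += data;
      }

      void OnRead(const std::string &data)
      {
         std::lock_guard<std::recursive_mutex> guard(m_lock);

         m_connector.OnData(data);   // re-enters the same shared lock
      }

      std::string Sent() const
      {
         std::lock_guard<std::recursive_mutex> guard(m_lock);

         return m_sent;
      }

   private :

      std::recursive_mutex &m_lock;
      Connector &m_connector;
      std::string m_sent;
};

inline void Connector::Write(const std::string &data)
{
   std::lock_guard<std::recursive_mutex> guard(m_lock);

   assert(m_pSocket != nullptr);   // Attach() must be called first

   m_pSocket->Send(data);          // re-enters the same shared lock
}
```

The sketch uses a recursive mutex so that the nested calls can re-acquire the shared lock; an alternative would be a plain mutex with unlocked internal functions. Either way the cost is the one described above: the lock is no longer an implementation detail hidden inside each object.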
The lock inversions mentioned above are removed in release 6.3.2 of The Server Framework, which was released today.