Back in September I mentioned that I had found a problem with my usage of Slim reader/writer locks. I expected this to be something that I was doing wrong but it turned out that it was a kernel bug.
This morning Johan Torp tweeted that a hotfix for this issue is now available.
The note on the hotfix states: “Note These issues are not obvious and are difficult to debug. This update is intended for applicable systems to apply proactively in order to increase the reliability of these systems.”
It looks like the Slim Reader/Writer issue that I wrote about back in September is a kernel bug after all.
Stefan Boberg has just tweeted to let me know that Microsoft has confirmed it’s a bug and that a hotfix is in testing.
I’ve been noticing a strange thing for a while on Windows 8/8.1 and the equivalent server versions. The issue occurs when I’m using a Slim Reader/Writer Lock (SRWL) exclusively in exclusive mode (as a replacement for critical sections). When a thread unlocks an SRWL and then exits cleanly immediately afterwards, the threads that are waiting on the lock are sometimes not woken, and none of them acquires the lock.
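To make the usage pattern concrete, here is a minimal sketch of an SRWL used purely in exclusive mode, with worker threads that exit straight after their final unlock. The names `ExclusiveSrwLock` and `run_workers` are hypothetical and are not taken from my code. On non-Windows builds the wrapper falls back to `std::mutex` as a stand-in so that the sketch still compiles. This shows the shape of the code that hit the problem; it is not a reliable reproduction of the bug.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper: a Slim Reader/Writer lock that is only ever taken in
// exclusive mode, i.e. used as a drop-in replacement for a critical section.
class ExclusiveSrwLock {
public:
#ifdef _WIN32
    ExclusiveSrwLock() { InitializeSRWLock(&lock_); }
    void lock() { AcquireSRWLockExclusive(&lock_); }
    void unlock() { ReleaseSRWLockExclusive(&lock_); }
private:
    SRWLOCK lock_;
#else
    // Assumption: std::mutex stands in for the SRW lock on other platforms.
    void lock() { lock_.lock(); }
    void unlock() { lock_.unlock(); }
private:
    std::mutex lock_;
#endif
};

// Starts `threads` workers that each increment a shared counter `iterations`
// times under the lock and then exit immediately after the last unlock.
long run_workers(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    ExclusiveSrwLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<ExclusiveSrwLock> guard(lock);
                ++counter;
            }
            // The thread exits right here, straight after unlocking.
        });
    }
    // If waiting threads were never woken, these joins would never return.
    for (auto& worker : pool) worker.join();
    return counter;
}
```

Calling `run_workers(8, 1000)` should return 8000 when every waiter is woken correctly; a hang in the joins is the symptom described above.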
I’ve just released new versions of my Lock Explorer tools, LID and LIA. This is quite a big release as it increases the number of locking APIs that the tools instrument from 1 to 3. We now track Slim Reader/Writer locks and Mutexes.
Arguably the tools should always have tracked these, and possibly more API calls, but the tools have always been first and foremost to assist in the development and testing of The Server Framework and, well, we only use Critical Sections.
I’m currently building a new example server for The Server Framework. This is a variation on one of our proxy server examples for a client that’s doing some WebSockets work. The idea is that the server takes an inbound WebSockets connection, creates an outbound TCP connection to the target server and routes data to and from the remote server and the WebSockets client. It’s fairly simple stuff to put together once you’re up to speed on The Server Framework but my client needed a helping hand and it’s another nice example of what you can do with the framework.
After a week or so of serious dogfooding I’ve finally got a nice reliable lock inversion detector running as part of my build system for The Server Framework’s example servers.
Note: the deadlock detector mentioned in this blog post is now available for download from www.lockexplorer.com.
The build system has always run a set of ‘black box’ server tests for each server example as part of the build. These start up the example server in question, run a set of connections against it and shut the server down.
As I mentioned, I’ve been adjusting my build system and have finally got to the point where my lock inversion detector is suitable to run on all of my example servers during their test phase on the build machines. I’m working my way through the various example servers’ test scripts and adjusting them so that they use the lock inversion detector, can be easily configured to run the full-blown deadlock detector and can also run the servers under the memory profiling test runner that I put together earlier in the week.
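For anyone wondering what a lock inversion detector actually checks, the core idea can be sketched as a graph of "lock A was held while lock B was acquired" edges. Acquiring a lock that can already reach one of the locks currently held means the two have been taken in opposite orders somewhere. This is a simplified illustration and not the implementation used in my tools: `LockOrderTracker`, `LockId` and the caller-supplied per-thread `held` list are all hypothetical, and the sketch is not itself thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

using LockId = int;  // Hypothetical identifier for an instrumented lock.

class LockOrderTracker {
public:
    // Records that a thread holding the locks in `held` has acquired `lock`.
    // Returns true if this acquisition inverts a previously recorded order.
    bool acquire(std::vector<LockId>& held, LockId lock) {
        bool inversion = false;
        for (LockId h : held) {
            if (h == lock) continue;  // Recursive acquisition, no new edge.
            // If `lock` already leads to `h`, someone took them the other way.
            if (reachable(lock, h)) inversion = true;
            edges_[h].insert(lock);
        }
        held.push_back(lock);
        return inversion;
    }

    // Removes the most recent acquisition of `lock` from the thread's list.
    void release(std::vector<LockId>& held, LockId lock) {
        for (std::size_t i = held.size(); i-- > 0;) {
            if (held[i] == lock) {
                held.erase(held.begin() + static_cast<std::ptrdiff_t>(i));
                return;
            }
        }
        assert(false && "released a lock that was not held");
    }

private:
    // Depth-first search over the recorded "held before" edges.
    bool reachable(LockId from, LockId to) const {
        std::set<LockId> seen;
        std::vector<LockId> pending{from};
        while (!pending.empty()) {
            LockId current = pending.back();
            pending.pop_back();
            if (current == to) return true;
            if (!seen.insert(current).second) continue;
            auto it = edges_.find(current);
            if (it == edges_.end()) continue;
            for (LockId next : it->second) pending.push_back(next);
        }
        return false;
    }

    std::map<LockId, std::set<LockId>> edges_;
};
```

The useful property is that the two conflicting acquisitions do not need to overlap in time. If one thread takes lock 1 then lock 2, and a later thread takes lock 2 then lock 1, the second `acquire` call reports the inversion even though no deadlock occurred during that run.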
My theorising about the strange memory-related failures that I was experiencing with my distributed testing using WinRS has led me to put together a test runner that can limit the amount of memory available to a process and terminate it if it exceeds the expected amount. With this in place during my server test runs I can spot the kind of memory leak that slipped through the cracks of my testing and made it into release 6.
I’ve just stumbled on these blog posts by Maciej Sinilo, a game developer. He’s written a memory allocation monitoring tool and mentions that using RtlCaptureStackBackTrace() is a faster (if undocumented) way to capture a call stack. This is interesting to me as the call stack capture code in my debugging tools (deadlock detection, timeshifter, tickshifter, etc.) is pretty slow when using StackWalk64(). It’s also interesting that he seems to store and sort stacks by CRC, which is similar to what I do in my tools.
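The idea of keying captured stacks by a checksum can be sketched roughly as follows. This is an illustration only and is neither Maciej’s code nor mine: `Stack`, `hash_stack` and `StackTable` are hypothetical names, a 64-bit FNV-1a hash is swapped in for a real CRC to keep the example short, and hash collisions are simply ignored. On Windows the frame addresses would come from something like RtlCaptureStackBackTrace(); here they are plain integers so that the sketch is portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// A captured call stack as a list of return addresses.
using Stack = std::vector<std::uintptr_t>;

// 64-bit FNV-1a over the bytes of each frame address (a stand-in for a CRC).
std::uint64_t hash_stack(const Stack& frames) {
    std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (std::uintptr_t frame : frames) {
        for (std::size_t byte = 0; byte < sizeof(frame); ++byte) {
            hash ^= (frame >> (8 * byte)) & 0xFF;
            hash *= 0x100000001b3ULL;  // FNV prime
        }
    }
    return hash;
}

// Stores each distinct stack once, keyed and ordered by its hash, and counts
// how many times it has been seen. Collisions are not handled in this sketch.
class StackTable {
public:
    // Records one capture and returns the key under which it is stored.
    std::uint64_t record(const Stack& frames) {
        assert(!frames.empty());
        const std::uint64_t key = hash_stack(frames);
        Entry& entry = entries_[key];
        if (entry.hits == 0) entry.frames = frames;  // First sighting.
        ++entry.hits;
        return key;
    }

    std::size_t unique_stacks() const { return entries_.size(); }

    std::size_t hits(std::uint64_t key) const {
        auto it = entries_.find(key);
        return it == entries_.end() ? 0 : it->second.hits;
    }

private:
    struct Entry {
        Stack frames;
        std::size_t hits = 0;
    };
    std::map<std::uint64_t, Entry> entries_;
};
```

The benefit is that a hot call site that is captured thousands of times costs one stored stack plus a counter, and comparing two captures becomes a single integer comparison rather than a frame-by-frame walk.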
The free version of the socket server framework contained code that could cause a deadlock during connection closure if you also have a lock in your derived class.
There’s a lock taken out in CSocketServer::Socket::IsValid() that isn’t really required. It can cause a deadlock if your derived class has its own lock which you acquire in OnConnectionReset() or other server callbacks, and which you also hold when you call into the framework via Write() or other calls.