My theorising about the strange memory-related failures I was experiencing with my distributed testing using WinRS has led me to put together a test runner that can limit the amount of memory available to a process and terminate it if it exceeds the expected amount. With this in place during my server test runs I can spot the kind of memory leak that slipped through the cracks of my testing and made it into release 6.2 of The Server Framework.
The idea is that rather than just starting the server we start it via a monitoring process. The monitor first creates a job object, then starts the server process and assigns it to the job. The monitor can set limits on the amount of memory that the processes in the job can allocate and is informed when they reach that limit. For the kind of testing I’m currently doing the monitor then simply terminates the server, which causes the test to fail. To calibrate the limit you run the target process under the monitor a few times without the memory limit in place and dump out the memory allocation stats; that gives you an idea of the maximum memory the process will allocate during a normal run. From then on you can limit the memory allocation to that amount and be sure that the tests will fail if there’s a leak like the one that made its way into 6.2.
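The monitoring flow described above can be sketched with the Win32 job object API. This is a simplified illustration of the approach rather than the actual test runner code; the limit value is made up and error handling is omitted:

```cpp
// Simplified sketch of a job-object based memory monitor (illustrative,
// not the real test runner; error handling omitted for brevity).
#include <windows.h>
#include <cstdio>

int wmain(int argc, wchar_t *argv[])
{
   // Create a job object and cap the total memory its processes may commit.
   HANDLE hJob = ::CreateJobObjectW(nullptr, nullptr);

   JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits = {};
   limits.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_JOB_MEMORY;
   limits.JobMemoryLimit = 64 * 1024 * 1024;   // calibrated from clean runs

   ::SetInformationJobObject(hJob, JobObjectExtendedLimitInformation,
      &limits, sizeof(limits));

   // Associate an I/O completion port so we're told about limit violations.
   HANDLE hPort = ::CreateIoCompletionPort(INVALID_HANDLE_VALUE, nullptr, 0, 1);

   JOBOBJECT_ASSOCIATE_COMPLETION_PORT assoc = { hJob, hPort };

   ::SetInformationJobObject(hJob, JobObjectAssociateCompletionPortInformation,
      &assoc, sizeof(assoc));

   // Start the server suspended, assign it to the job, then let it run.
   STARTUPINFOW si = { sizeof(si) };
   PROCESS_INFORMATION pi = {};

   ::CreateProcessW(nullptr, argv[1], nullptr, nullptr, FALSE,
      CREATE_SUSPENDED, nullptr, nullptr, &si, &pi);
   ::AssignProcessToJobObject(hJob, pi.hProcess);
   ::ResumeThread(pi.hThread);

   // Wait for notifications; terminate the server if it hits the limit.
   DWORD messageId;
   ULONG_PTR key;
   LPOVERLAPPED overlapped;

   while (::GetQueuedCompletionStatus(hPort, &messageId, &key, &overlapped, INFINITE))
   {
      if (messageId == JOB_OBJECT_MSG_JOB_MEMORY_LIMIT)
      {
         std::printf("Memory limit exceeded - terminating, test fails\n");
         ::TerminateJobObject(hJob, 1);
         return 1;
      }

      if (messageId == JOB_OBJECT_MSG_ACTIVE_PROCESS_ZERO)
      {
         break;   // the server exited normally
      }
   }

   return 0;
}
```

Note that starting the server with `CREATE_SUSPENDED` and resuming it only after `AssignProcessToJobObject()` succeeds means the server can never allocate memory outside of the job's limits.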
Whilst I was adjusting the test scripts to exercise the new test runner I decided that it might be a good idea to merge this functionality into my “lock inversion detector” code. The lock inversion detector is a cut-down version of my deadlock detection and “lock explorer” tools. It’s considerably faster than the full lock explorer and the idea is that it’s used in a similar way to the memory monitoring test runner: test servers run under the lock inversion detector, and if a release introduces the potential to deadlock via a lock inversion then the test will fail. Note that the code under test doesn’t actually have to deadlock; it just has to have the potential to… This is quite a powerful tool and it’s helped my clients out on many occasions, but it’s still under development (I tend to use it, tweak it if need be to find the problem, and then move on with fixing the problem). Recent tweaks have made it run fast enough to become part of my build process.
Note: the lock inversion detector mentioned in this blog post is now available for download from www.lockexplorer.com.
If the lock inversion detector finds a lock inversion you need to run the full deadlock detector to get the information you need to fix the problem. This makes the target process run a little slower than it does under the lock inversion detector, but the end result is a list of problem lock sequences with the threads concerned and call stacks showing each lock manipulation. With this information it’s usually pretty trivial to find and remove the lock inversion. Again, the target process doesn’t need to actually deadlock for the deadlock detector to show you where it could.
I need to do some more work to integrate these tests into my build and release process but they’re valuable additions. Ideally I’d like to merge the memory monitoring test runner functionality with the lock inversion detector, but for now I may simply run one set of tests with the memory monitoring and one with the lock inversion testing.
These improvements to the build and release process will hopefully be in place for the release of 6.4.