As the recent spate of bug fix and patch releases shows, I’m not scared of talking about the bugs that I find in the code of The Server Framework and pushing fixes out quickly. It’s my belief that the most important thing to get out of a bug report is an improved process which will help prevent similar bugs from occurring in future, and the only way to achieve that is to be open about the bugs you find, and equally open about how you then address them and try to prevent similar issues.
There’s a stupidly obvious bug in CServiceManager.cpp; one that should have been caught at numerous points during pre-release testing.
The offending code, which starts at line 158, currently reads:

```cpp
case RunActionRunAsService :
   {
      if (!StartServices())
      {
         const DWORD lastError = ::GetLastError();

         const _tstring message = messageHeader +
            _T("Failed to start service.\n\n ") +
            GetLastErrorMessage(lastError);

         MessageBox(message);

         result = 2;
      }
   }

default :

   throw CException(
      _T("CServiceInstanceManager::Run()"),
      _T("Unexpected run action:") + ToString(runAction));
}
```

It should actually look like this; note the inclusion of the missing break; and the correction of the exception source:
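A sketch of the corrected block, assuming the exception source should read CServiceManager::Run() to match the containing file:

```cpp
case RunActionRunAsService :
   {
      if (!StartServices())
      {
         const DWORD lastError = ::GetLastError();

         const _tstring message = messageHeader +
            _T("Failed to start service.\n\n ") +
            GetLastErrorMessage(lastError);

         MessageBox(message);

         result = 2;
      }
   }
   break;   // the missing break; without it we fall through and throw

default :

   throw CException(
      _T("CServiceManager::Run()"),   // corrected exception source
      _T("Unexpected run action:") + ToString(runAction));
}
```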
We have a new client profile available here for a client that we’ve had since 2006 in the desktop sharing market. Their system, built on The Server Framework, runs on more than 120 servers worldwide and handles more than 200,000 desktop sharing sessions each day!
After a week or so of serious dog fooding I’ve finally got a nice reliable lock inversion detector running as part of my build system for The Server Framework’s example servers.
Note: the deadlock detector mentioned in this blog post is now available for download from www.lockexplorer.com.
The build system has always run a set of ‘black box’ server tests for each server example as part of the build. These start up the example server in question, run a set of connections against it and shut the server down.
Version 6.3.2 of The Server Framework was released today.
This release is purely a bug fix release and includes the following fixes.
- Fixes to JetByteTools::OpenSSL::CAsyncConnection to remove the possibility of deadlock due to a lock inversion.
- Fixes to JetByteTools::SSPI::SChannel::CAsyncConnection to remove the possibility of deadlock due to a lock inversion.
- Fixes to JetByteTools::SSPI::Negotiate::CAsyncConnection to remove the possibility of deadlock due to a lock inversion.
- Fixes to JetByteTools::CLRHosting::CCLREventSink and JetByteTools::CLRHosting::CCLRHost to remove a race condition during host shutdown which could have caused a purecall due to events being fired after the event sink has been destroyed.
My tangential testing, which began with my problems with commands run via WinRS during some distributed load testing, is slowly unravelling back to the start. I now have a better build and test system for the server examples that ship as part of The Server Framework; I have a test runner that runs the examples with memory limits to help spot memory leak bugs, and a test runner that checks for lock inversions.
I found this article recently whilst discussing a question about socket reuse using DisconnectEx() over on StackOverflow. It’s a useful collection of the various configuration settings that can affect the number of concurrent TCP connections that a server can support, complete with links to the KB articles that discuss the settings in more detail. It’s a bit out of date, but it’s probably a good starting point if you want to understand the limits involved.
As I mentioned, I’ve been adjusting my build system and have finally got to the point where my lock inversion detector is suitable to run on all of my example servers during their test phase on the build machines. I’m working my way through the various example servers’ test scripts and adjusting them so that they use the lock inversion detector, can be easily configured to run the full-blown deadlock detector, and can also run the servers under the memory-profiling test runner that I put together earlier in the week.
I’ve been improving my pre-release testing system and now run a lock inversion detector as part of my build machine’s build and test cycle for the socket server examples. This lock inversion detector can detect the potential to deadlock without the code ever needing to actually deadlock, so it’s a pretty powerful tool. It has detected a lock inversion in the async connectors used by the OpenSSL, SChannel and SSPI Negotiate libraries.
My theorising about the strange memory related failures that I was experiencing with my distributed testing using WinRS has led me to put together a test runner that can limit the amount of memory available to a process and terminate it if it exceeds the expected amount. With this in place during my server test runs I can spot the kind of memory leak that slipped through the cracks of my testing and made it into release 6.