One of my long-term clients has hundreds of cloud machines running instances of their server, each server maintains thousands of reliable UDP connections using a custom protocol that we’ve developed over the years. When things go wrong it’s often hard to work out why. Even though we have reasonable unit test coverage of the code that runs the UDP protocol, it’s hard to build tests that cover every possible scenario.
I’ve built a small Windows Service which exposes perfmon counters to track sockets in TIME_WAIT state. It can be downloaded from the links later in this post. Back in 2011 I was helping a client look for issues in their systems caused by having too many sockets in a TIME_WAIT state (see here for why this can be a problem). This was affecting their connectivity. Rather surprisingly there seemed to be no way to track the number of sockets in TIME_WAIT using perfmon as there didn’t seem to be a counter exposed.
I’ve just stumbled on these blog posts, by Maciej Sinilo, a game developer. He’s written a memory allocation monitoring tool and mentions that using RtlCaptureStackBackTrace() is a faster (if undocumented) way to capture a call stack. This is interesting to me as the call stack capture code in my debugging tools (deadlock detection, timeshifter, tickshifter, etc.) is pretty slow when using StackWalk64(). It’s also interesting that he seems to store and sort stacks by CRC which is similar to what I do in my tools.
Last week I mentioned that some of my tests for my Win32 Debug API class had suddenly started failing. It seems that I was right and the changes are due to some .Net fixes that have been rolled out recently. The code runs and the tests pass if I run on a clean instal Vista x64 VM and fail on my day to day development box. It seems that my plan to “stick a breakpoint in mscoree.
Back in October 2007 I sumarised my findings from getting my Win32 DebugAPI based debug engine working on x64. One of the strange things that I found at the time was this: When running a CLR app under the Win32 debug interface you only ever seem to hit the native entry point if you’re running under WOW64. In all other situations you don’t hit the native entry point ever. If you rely on it to pause your debug tools once the process is completely loaded and ready to roll then you need to stick a break point in _CorExeMain in mscoree.
I’ve finished porting my debugging tools support libraries to x64 now and thought it was worth putting up a summary of the issues that I’ve noticed: A 32bit exe can’t start a 64bit exe for debugging - pretty obvious really. When a 64bit debugger is running a 32bit debugee the debugger seems to get TWO “loader breakpoints” one when the 64bit dlls are loaded and a second when the 32bit dlls have been loaded.
It seems that I’ve located the “issues” in my Debug Tools library. This library is used in my TickShifter (time control) tool and my native Win32 Deadlock Detection tool. Due to how I wanted to control the debugged processes start up and how I needed to halt the debugged process at a particular time there were some hoops that I found I had to jump through. Those hoops have changed shape; either because of differences between Vista and XP or due to x64 and the WOW64 layer, I’m not sure which yet.
I should be finishing some docs for the x64 release of The Server Framework… But this is more interesting… When running my Win32 debugging code on x64, this time when compiled natively as x64 code and when debugging an x64 CLR process, I’ve been getting an ‘unexpected’ ExceptionCode in an EXCEPTION_DEBUG_EVENT. The code is 0x4000001f and, after some searching around, it seems that it’s a STATUS_WX86_BREAKPOINT event and I sometimes get these instead of EXCEPTION_BREAKPOINT events…
I spent a little time looking at an x64 port of my debugging tools library at the weekend. Since this requires me to set breakpoints and manipulate process memory and image files and all sorts I expected it to be a little more complex to port than the higher level sockets code and the bulk of my Win32 code. So far things are going reasonably well, but I’ve just come across a strangeness with DebugSetProcessKillOnExit(TRUE) in an x86 debugger that’s running an x86 process on an x64 machine.
Back in April 2006 I posted a copy of TickShifter, see here for details. It seems that there was a bug in my Win32 debugger code on which TickShifter is built. The bug was that we failed to “forget about” dlls that were unloaded… Because we failed to forget about them it was possible for the debugger code to try and do something with addresses in these dlls that were no longer loaded and this would cause a C++ exception on the debugger thread when our call to ReadProcessMemory() failed and this caused all sorts of problems…