Practical Testing: 15 - Testing payback

Last year I wrote a series of articles called “Practical Testing” where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test driven development style to show how writing your tests before your code changes how you design your code.

The code under test was intended to be very scalable; I use it to provide lightweight asynchronous timers for my socket server framework, and the idea is that each connection might have several timers associated with it. In the original introduction I wrote this: “The timer is designed to be used in vast quantities. Let’s have a timer, or two, for each connection and let’s have 10,000 connections… Because of this we wanted the timer to be something that we could create lots of without caring.” And yet, when I came to use the code to provide a timer to drive the sending of data within a server connection stress test tool, the timer became very inefficient once I had 5000 connections running.

The code needed to change, and the tests made making those changes a safe and controlled process.

Although the intention was to build a scalable timer queue, the code had never been tested “in anger”. The server project that it had been used for had been shelved, and none of the other servers that used it had quite the same scalability requirements. There were no tests to measure scalability and, since the code worked well enough in the servers where it was used, we didn’t worry about it too much.

The last few weeks have seen several potential customers come to us and ask for scalability numbers for our framework. During the testing that I’ve been doing to obtain these figures I put together a load testing tool that creates many connections to a server and sends data over those connections to see how the server performs under varying loads. The first version of this tool simply sent a new message every time it received a response to the previous message; this was fine for thrashing the server but less good for more real-world scenarios. I decided to change the tool so that it could be configured to send data every so often, and used the timer queue that this series of articles developed to drive the sending process. Given that the test tool is designed to create many thousands of connections, it caused the limitations of the timer queue design to show up pretty quickly.

The usage pattern for the test tool is this: we create a connection and create a timer for it; we send some data and set the timer for the delay we want to introduce between each message; when the timer goes off we send more data and reset the timer. When testing the test tool we ran it on the same machine as the server that it was testing. As the number of connections increased the machine’s CPU usage went up more than we were expecting, and the load on the machine wasn’t coming from the server but from the test harness; one thread in the test harness was thrashing one of the CPUs, which looked like a busy loop. Breaking into the test tool showed us exactly what the problem was. InsertTimer(), the function called each time a timer is set, included a linear scan of the timer queue starting at the head of the queue, the next timer to go off, and running along the queue until it found the place where the new timer should be inserted. With 5000 timers all resetting themselves to be pretty much the ’latest’ new timer, we were scanning a list of 5000 elements each time we set a timer. Oops. A fairly obvious mistake when you consider that the algorithm used is O(n); still, up until now the performance requirements have been a bit “hand wavey”, so I guess I can be forgiven…
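
To make the cost concrete, here’s a minimal sketch of the kind of insertion the old code was doing. The names and structure are illustrative only, not the framework’s actual code:

```cpp
// Simplified sketch of the problem: timers live in a list sorted by
// absolute timeout and each insert walks from the head (the next timer
// due) until it finds its slot.
#include <list>

struct TimerData
{
   unsigned long absoluteTimeout;   // a GetTickCount() based value
   // ... callback, user data, etc.
};

typedef std::list<TimerData *> TimerQueue;

void InsertTimer(TimerQueue &queue, TimerData *pData)
{
   TimerQueue::iterator it = queue.begin();

   // O(n) scan: with 5000 connections all resetting their timers to be
   // roughly the latest timeout, every insert walks almost the whole list.
   while (it != queue.end() && (*it)->absoluteTimeout <= pData->absoluteTimeout)
   {
      ++it;
   }

   queue.insert(it, pData);
}
```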

The obvious solution is to switch from the O(n) linear search to an O(log(n)) binary search; that is, change the list to a map keyed on the timeout. This is a reasonably straightforward change, but given the work we put in to make sure that we dealt with the original bug caused by the use of GetTickCount(), the solution is slightly more complex than you might first think.
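
As a sketch of the idea, again illustrative rather than the actual class, keying a std::multimap on the absolute timeout turns each insertion into a logarithmic operation; the catch, as the next paragraph explains, is that GetTickCount() wraps back to 0 roughly every 49.7 days, so an ordering based on the raw tick value can’t be the whole story:

```cpp
// Illustrative only: a multimap keyed on the absolute timeout.
#include <map>

class TimerData;   // callback, user data, etc.

typedef std::multimap<unsigned long, TimerData *> TimerMap;

void InsertTimer(TimerMap &timers, TimerData *pData, unsigned long absoluteTimeout)
{
   // O(log n) insertion; a multimap is needed because several timers can
   // share the same absolute timeout value.
   timers.insert(TimerMap::value_type(absoluteTimeout, pData));

   // Note: this alone isn't enough, because once GetTickCount() wraps, a
   // timeout just after the wrap would sort *before* timeouts just before it.
}
```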

Our list-based timer queue relied on partitioning the list into two sections: a section for timeouts before GetTickCount() wraps to 0 and a section for timeouts after this point. To switch to using a map to manage the timers we actually need to keep two maps of timeouts and, at the point where we would previously have moved from one side of the “wrap point” to the other, we can simply swap the maps over. Since there may be more than one timer set for each absolute timeout value we need to use a multimap rather than just a map. The handle map becomes a little more complicated as we have to store both the iterator and a pointer to the map into which the iterator points. Since we no longer need to compare timeouts whilst traversing the list we can remove the timeout, and its corresponding functions, from the TimerData class. That’s quite a lot of changes to the heart of this rather complicated class, but at least we have tests to help us.
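
The sketch below shows the shape of that design. The names here (HandleData, OnWrap(), the afterWrap flag) are mine for illustration and aren’t the class’s actual interface:

```cpp
// A sketch of the shape of the new design, not the actual implementation:
// two multimaps keyed on the absolute timeout, one for timers that expire
// before GetTickCount() wraps to 0 and one for timers that expire after it.
#include <algorithm>
#include <map>

class TimerData;   // callback, user data, etc.

typedef std::multimap<unsigned long, TimerData *> TimerMap;

struct HandleData
{
   TimerMap *pMap;               // which map the timer currently lives in
   TimerMap::iterator it;        // where it lives within that map
};

class TimerQueue
{
   public :

      TimerQueue() : m_pTimers(&m_map1), m_pWrappedTimers(&m_map2) {}

      void InsertTimer(HandleData &handle, TimerData *pData, unsigned long absoluteTimeout, bool afterWrap)
      {
         // O(log n) insertion into whichever map covers this timeout; the
         // handle remembers both the iterator and the map it points into.
         TimerMap *pMap = afterWrap ? m_pWrappedTimers : m_pTimers;

         handle.pMap = pMap;
         handle.it = pMap->insert(TimerMap::value_type(absoluteTimeout, pData));
      }

      void OnWrap()
      {
         // When GetTickCount() wraps to 0 the "wrapped" map becomes the
         // current map; swapping the pointers is all that's needed and the
         // iterators and map pointers stored in the handles stay valid.
         std::swap(m_pTimers, m_pWrappedTimers);
      }

   private :

      TimerMap m_map1;
      TimerMap m_map2;

      TimerMap *m_pTimers;          // timers due before the wrap point
      TimerMap *m_pWrappedTimers;   // timers due after the wrap point
};
```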

Once I realised what the problem was and came up with the solution, I just sat down, hacked the old code out and put the new code in. Normally I would be quite nervous making changes like this to a piece of code, but I knew that I had a good set of tests to back me up so I just got on with it. The external functionality shouldn’t change at all, so all of the existing tests should pass if the code still functioned, and the performance characteristics should improve. When I finished making the changes and the code compiled, I ran the tests and a couple failed due to obvious oversights. I fixed those, the tests passed, and running the server test tool showed the expected, dramatic, performance improvements. The whole thing took under an hour.

In the past I would have thought twice about making the changes and I’d have been concerned that I’d broken things, especially with a piece of code that is inherently more complex than it appears. Having tests to support me made the whole experience easy and stress free.

Code is here and the new rules apply.