Measure twice, cut once
I’m currently working on a small auction server for a client using The Server Framework. It’s a relatively simple server, messages come in, are validated and are then broadcast to interested parties and logged. Yesterday we shipped the first cut of the source to them and I was a bit concerned that our simple test app could thrash the server so easily. Admittedly the first cut of the code hadn’t been tuned at all and the logging implementation was fairly Mickey Mouse but I was a little concerned. I woke this morning with a list of perf improvements to try. Some of them from my original design sketches and some new ones. Luckily I put the right one at the top of the list.
The list consisted of the following:
Perfmon counters for bids in and out per second.
Switch logging to use async file writes.
Reuse input buffer if possible during broadcast to avoid memory allocation and copying.
Move the broadcasting and logging to a separate thread pool so that it can be tuned independently to the IO threads.
I added the new perfmon counters and ran our simple clients up again. The clients are in C# and do async socket stuff to talk to the server. The client sends a message and then reads messages and sends them back to the server. If you run one client then nothing happens, it sends a message to the server, the server only has one connected client so it broadcasts it to a set of 0 clients. If you run 2 clients then they ping pong messages back and forth between them. As you run more clients it gets more interesting as the server broadcasts and client echoes cause a massive increase in server load as each client echoes each message to every other client (via the server’s broadcast facility). Simple stuff and only thrown together as a proof of concept and to help the client with async socket stuff in C#. Anyway, run 3 or 4 clients on my dev box and the server starts to lag behind completing its async sends, this causes it to chew up non paged pool at an interesting rate and, well, eventually things start to fail.
Bear in mind that these aren’t 3 or 4 normal auction users. They’re clients that respond to every bid with a counter bid and they do it as soon as they get notified of a bid. So, 1 sends to 2-4 and then 2-4 sends to 1-4, which causes 1-4 to send to 1-4, etc. It gets busy, quickly.
The new counters show that the server does OK until outbound bids reach around 5000/second, at which point it needs to throttle itself. I wondered how it would perform with the logging element removed and then realised that the Mickey Mouse logging wasn’t operational in the release build anyway - we were logging to our debug stream and, well, it doesn’t exist in release mode… Switching to an equally Mickey Mouse logging scheme that uses a debug stream that IS available in release builds gives us a server that handles around 3500/bids out second and throttles itself (due to the sync logging calls) and doesn’t exhibit the rampant exhaustion of non paged pool - which is good ;)
All of this, at present at least, takes off the pressure to improve the performance; we’re well within the client’s requirements. Of course we need to create some more realistic performance tests and we do need to make at least some of the other changes on the list, but now that we have the new perf counters at least we can see the performance and we don’t need to guess what the server performance actually is.