Easy performance improvement for the socket server code

I was discussing the performance of The Server Framework with someone the other day and realised that there was a reasonably easy way to speed up some types of servers.

I coded a first cut of the solution last night and, so far, it shows a small but noticeable performance gain with no changes to functionality or to code outside of the framework.

The IOCP server framework was always designed for ease of use over performance. The performance that we’ve got from the framework has always been well within our clients’ requirements, so we’ve never bothered to push things further.

One ‘interesting’ thing about async IO requests is that they’re automatically cancelled for you if the thread that issued them exits before they complete. This caused us problems on our first servers, which used a dynamic pool of threads for database related work and yet issued writes to their sockets from those threads. Because the pool was dynamic it could shrink, and threads could die, taking their pending IO with them. We decided that tracking the number of pending IO operations on each thread, and preventing a thread from terminating until all of its outstanding requests had completed, would be a poor design because it would make things complex for the users of our framework; they’d probably have to use a ‘framework aware’ thread …

Anyway, the end result was a design where all async socket operations were ‘marshaled’ across to the IO thread pool. When a Write was issued on a socket a ‘write request’ was posted to the IOCP, then, one of the IOCP worker threads would pick up the write request and issue the actual async write. This meant that all async requests were started by threads which ran for a known lifetime. If you shut down the IO thread pool at an inappropriate time then you shouldn’t be surprised if some IO requests were unexpectedly terminated.

The potential performance increase is for operations that are issued from an IO thread anyway. These operations were still going through the two-stage ‘post a request, then process the request’ algorithm even though they didn’t really need to.

The solution to this problem is fairly easy. If the thread that issues the operation is an IO pool thread then you don’t use the two stage process, you just perform the operation. The trick is knowing that the code is currently running on an IO thread. The current implementation uses thread local storage to work this out. The IO thread pool allocates a TLS slot and the threads that it creates each insert an identifier into that slot before they start processing events. When an async operation is about to occur, the framework asks the IO pool whether the current thread is one of its own; if it is, the work is done directly, and if it isn’t, the two stage request is issued as before.

This seems to work nicely though the current implementation needs some cleaning up…