Socket Server code: Connection termination race condition bug
I’ve just fixed a problem in The Server Framework that was reported to me by one of my clients. There’s a race condition during connection establishment which can be demonstrated by a client that connects and then terminates the connection very quickly. The bug could be said to be in the example servers rather than the framework itself but it can be fixed by a change to the framework…
In the non-AcceptEx() version of the TCP server code, OnConnectionEstablished() is called during connection establishment from the code that processes a successful Accept(). In most of the example servers this virtual function is used to send a message to the client and then issue the first read request. If the client disconnects and the disconnect is processed before the code in OnConnectionEstablished() completes, then we may be issuing a read or write on a closed socket. This results in an exception. The calling code does nothing to protect itself from exceptions in OnConnectionEstablished(), so the Accept() loop is terminated and the server stops accepting new connections.
The example servers should probably be changed to use TryRead() instead, but the framework should also be changed to protect itself from exceptions generated in client code.
Right now, a successful quick fix is to wrap the call to OnConnectionEstablished() in a try/catch block. You can log the exceptions if you want to, but I’d personally ignore them…
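As a rough illustration of that quick fix, here is a minimal, self-contained sketch. It is not the framework's actual code: ConnectionCallback, CallGuarded() and RunAcceptLoop() are hypothetical names, sockets are modelled as plain integers, and the "accept loop" simply walks a list. The point is only that an exception thrown by the user callback is caught (and optionally logged) so that the loop carries on to the next connection.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the user-supplied OnConnectionEstablished() handler.
using ConnectionCallback = std::function<void(int socketId)>;

// Calls the user callback and swallows anything it throws.
// Returns true if the callback completed, false if it threw.
bool CallGuarded(
   const ConnectionCallback &onConnectionEstablished,
   int socketId,
   std::vector<std::string> &log)
{
   try
   {
      onConnectionEstablished(socketId);
      return true;
   }
   catch (const std::exception &e)
   {
      // Logging is optional; the important part is that nothing escapes.
      log.push_back("socket " + std::to_string(socketId) + ": " + e.what());
   }
   catch (...)
   {
      log.push_back("socket " + std::to_string(socketId) + ": unknown exception");
   }
   return false;
}

// Simulated accept loop: every accepted socket is processed, even if the
// callback for an earlier socket threw. Returns the number of sockets handled.
int RunAcceptLoop(
   const std::vector<int> &acceptedSockets,
   const ConnectionCallback &onConnectionEstablished,
   std::vector<std::string> &log)
{
   int handled = 0;

   for (int socketId : acceptedSockets)
   {
      CallGuarded(onConnectionEstablished, socketId, log);
      ++handled;
   }

   return handled;
}
```

If the callback for one socket throws because the client has already gone away, the failure is recorded and the loop still reaches the sockets that follow it.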
Users of the framework should be aware that you can get connection notification callbacks in any order; that is, you may get a disconnect on a socket before you get a connect. It’s unlikely but possible…
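One way client code might cope with out-of-order notifications is to remember a disconnect that arrives before its connect. The sketch below is an assumption-laden illustration, not part of the framework: ConnectionTracker and its methods are hypothetical, and sockets are identified by plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>

// Hypothetical helper that tolerates a disconnect arriving before the connect.
class ConnectionTracker
{
   public :

      // Returns true if the connection is usable, false if its disconnect
      // notification had already been seen.
      bool OnConnect(int socketId)
      {
         std::lock_guard<std::mutex> lock(m_mutex);

         if (m_closedEarly.erase(socketId) != 0)
         {
            return false;   // Disconnect got here first; do not read or write.
         }

         m_live.insert(socketId);
         return true;
      }

      void OnDisconnect(int socketId)
      {
         std::lock_guard<std::mutex> lock(m_mutex);

         if (m_live.erase(socketId) == 0)
         {
            // No connect seen yet, so remember the early disconnect.
            m_closedEarly.insert(socketId);
         }
      }

      std::size_t LiveCount() const
      {
         std::lock_guard<std::mutex> lock(m_mutex);
         return m_live.size();
      }

   private :

      mutable std::mutex m_mutex;
      std::set<int> m_live;
      std::set<int> m_closedEarly;
};
```

With something like this in place, a connect handler can check the return value of OnConnect() and skip its initial write and read when the peer has already gone.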