The cost of encapsulation
I’m debugging performance issues with a C++ server that has been stalling and then failing to recover. I’ve reached a point where we can reproduce the problem using a network interruption that causes multiple connections to disconnect at the same time. The fixed-size thread pool that services these connections becomes overloaded with the work of cleaning up the connection objects for the disconnected connections, and all of the threads in the pool spend far too long fighting over the lock on the heap as they try to return memory to it.
In this particular test there are 30,000 objects to clean up and 8 threads to do the work, but it seemed strange that it was taking so long. As part of the investigation I separated out the code that did the clean up and moved it to a separate thread pool, so that I could measure the work more accurately by keeping it away from any other work that the original pool needed to do. The new thread pool takes around 30 seconds to clean up the objects using 8 threads. Unfortunately, it also takes around 30 seconds with a single thread… The task is clearly one that does not scale well with additional threads.
The connection objects are quite complex; they run a custom reliable UDP protocol and are built in an object-oriented style in C++. Each connection consists of many other objects, and each object is encapsulated so that it’s not immediately apparent that destroying one connection requires around 200 trips into the heap. I’m sure we can do better than that; I’m just surprised at how well the issue was hidden by the encapsulation of the various objects that make up the connection.
Of course this “cost of encapsulation” is also one of the benefits of encapsulation. The complexity that goes into the object that manages a connection is cleanly compartmentalised into the sub-objects that are used to construct it. I don’t need to understand all of the complexity to work on one part of the connection, and each part can be unit tested in isolation. Unfortunately, this isolation means that each piece manages its memory independently, and so the memory for each connection is currently given back to the heap in 200 pieces…
The reason that the server wasn’t recovering was that the stalled thread pool allowed the work queue feeding it to grow very large. New connections were timing out whilst waiting in the queue to be processed, and this led to more disconnected connections that needed to be cleaned up. Rinse, repeat… The recovery issue has now been dealt with, so all I need to do now is redesign this connection object to be a little more efficient in its memory management.
In the next post I lift the curtain on the encapsulation provided by the design of the objects involved here and peek at their use of the memory allocator using a simple hack.