Two quite different approaches to multi-threading

I’ve been working on some code for a client recently that needs to run in a multi-threaded environment. Unfortunately it was never really written with that requirement in mind, or, more likely, the people who wrote the code knew that it needed to be accessed from multiple threads but didn’t really understand quite what that meant. As such I’m doing some fairly hairy refactoring for them. It’s high risk as there are no unit tests and the budget doesn’t really extend to completing the work, let alone “spending extra time” writing unit tests… The code is written in quite a ‘C’ style, most things are simple structs and most data is public and as such member functions fall where they are most conveniently written. Adding correct locking is complicated due to lots of mutable global data, lack of encapsulation and the fact that, at its heart, it’s multi-user server software where one user gets to poke at all the others.

Step zero was to realise that no locking is bad but correct locking was a long way away so I slipped in a single “global lock” to make the system work. It’s stable and not too slow but it doesn’t really ever use more than one thread at a time as almost everything requires that the executing thread holds the “global lock”.

Step one is to add some encapsulation and move things around so that we can one day dream of adding in more fine grained locking - unfortunately this step is a big one.

I’m finding it interesting comparing this work to the work I was doing immediately before. I’ve been working on the next big release of The Server Framework and this release will put an Activatable Object at the heart of each connection. This removes the need for locking from the socket object and lots of stuff that’s built on top of it. It means that masses of complex code can be pulled out and thrown away. It’s not a new idea, the socket object is now simply driven by a message queue. Locks are needed to put work items into the queue but during processing we can run without locking, safe in the knowledge that only one thread can ever be processing at a time.

The problem is that making the kind of change that I’m making right now is a massive change. It has led to a complete rewrite of the entire TCP side of The Server Framework. The general pattern is removing locking, because it’s no longer needed and simplifying code because the ways in which it can be called are simpler. The unit tests that I had for the previous version of the code have helped to guide me; yes new tests have been written, but the old ones told me what I needed to test even if they needed to be changed a lot to work with the new code.

With The Server Framework work it was scary but the various layers of tests supported the work and, in general, the code was being made simpler and easier to understand. With the client work there are no tests, huge amounts of the application simply didn’t compile until the changes were complete and, whilst some things are being simplified, others are being made more complex to accommodate changes to make the code thread safe. I expect that with enough iterations the client’s code could shift to using Activatable objects and some of the locking would go away again but I’m not sure we’ll have the luxury of those extra iterations.