There’s an interesting article over on the Dr. Dobbs Code Talk blog: PQR - A Simple Design Pattern for Multicore Enterprise Applications. It documents a design that I’m very familiar with, and one which has worked well for me in the past (this project was built in this way, for example).
My variation on this idea is that it all tends to be in one process. Work items are passed from one ‘processor’ to another via queues, and each processor can run multiple threads to process multiple work items in parallel. In simple systems you end up with a “pipeline” and work items flow from one end to the other; more complex systems may be modelled as networks of processors. You can tune the system by adjusting the number of threads in each processor’s thread pool, and can also do things like having different processors run at different thread priorities (if you really want to). Since a work item is only ever being accessed by a single processor at a time, the data in the work item doesn’t need any locking. If a processor needs to access data which can be shared (either by instances of a processor or by different processors) then normal locking is required, but the situations where locking IS needed are greatly reduced.
I find it interesting that the Dr. Dobbs article points out that ‘careful measurement is required’. I agree; this is one of those situations where it’s vitally important to include performance monitoring (via perfmon counters, perhaps) from the outset. Unless you can see how many threads are active at each stage in the pipeline and how many work items are in each of the queues, you simply cannot tune the system in a meaningful manner.