MiniDumpWriteDump now mostly useless for in-process use
I’ve been using the MiniDumpWriteDump() API from DbgHelp.dll for 20 years or so.
It has proven to be a useful diagnostic tool, and I use it in all manner of places,
including many where others may simply use an assert(). It’s a heavyweight debugging tool,
but it has proved its worth over the years; rather than just throwing an exception
because things that shouldn’t happen have happened, I often also generate a dump
file so that I can get far more data than I could ever log or report in any other
way.
The API call allows you to generate a dump for any process that you have the correct
access rights for, but I have, traditionally, used it for in-process dump generation.
The documentation has warned for a while that the call can deadlock on the
loader lock if used to generate a dump in this way, but I haven’t seen that in
practice. Recently, however, I’ve been seeing more and more deadlocks during
dump generation that are, it seems, due to MiniDumpWriteDump() using
the heap to allocate memory after it has begun its work and, significantly, after
it has suspended all other threads in the process.
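
As an illustration only, an in-process call of the kind described above might look something like the following sketch. The names MakeDumpFileName() and WriteInProcessDump() are made up for the example, as is the file naming scheme; only the Win32 and DbgHelp calls are real. The Windows-specific part is guarded so that the naming helper can be compiled on its own.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h> // link with dbghelp.lib
#endif

// Hypothetical helper: builds a dump file name such as "MyApp-1234.dmp".
std::string MakeDumpFileName(const std::string &prefix, unsigned long processId)
{
    return prefix + "-" + std::to_string(processId) + ".dmp";
}

#ifdef _WIN32
// Writes a dump of the calling process. This is the in-process pattern
// that the documentation warns about and that can hang as described here.
bool WriteInProcessDump(const std::string &prefix)
{
    const DWORD processId = ::GetCurrentProcessId();
    const std::string fileName = MakeDumpFileName(prefix, processId);

    HANDLE file = ::CreateFileA(fileName.c_str(), GENERIC_WRITE, 0, nullptr,
                                CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE)
    {
        return false;
    }

    // No exception information, user streams or callback in this sketch.
    const BOOL ok = ::MiniDumpWriteDump(::GetCurrentProcess(), processId, file,
                                        MiniDumpNormal, nullptr, nullptr, nullptr);
    ::CloseHandle(file);
    return ok != FALSE;
}
#endif
```

A call such as WriteInProcessDump("MyApp") would sit next to the point where the exception is thrown.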
It seems that, with the current MiniDumpWriteDump() code, if any thread in the process
is inside the heap when the dump is triggered, then the process will deadlock on a
lock inside the heap. The thread in the heap has been suspended and so will never
release the lock, and the thread calling MiniDumpWriteDump() then blocks as soon as
it tries to use the heap. For me, this manifests as a hung process with a zero-byte
dump file.
The problem appears to have got worse in recent years. Of course, the documentation
isn’t versioned, and so it’s difficult to know when the warnings against in-process
use began, but I’m pretty sure that it was not originally an issue. Certainly the
amazingly detailed documentation from 2005 that I remember using when I was first
writing my dump generation code doesn’t mention the issue.
It may be that this has always been a potential problem, and it’s just a race condition
that is more likely to occur on modern hardware, but it feels to me as if the API
wasn’t always this fragile. I’m pretty sure that the mini dump generation code used to use
_alloca(), so perhaps the issue is that the code has been changed to use _malloca(),
since this is viewed as being more secure. The main issue with such a change is that,
with _alloca(), a failure to allocate space on the stack results in an SEH stack overflow
exception, whereas with _malloca() a request that is too large for the stack is
allocated from the heap instead… However, this is purely speculation and doesn’t get me anywhere…
Anyway, there’s no point complaining about things you have no control over. The solution to this
issue is to do what the current documentation suggests and always generate the dump of
a process from a different process. This isn’t especially difficult to do; you need
the process id of the process that you want to generate a dump for and the correct access
rights. The simplest approach might be to spawn ProcDump from Sysinternals and have that
do the work for you, but I expect I’ll craft my own external dump process so that I can
have a bit more control. Ideally, the act of generating a dump will do as little as possible
to the state of the process that is being dumped, so I have a design in mind that
simply requires the process that wishes to generate a dump of itself to signal an event
and then wait on a second event, which the external process sets once it has done
the work.
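
The two-event design could be sketched roughly as follows. This is a guess at one possible shape, not a finished implementation: MakeEventName(), RequestDump() and ServeOneDumpRequest() are hypothetical names, the event naming scheme is invented for the example, and both processes are assumed to have created or opened the two events at startup so that nothing needs to be allocated at the point where the dump is requested.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h> // link with dbghelp.lib
#endif

// Hypothetical naming scheme shared by both processes, e.g. "DumpRequest-1234".
std::string MakeEventName(const std::string &role, unsigned long processId)
{
    return role + "-" + std::to_string(processId);
}

#ifdef _WIN32
// Runs in the process that wants a dump of itself. It only signals one
// event and waits on the other, so it does not touch the heap here.
bool RequestDump(HANDLE requestEvent, HANDLE doneEvent, DWORD timeoutMs)
{
    if (!::SetEvent(requestEvent))
    {
        return false;
    }
    return ::WaitForSingleObject(doneEvent, timeoutMs) == WAIT_OBJECT_0;
}

// Runs in the external process: waits for one request, writes the dump of
// the target process, then releases the waiting thread in the target.
bool ServeOneDumpRequest(DWORD targetPid, HANDLE requestEvent, HANDLE doneEvent,
                         const char *dumpFileName)
{
    if (::WaitForSingleObject(requestEvent, INFINITE) != WAIT_OBJECT_0)
    {
        return false;
    }

    bool ok = false;
    // Some dump types may need further rights, such as PROCESS_DUP_HANDLE.
    HANDLE process = ::OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                                   FALSE, targetPid);
    if (process != nullptr)
    {
        HANDLE file = ::CreateFileA(dumpFileName, GENERIC_WRITE, 0, nullptr,
                                    CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file != INVALID_HANDLE_VALUE)
        {
            ok = ::MiniDumpWriteDump(process, targetPid, file, MiniDumpNormal,
                                     nullptr, nullptr, nullptr) != FALSE;
            ::CloseHandle(file);
        }
        ::CloseHandle(process);
    }

    // Always signal completion so the target is not left waiting on failure.
    ::SetEvent(doneEvent);
    return ok;
}
#endif
```

In this sketch the target would open its two events once, early on, using names from MakeEventName(), and the external process would be started with the target's process id so that it can derive the same names.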
We’ll see.