I’ve been spelunking around Dll loading recently for a pet project. It’s been an interesting journey. This evening I solved the final piece of the puzzle and, when I did, I wondered, not for the first time, why Windows holds the loader lock when calling DllMain().
Chris Brumme explains this much better than I can, but the loader lock is a process-wide lock that the OS holds while it does stuff to its internal process tables, things like loading and unloading Dlls. You need to be careful what you do in your
DllMain() because otherwise you can deadlock on the loader lock and, well, that’s really bad… This has recently caused problems in managed C++ .Net code.
Note: this post is (a) very old and (b) wrong in places. Read the comments.
For a while I’ve been of the opinion that you can see the loader lock in action when Windows starts up and you have masses of apps all starting up at once and all of them load a mass of Dlls and everything takes an age to load even though this is the fastest box you could buy; or when a screaming multi-proc box with masses of memory crashes to a halt for a second or so and the whole world locks up for no apparent reason whilst explorer goes la-la for a moment - but perhaps there are other reasons for these things…
Anyway, I’ve often wondered why the OS calls
DllMain() from within the loader lock when, to me at least, it seems like it would be far better to call it after releasing the lock. The only reason I can think of is that if
DllMain() fails then the loader needs to unload the Dll again and to do that it needs to frig with the process tables again and to do that it needs to hold the loader lock again… If that’s the case, surely you can release the lock, run
DllMain() and then act on the result. If you need to unload the Dll due to initialisation failure then you just acquire the loader lock again… I expect I’m missing something obvious but…
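To make that idea concrete, here is a toy model of the sequence. It is only a sketch under loud assumptions: `ToyLoader`, `load()` and `isLoaded()` are made-up names, a `std::mutex` stands in for the loader lock and a `std::set` stands in for the process tables. It says nothing about how the real Windows loader is implemented.

```cpp
#include <functional>
#include <mutex>
#include <set>
#include <string>

// Toy stand-in for the loader. All names here are hypothetical.
class ToyLoader {
public:
    // 'init' plays the role of DllMain(): it returns false if initialisation fails.
    bool load(const std::string& name, const std::function<bool()>& init) {
        {
            std::lock_guard<std::mutex> guard(lock_);  // take the "loader lock"
            modules_.insert(name);                      // update the "process tables"
        }                                               // lock released here

        const bool ok = init();  // the entry point runs WITHOUT the lock held

        if (!ok) {
            std::lock_guard<std::mutex> guard(lock_);  // reacquire only to undo the load
            modules_.erase(name);
        }
        return ok;
    }

    bool isLoaded(const std::string& name) {
        std::lock_guard<std::mutex> guard(lock_);
        return modules_.count(name) != 0;
    }

private:
    std::mutex lock_;                 // non-recursive, like a plain lock
    std::set<std::string> modules_;   // names of "loaded" modules
};
```

Because the lock isn't held while `init` runs, `init` can itself call `load()` without deadlocking on the non-recursive mutex. The flip side, in this model at least, is that another thread calling `isLoaded()` can see a module whose initialisation hasn't finished yet.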
The reason I’m thinking about all this is that I’ve been playing with intercepting functions in Dlls. There’s plenty of info on the web and in my book collection for this kind of thing and I got 80% done in no time. As usual the last 20% took 80% more time… There I was, happily hooking various API functions from my Dll and ‘doing stuff’ when I decided that I should intercept some more functionality. I hooked the new API calls and was surprised to see that my hooks were ignored. It took a while to work out what was going on. The calls to the API were happening before I hooked the Dll. They were happening before
LoadLibrary() returned because they were part of static object initialisation within the Dll. I called
LoadLibrary(), which loaded the Dll, pulled in all the dependent Dlls, did all the fixup magic and then ran
DllMain(). Since the Dll in question used the standard C runtime, the Dll entry point that was called was actually
_DllMainCRTStartup@12(), which deals with starting up the C runtime, part of which includes calling the constructors for the Dll’s file scope static objects, before calling
DllMain(). Since I, and all the examples that I’ve seen, hook Dlls after
LoadLibrary() returns, I was unable to hook the API in question before the Dll called into it.
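The timing problem can be reproduced in miniature without any Dlls at all. The sketch below is an analogy rather than real hooking code: `g_importSlot` is a made-up stand-in for an import address table entry, `realApi()` and `hookedApi()` are hypothetical functions, and an ordinary file scope static object plays the part of the Dll's static initialisation, which runs before the code that installs the hook gets a chance to.

```cpp
#include <string>
#include <vector>

// Records which implementation handled each call.
// A function-local static avoids any initialisation order problems.
std::vector<std::string>& callLog() {
    static std::vector<std::string> log;
    return log;
}

void realApi()   { callLog().push_back("real"); }
void hookedApi() { callLog().push_back("hooked"); }

// Hypothetical stand-in for an import address table slot.
// A function address is a constant initialiser, so the slot is set up
// before any constructors run.
void (*g_importSlot)() = &realApi;

// A file scope static object whose constructor calls through the slot,
// just like a static object in a Dll calling an imported API.
struct UsesApiAtStartup {
    UsesApiAtStartup() { g_importSlot(); }
};
static UsesApiAtStartup g_staticObject;

// The "hook": redirect the slot to our own function.
void installHook() { g_importSlot = &hookedApi; }
```

If `installHook()` is only called later (say from `main()`, in the role of the code that runs after `LoadLibrary()` returns), the log already contains a `real` entry from the static object's constructor. Only calls made after that point are recorded as `hooked`.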
Looking at the docs for
LoadLibraryEx() I hoped that I could just use
DONT_RESOLVE_DLL_REFERENCES to prevent
DllMain() being called. Unfortunately,
DONT_RESOLVE_DLL_REFERENCES does exactly what it says on the tin: it doesn’t resolve any Dll references and it doesn’t call
DllMain(). That made it useless to me, as I needed the Dll references resolved before I could hook them…
I searched quite hard for a fix to this problem, thinking that it was an obvious thing that people would want to do. I didn’t find any solutions, so I started to delve into how Dlls were loaded. This research ended up with the Microsoft Portable Executable and Common Object File Format Specification, which is the definitive document on PE files (Dlls, exes, etc). The code that I had for hooking already read information out of the Dll’s PE file to locate import address tables and the like, so I hoped there was something else in there that would help me hook
DllMain(). I didn’t find what I was looking for, but I found something much more useful.
Early on in my investigation of the PE file format I came across an interesting sounding field in one of the file header structures:
AddressOfEntryPoint. This is the address that’s used to start the image in the PE file; it points to whatever calls
main() for exes and whatever calls
DllMain() for Dlls. Once I found out about this I decided that I could change this value and make it point somewhere else so that
DllMain() wasn’t called during
LoadLibrary() but when I wanted it to be called. I started dusting off my inline assembler skills (didn’t take long, they’re not very big) and thinking about how I could move the entry point address so that it pointed to the end of the real
DllMain() and all kinds of other cunning plans. Then I read the document and saw that the field could be set to 0 for Dlls that didn’t have entry points, like resource-only Dlls, I imagine.
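For what it's worth, the field is easy to find by hand. The sketch below works on an in-memory copy of the file's bytes and is plain C++ rather than anything from the Windows headers; the function names (`readLe32`, `writeLe32`, `clearEntryPoint`) are made up for illustration. It relies on the layout given in the PE specification: the 4-byte offset of the PE signature is stored at offset 0x3C of the DOS header, the signature is followed by the 20-byte COFF file header, and AddressOfEntryPoint sits 16 bytes into the optional header for both PE32 and PE32+ images.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 32-bit value at a byte offset (caller checks bounds).
std::uint32_t readLe32(const std::vector<std::uint8_t>& bytes, std::size_t offset) {
    return static_cast<std::uint32_t>(bytes[offset])
         | static_cast<std::uint32_t>(bytes[offset + 1]) << 8
         | static_cast<std::uint32_t>(bytes[offset + 2]) << 16
         | static_cast<std::uint32_t>(bytes[offset + 3]) << 24;
}

// Write a little-endian 32-bit value at a byte offset (caller checks bounds).
void writeLe32(std::vector<std::uint8_t>& bytes, std::size_t offset, std::uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        bytes[offset + i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Zeroes AddressOfEntryPoint in 'image' and hands back the old RVA in 'savedRva'.
// Returns false (leaving 'image' untouched) if the buffer doesn't look like a PE file.
bool clearEntryPoint(std::vector<std::uint8_t>& image, std::uint32_t& savedRva) {
    const std::size_t kLfanewOffset = 0x3C;             // e_lfanew in the DOS header
    const std::size_t kEntryPointFromPe = 4 + 20 + 16;  // signature + file header + field offset

    if (image.size() < kLfanewOffset + 4 || image[0] != 'M' || image[1] != 'Z') {
        return false;  // too small, or no "MZ" DOS signature
    }
    const std::size_t peOffset = readLe32(image, kLfanewOffset);
    if (peOffset > image.size() || image.size() - peOffset < kEntryPointFromPe + 4) {
        return false;  // PE header would run off the end of the buffer
    }
    if (image[peOffset] != 'P' || image[peOffset + 1] != 'E' ||
        image[peOffset + 2] != 0 || image[peOffset + 3] != 0) {
        return false;  // no "PE\0\0" signature
    }
    const std::size_t fieldOffset = peOffset + kEntryPointFromPe;
    savedRva = readLe32(image, fieldOffset);  // remember the real entry point RVA
    writeLe32(image, fieldOffset, 0);         // 0 means "no entry point"
    return true;
}
```

The saved RVA is what you would later add to the module's load address to find the real entry point. What happens to the modified bytes afterwards is deliberately left out of this sketch.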
So, my current solution to hooking Dlls so that I get my hooks in place before
DllMain() runs is this. First I load the Dll image from disk and read the
AddressOfEntryPoint field from the header. I resolve the RVA into a real address and write a 0 back into the image. Then I call
LoadLibrary(), which loads the Dll and all dependent Dlls, does all the fixups that are required and doesn’t execute
DllMain(). Then I run my hook code and finally I use a few bits of
__asm magic to set the stack just right and call into
DllMain(). This all works fine and achieves just what I want to do, but now I’m nervous. If this works then why doesn’t the OS do it like this in the first place? Why doesn’t Windows release the loader lock before calling DllMain()?