|
Writing a DLL in Delphi (or BCB, for that matter) that will be injected into another process's address space has been plagued with rumors of instability and doom. Most developers have had to go back to Microsoft compilers to produce stable DLLs for this purpose. Are these accusations true? In short... yes. If the DLL was injected into a process that makes heavy use of threads (Explorer is a classic example), then a Delphi DLL was not a viable choice for a production application. Many rumors floated around the web and among Delphi experts, and most were of the opinion that the problem was in the way Delphi managed Thread Local Storage (TLS) in the DLL during initialization and shutdown.
I spent countless hours single-stepping through a context menu extension for Explorer, trying to get it to consistently reproduce an access violation (AV) that would occasionally occur. Reports from the newsgroups seemed to indicate that the problem was more severe in networked environments. I came to the same conclusion, as the problem was more prevalent when I was connected to the Internet or logged in to a network.
I struggled with this for over three years. Eventually I wanted to pull together my EasyNSE package, but I was reluctant to do so because of the instability problems. I had found one "patch" that seemed to make the DLLs stable, but that was not enough for me. I needed to understand the cause and not rely on a magic API call that "seemed" to fix the problem.
|
library MysteryHookFixSample;

uses
  Windows;

begin
  // Stop the DLL_THREAD_ATTACH/DLL_THREAD_DETACH notifications for this module
  DisableThreadLibraryCalls(HInstance);
end. |
Adding the call above to the library's initialization seems to fix the problem, but why?
A few months ago I again fired up Win98 (the easiest platform for me to reproduce the problem on) and went after the root cause once more. Apparently the stars were aligned, because I was able to reproduce the AV consistently! At the time I was talking to Mathias of madExcept fame. Mathias quickly tracked the problem down and found a simple solution that could be implemented outside of the Runtime Library (RTL). Here is his analysis:
|
"I have found the bug in the RTL and I have a clean workaround. Let me explain the situation:
DLL_PROCESS_ATTACH -> SysInit.InitProcessTLS -> SysInit.InitThreadTLS
DLL_THREAD_ATTACH -> SysInit.InitThreadTLS
DLL_THREAD_DETACH -> SysInit.ExitThreadTLS
DLL_PROCESS_DETACH -> SysInit.ExitProcessTLS -> SysInit.ExitThreadTLS
As you can see, DLL_XXX_ATTACH always ends up in "InitThreadTLS", while DLL_XXX_DETACH always ends up in "ExitThreadTLS". We can forget about "Init/ExitProcessTLS". Now let's go through some situations. For all of the following cases, please note that the events are meant to be for the very same thread (that's important!!).
(1)
InitThreadTLS
ExitThreadTLS
-> perfectly fine (standard case)
(2)
ExitThreadTLS
-> strange situation, but no problems
(3)
InitThreadTLS
-> memory leak (but I think this situation will never occur)
(4)
InitThreadTLS
InitThreadTLS
ExitThreadTLS
-> memory leak (but I think this situation will never occur)
(5)
InitThreadTLS
ExitThreadTLS
ExitThreadTLS
-> LocalFree gets called twice for the same pointer
Now what happens when doing CBT hooking in Win98 is situation (5). The very same thread gets a DLL_PROCESS_ATTACH + DLL_THREAD_DETACH + DLL_PROCESS_DETACH event, and the end result is the Explorer crash. If you ask me, the Borland programmers didn't believe that case (5) could happen, but it does. Now let's look at my patched "ExitThreadTLS" function. I just added one line:"
|
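To make the five cases concrete, here is a small, hypothetical model of the bookkeeping Mathias describes. It is a console program, not a DLL, and it does not call any Windows API or the real SysInit routines: the thread's TLS value is an Integer (0 stands for nil), memory blocks are numbered entries with a "still allocated" flag, and ModelInitThreadTLS/ModelExitThreadTLS are illustrative stand-ins that only mimic the allocate/free pattern from the analysis (the unpatched exit routine frees the block but never clears the slot).

```pascal
program TlsCaseModel;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical model of the five cases: no Windows API is called.
// Slot stands for this thread's TLS value (0 = nil) and blocks are
// numbered 1, 2, ... with a flag saying whether each is still allocated.

const
  MaxBlocks = 8;

type
  TEvent = (evInit, evExit);

var
  Slot, NextBlock, Leaks, DoubleFrees: Integer;
  Live: array[1..MaxBlocks] of Boolean;

// Stand-in for InitThreadTLS: allocate a block and store it in the slot
procedure ModelInitThreadTLS;
begin
  Inc(NextBlock);
  Live[NextBlock] := True;
  Slot := NextBlock;
end;

// Stand-in for the unpatched ExitThreadTLS: free the block in the slot
// but leave the slot itself unchanged
procedure ModelExitThreadTLS;
begin
  if Slot <> 0 then
  begin
    if Live[Slot] then
      Live[Slot] := False
    else
      Inc(DoubleFrees);  // this block was already freed
  end;
end;

// Replays one sequence of events for a single thread and reports the result
procedure RunCase(const Name: string; const Events: array of TEvent);
var
  i: Integer;
begin
  Slot := 0;
  NextBlock := 0;
  Leaks := 0;
  DoubleFrees := 0;
  for i := 1 to MaxBlocks do
    Live[i] := False;
  for i := Low(Events) to High(Events) do
    if Events[i] = evInit then
      ModelInitThreadTLS
    else
      ModelExitThreadTLS;
  for i := 1 to NextBlock do
    if Live[i] then
      Inc(Leaks);  // allocated but never freed
  Writeln(Name, ': leaked blocks = ', Leaks, ', double frees = ', DoubleFrees);
end;

begin
  RunCase('(1)', [evInit, evExit]);
  RunCase('(2)', [evExit]);
  RunCase('(3)', [evInit]);
  RunCase('(4)', [evInit, evInit, evExit]);
  RunCase('(5)', [evInit, evExit, evExit]);
end.
```

In this model cases (3) and (4) each report one leaked block and case (5) reports one double free, which matches Mathias's table: the second exit call finds the stale pointer still in the slot and frees it again.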
So how do you fix it? When a process loads a DLL, the RTL gets the first crack at the DLL's entry point. It sets up some variables and allocates a Thread Local Storage (TLS) index for the DLL, then allocates a per-thread data block for each thread that attaches and stores a pointer to it in that TLS slot. This is an important point: TLS is a scarce resource (Windows only guarantees 64 TLS indexes per process, TLS_MINIMUM_AVAILABLE), and the per-thread work happens on every thread attach and detach, so calling DisableThreadLibraryCalls is still a good idea if you don't need to know about threads attaching to the DLL. Why does the RTL do this? I think it is to support Delphi's threadvar declarations, which give each thread its own copy of a variable.
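For readers who have not used threadvar, here is a minimal console sketch of what it does; it is a plain program rather than a DLL, and the names Counter, WorkerValue and TWorker are illustrative. Each thread sees its own zero-initialized copy of Counter. In a DLL, that per-thread copy is what lives in the block the RTL allocates for each attaching thread. The cthreads line is an addition beyond the original Delphi setting, only there so the same source also builds with Free Pascal on Unix.

```pascal
program ThreadVarDemo;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  {$IFDEF FPC}{$IFDEF UNIX}cthreads,{$ENDIF}{$ENDIF}
  Classes;

threadvar
  Counter: Integer;  // each thread gets its own zero-initialized copy

var
  WorkerValue: Integer;  // ordinary global, used to report the worker's copy

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
begin
  Counter := Counter + 100;  // changes only the worker thread's copy
  WorkerValue := Counter;
end;

var
  W: TWorker;
begin
  Counter := 1;  // the main thread's copy
  W := TWorker.Create(False);
  W.WaitFor;
  W.Free;
  Writeln('main thread Counter = ', Counter);
  Writeln('worker thread Counter = ', WorkerValue);
end.
```

The main thread's Counter stays 1 while the worker's copy starts at 0 and ends at 100, even though both threads use the same identifier.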
During this initialization, Delphi also lets the DLL install a function, by assigning it to the DllProc variable, that will be called every time a process or thread attaches to or detaches from the DLL.
|
library HookFixSample;

uses
  Windows;

procedure DLLEntryProc(EntryCode: Integer);
begin
  case EntryCode of
    DLL_PROCESS_DETACH:
      begin
      end;
    DLL_PROCESS_ATTACH:
      begin
      end;
    DLL_THREAD_ATTACH:
      begin
      end;
    DLL_THREAD_DETACH:
      begin
      end;
  end;
end;

begin
  DLLProc := @DLLEntryProc;
  // We are already inside DLL_PROCESS_ATTACH when this code runs, so the
  // handler never sees that event; call it manually
  DLLEntryProc(DLL_PROCESS_ATTACH);
end. |
Based on Mathias's debugging, the problem occurs during thread detach, so that is where we can fix it.
|
library HookFixSampleD6andD7;

uses
  Windows;

// Mathias's patched copy of SysInit.ExitThreadTLS; the TlsSetValue line
// is the one he added. TlsLast and TlsIndex are declared in SysInit.
procedure madPatch_ExitThreadTLS;
var
  p: Pointer;
begin
  if @TlsLast = nil then
    Exit;  // the module has no threadvars, so there is no TLS data
  if TlsIndex <> -1 then
  begin
    p := TlsGetValue(TlsIndex);
    if p <> nil then
    begin
      // The RTL checks the TLS value for nil before freeing it, so if we
      // free the block first and then set the value to nil, the RTL will
      // find nil when it tries to free the block and skip it
      LocalFree(Cardinal(p));
      TlsSetValue(TlsIndex, nil); // <- this fixes case (5), the RTL does not nil the value
    end;
  end;
end;

procedure DLLEntryProc(EntryCode: Integer);
begin
  case EntryCode of
    DLL_PROCESS_DETACH:
      begin
      end;
    DLL_PROCESS_ATTACH:
      begin
      end;
    DLL_THREAD_ATTACH:
      begin
      end;
    DLL_THREAD_DETACH:
      begin
        madPatch_ExitThreadTLS;
      end;
  end;
end;

begin
  DLLProc := @DLLEntryProc;
  // We are already inside DLL_PROCESS_ATTACH when this code runs, so the
  // handler never sees that event; call it manually
  DLLEntryProc(DLL_PROCESS_ATTACH);
end. |
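This was all well and good until it was tried in D5. The RTL changed a bit between D5 and D6, likely due to Kylix, but whatever the reason, the change breaks the above fix: D5 and lower have already freed the thread's TLS block before calling DllProc, so the patch's LocalFree frees it a second time.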
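The effect of that ordering can be seen in another small, hypothetical model of case (5) for one thread. No Windows API is called: Slot stands for the thread's TLS value and Frees counts how many times its single block is freed, so anything above 1 is a double free. DllProcFirst models the D6/D7 behaviour the patch's comments rely on (our handler runs before the RTL frees the block), RtlFirst models D5 and lower, and the enum and procedure names are illustrative only.

```pascal
program DetachOrderModel;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical model of case (5) for a single thread. No Windows API is
// called: Slot stands for the thread's TLS value (0 = nil) and Frees counts
// how many times the one TLS block gets freed. More than 1 is a double free.

type
  // DllProcFirst: DllProc runs before the RTL frees the block (D6/D7)
  // RtlFirst:     the RTL frees the block before calling DllProc (D5 and lower)
  TDetachOrder = (DllProcFirst, RtlFirst);
  TPatch = (NoPatch, FreeAndNilPatch, NilOnlyPatch);

function FreesForCase5(Order: TDetachOrder; Patch: TPatch): Integer;
var
  Slot, Frees: Integer;

  // The RTL frees a non-nil block but leaves the slot unchanged
  procedure RtlExitThreadTLS;
  begin
    if Slot <> 0 then
      Inc(Frees);
  end;

  // Our DLL_THREAD_DETACH handler
  procedure RunPatch;
  begin
    case Patch of
      NoPatch:
        ;
      FreeAndNilPatch:  // like the D6/D7 template: free, then nil
        if Slot <> 0 then
        begin
          Inc(Frees);
          Slot := 0;
        end;
      NilOnlyPatch:     // the D5-and-lower variant: nil only
        Slot := 0;
    end;
  end;

begin
  Slot := 1;   // DLL_PROCESS_ATTACH: InitThreadTLS stored block #1
  Frees := 0;
  // DLL_THREAD_DETACH arrives on the same thread
  if Order = DllProcFirst then
  begin
    RunPatch;
    RtlExitThreadTLS;
  end
  else
  begin
    RtlExitThreadTLS;
    RunPatch;
  end;
  // DLL_PROCESS_DETACH: ExitProcessTLS -> ExitThreadTLS
  RtlExitThreadTLS;
  Result := Frees;
end;

begin
  Writeln('no patch:          ', FreesForCase5(RtlFirst, NoPatch));
  Writeln('D6/D7, free + nil: ', FreesForCase5(DllProcFirst, FreeAndNilPatch));
  Writeln('D5, free + nil:    ', FreesForCase5(RtlFirst, FreeAndNilPatch));
  Writeln('D5, nil only:      ', FreesForCase5(RtlFirst, NilOnlyPatch));
end.
```

In this model the unpatched sequence and the free-and-nil patch under the D5 ordering both free the block twice, while the D6/D7 ordering and the nil-only variant free it exactly once. That is why the D5-and-lower branch can only clear the TLS value.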
At this point everything is great and we are ready to create stable and robust COM and Hook DLLs, right? Well, if you are using D6 or later, yes; if you are using D5 or lower, no. There is another problem with the RTL implementation: in older compilers the Floating Point Unit (FPU) is not set up correctly when the DLL is initialized, so we need to do it for the DLL.
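The template that follows loads $133F into the FPU control word. As a small, portable illustration (a console program, not part of the DLL; the names are illustrative), this decoder compares that value with Delphi's default of $1332, the Default8087CW variable in System. $133F masks all six FPU exceptions, while $1332 leaves invalid operation, zero divide and overflow unmasked. Presumably the point is that, with everything masked, floating-point errors in the host process yield NaN or infinity results instead of raising exceptions, though the original does not spell out the reason.

```pascal
program ControlWordBits;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Decodes the x87 FPU control word: bits 0..5 are exception masks (a set
// bit means that exception is masked and will not be raised), bits 8..9
// select precision and bits 10..11 select rounding.

const
  ExceptionNames: array[0..5] of string =
    ('Invalid', 'Denormal', 'ZeroDivide', 'Overflow', 'Underflow', 'Precision');

// Lists the exceptions whose mask bit is clear, i.e. that will be raised
function UnmaskedExceptions(CW: Word): string;
var
  Bit: Integer;
begin
  Result := '';
  for Bit := 0 to 5 do
    if (CW and (1 shl Bit)) = 0 then
    begin
      if Result <> '' then
        Result := Result + ', ';
      Result := Result + ExceptionNames[Bit];
    end;
  if Result = '' then
    Result := '(none)';
end;

begin
  // Delphi's Default8087CW
  Writeln('$1332 unmasked: ', UnmaskedExceptions($1332));
  // The value the template loads on DLL_PROCESS_ATTACH
  Writeln('$133F unmasked: ', UnmaskedExceptions($133F));
  Writeln('$133F precision control: ', ($133F shr 8) and 3);  // 3 = 64-bit extended
  Writeln('$133F rounding control:  ', ($133F shr 10) and 3); // 0 = round to nearest
end.
```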
|
library HookFixSampleD4toD7;

uses
  Windows;

// COMPILER_5_UP and COMPILER_6_UP are not predefined by Delphi; they are
// assumed to come from a compiler-version include file (not shown).

procedure madPatch_ExitThreadTLS;
var
  p: Pointer;
begin
  if @TlsLast = nil then
    Exit;  // the module has no threadvars, so there is no TLS data
  if TlsIndex <> -1 then
  begin
    p := TlsGetValue(TlsIndex);
    if p <> nil then
    begin
      // D6 and later: the RTL checks the TLS value for nil before freeing
      // it, so we free the block here and then set the value to nil
      {$IFDEF COMPILER_6_UP}
      LocalFree(Cardinal(p));
      {$ENDIF COMPILER_6_UP}
      // D5 and lower have already freed the block before calling this
      // function; in these compilers we can't free the memory, but we can nil it
      TlsSetValue(TlsIndex, nil); // <- this fixes case (5), the RTL does not nil the value
    end;
  end;
end;
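// D5 fixes the FPU problem, so ControlWord is only needed before D5.
// The var keyword sits inside the conditional so no empty var section
// is left behind when COMPILER_5_UP is defined.
{$IFNDEF COMPILER_5_UP}
var
  ControlWord: Word;  // FPU control word saved on process attach
{$ENDIF}

procedure DLLEntryProc(EntryCode: Integer);
begin
  case EntryCode of
    DLL_PROCESS_DETACH:
      begin
        // D5 fixes this problem
        {$IFNDEF COMPILER_5_UP}
        Set8087CW(ControlWord);  // restore the control word saved on attach
        {$ENDIF}
      end;
    DLL_PROCESS_ATTACH:
      begin
        // D5 fixes this problem
        {$IFNDEF COMPILER_5_UP}
        asm
          FNSTCW ControlWord     // save the current control word first
        end;
        Set8087CW($133F);        // then mask all FPU exceptions
        {$ENDIF}
      end;
    DLL_THREAD_ATTACH:
      begin
      end;
    DLL_THREAD_DETACH:
      begin
        madPatch_ExitThreadTLS;
      end;
  end;
end;

begin
  DLLProc := @DLLEntryProc;
  // We are already inside DLL_PROCESS_ATTACH when this code runs, so the
  // handler never sees that event; call it manually
  DLLEntryProc(DLL_PROCESS_ATTACH);
end. |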
That's it. If this template is used to create your Hook or COM server DLL, it will produce a DLL that is just as stable as one developed with a Microsoft compiler. Hopefully Borland will fix this in the next version of Delphi.
|