Preventing Stack Corruption

I recently investigated a stack corruption issue related to P/Invoke. In this post I am going to share my experience and show you a simple yet effective approach to avoiding similar problems.

The Bug

A colleague of mine discovered the bug while debugging a piece of code in JustTrace that deals with ETW. The issue was quite tricky because the bug manifested only in debug builds, where it crashed the program. Considering that JustTrace requires administrator privileges, I can only guess what the consequences of this bug could be in a release build. Take a look at the code fragment shown in the following screenshot.

The code is single-threaded and looks quite straightforward. It instantiates an object and tries to use it. The constructor executes without any exceptions. Still, when you try to execute the next line, the CLR throws an exception with the following message:

Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

Solution 1: Managed Debugging Assistants

I am usually pessimistic when it comes to MDA, but I decided to give it a try. At first I tried MDA from within Visual Studio 2012.

It didn’t work. Then I tried MDA from within WinDbg. Still no luck. In general, my experience with MDA has not been positive. It is very limited and works only for simple scenarios (e.g. an incorrect calling convention).

Solution 2: Using Disassembly Window

It does work. If you are familiar with assembly language, this is the easiest way to track down the problem. In my case I was lucky and had to debug only a few hundred lines. The cause was an incorrect TRACE_GUID_REGISTRATION definition.

internal struct TRACE_GUID_REGISTRATION
{
   private IntPtr guid;
   // helper methods
}

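This data structure was passed to the RegisterTraceGuids function as an in/out parameter. The native function writes a registration handle into a second member of the structure, which the managed definition does not declare, so the write lands outside the structure and corrupts the stack.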

The Fix

A few things are wrong with the TRACE_GUID_REGISTRATION definition. The first is that it does not define the “handle” field. The second is that it is not decorated with the StructLayout attribute, and this could be crucial. Here is the correct definition.

[StructLayout(LayoutKind.Sequential)]
internal struct TRACE_GUID_REGISTRATION
{
   private IntPtr guid;
   private IntPtr handle;
   // helper methods
}
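
The size mismatch behind the bug can be checked directly with the marshaler. The following is only a minimal sketch, not code from JustTrace: the names BrokenRegistration, FixedRegistration and LayoutCheck are made up for illustration, and the native size is assumed from the two pointer-sized members of the structure as declared in evntrace.h.

```csharp
using System;
using System.Runtime.InteropServices;

// Native definition from evntrace.h, for reference:
//   typedef struct _TRACE_GUID_REGISTRATION {
//       LPCGUID Guid;       // pointer to the class GUID
//       HANDLE  RegHandle;  // filled in by RegisterTraceGuids
//   } TRACE_GUID_REGISTRATION;

// Hypothetical name: mirrors the original, one-field definition.
[StructLayout(LayoutKind.Sequential)]
internal struct BrokenRegistration
{
    public IntPtr Guid;
}

// Hypothetical name: mirrors the corrected definition.
[StructLayout(LayoutKind.Sequential)]
internal struct FixedRegistration
{
    public IntPtr Guid;
    public IntPtr RegHandle;
}

internal static class LayoutCheck
{
    // Size the native side expects: two pointer-sized members.
    public static int NativeSize
    {
        get { return 2 * IntPtr.Size; }
    }

    // True when the marshaled size of T equals the expected native size.
    public static bool SizeMatches<T>() where T : struct
    {
        return Marshal.SizeOf(typeof(T)) == NativeSize;
    }

    // Marshaled offset of the handle field in the corrected struct.
    public static int HandleOffset()
    {
        return Marshal.OffsetOf(typeof(FixedRegistration), "RegHandle").ToInt32();
    }

    public static void Report()
    {
        Console.WriteLine("native size: {0}", NativeSize);
        Console.WriteLine("broken size: {0}", Marshal.SizeOf(typeof(BrokenRegistration)));
        Console.WriteLine("fixed size:  {0}", Marshal.SizeOf(typeof(FixedRegistration)));
    }
}
```

Calling LayoutCheck.Report() from any entry point shows the gap: in a 64-bit process the one-field struct marshals as 8 bytes while the native side expects 16, so the handle is written into the 8 bytes that follow the struct.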

Solution 3: FXCop – Using metadata to prevent the bug

Once I fixed the code I started thinking about how I could avoid such bugs. I came up with the idea of using the FXCop tool, which is part of Visual Studio 2010 and later. My intention was to decorate my data structure with a custom StructImport attribute like this:

[StructImport("TRACE_GUID_REGISTRATION", "Advapi32.dll")]
[StructLayout(LayoutKind.Sequential)]
internal struct TraceGuidRegistration
{
   private IntPtr guid;
   private IntPtr handle;
   // helper methods
}
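
StructImport is not a framework attribute, so it has to be declared somewhere. A minimal sketch of what such a declaration might look like follows; the property names NativeName and ModuleName and the StructImportReader helper are assumptions made for illustration, not the actual JustTrace code.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical attribute: records which native structure a managed struct mirrors
// and which module's symbols are expected to describe it.
[AttributeUsage(AttributeTargets.Struct, AllowMultiple = false, Inherited = false)]
internal sealed class StructImportAttribute : Attribute
{
    public StructImportAttribute(string nativeName, string moduleName)
    {
        NativeName = nativeName;
        ModuleName = moduleName;
    }

    public string NativeName { get; private set; }
    public string ModuleName { get; private set; }
}

[StructImport("TRACE_GUID_REGISTRATION", "Advapi32.dll")]
[StructLayout(LayoutKind.Sequential)]
internal struct TraceGuidRegistration
{
    private IntPtr guid;
    private IntPtr handle;
}

internal static class StructImportReader
{
    // Returns the declared native name, or null when the type carries no StructImport attribute.
    public static string GetNativeName(Type managedStruct)
    {
        object[] attrs = managedStruct.GetCustomAttributes(typeof(StructImportAttribute), false);
        return attrs.Length == 0 ? null : ((StructImportAttribute)attrs[0]).NativeName;
    }
}
```

A validation rule would read the attribute in a similar way and then look up the native structure by NativeName in the module's symbols.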

To check whether this was possible, I started JustTrace under WinDbg and loaded the symbols from the Microsoft Symbol Server. I was surprised to see that four modules define TRACE_GUID_REGISTRATION and none of them was advapi32.

That’s OK. All I need is the information about the TRACE_GUID_REGISTRATION layout. I quickly built a small prototype based on the DIA2Dump sample from the DIA SDK (you can find it in the <PROGRAM_FILES>\Microsoft Visual Studio 10.0\DIA SDK\Samples\DIA2Dump folder). I embedded the code into a custom FXCop rule and tested it. Everything worked as expected.

After a short break I realized that I could take another approach, so I started refactoring my code.

Solution 4: FXCop – Using convention to prevent the bug

The previous solution works just fine. You apply the attribute to your data structures and the FXCop rule validates that everything is OK. One of the benefits is that you can now name your data structures as you wish. For example, you can name the structure TraceGuidRegistration instead of TRACE_GUID_REGISTRATION. However, the two names are practically the same. Also, as I said, I was surprised that TRACE_GUID_REGISTRATION is not defined in the advapi32 module. As a matter of fact, I don’t care where it is defined.

So I decided to do my mappings in a slightly different way. Instead of applying the StructImport attribute, I inspect the signatures of all methods decorated with the DllImport attribute. For example, I can inspect the following method signature:

[DllImport("AdvApi32", CharSet = CharSet.Auto, SetLastError = true)]
static extern int RegisterTraceGuids(
    ControlCallback requestAddress,
    IntPtr requestContext,
    ref Guid controlGuid,
    int guidCount,
    ref TraceGuidRegistration traceGuidReg,
    string mofImagePath,
    string mofResourceName,
    out ulong registrationHandle);

I know that the fifth parameter has type TraceGuidRegistration, so I can try to map it. What is nice about this approach is that I can verify both that the TraceGuidRegistration layout is correct and that the StructLayout attribute is applied. These were the two things that caused the stack corruption.
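
The convention-based check can be approximated with plain reflection. The sketch below is not an FXCop rule and makes several assumptions: the NativeMethods and PInvokeStructChecker names and the ControlCallback shape are invented for illustration, and a hard-coded table of expected sizes stands in for the layout information that the real rule reads from symbols through the DIA SDK. Reflection also cannot tell an explicitly applied StructLayout attribute from the compiler default, so the sketch only rejects automatic layout.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
internal struct TraceGuidRegistration
{
    private IntPtr guid;
    private IntPtr handle;
}

// Callback shape assumed for illustration only.
internal delegate uint ControlCallback(int requestCode, IntPtr context, ref uint bufferSize, IntPtr buffer);

internal static class NativeMethods
{
    [DllImport("AdvApi32", CharSet = CharSet.Auto, SetLastError = true)]
    internal static extern int RegisterTraceGuids(
        ControlCallback requestAddress, IntPtr requestContext, ref Guid controlGuid,
        int guidCount, ref TraceGuidRegistration traceGuidReg,
        string mofImagePath, string mofResourceName, out ulong registrationHandle);
}

internal static class PInvokeStructChecker
{
    // Collects user-defined structs that appear, by value or by reference,
    // in the P/Invoke signatures declared on the given type.
    public static IEnumerable<Type> FindMarshaledStructs(Type container)
    {
        const BindingFlags flags = BindingFlags.Static | BindingFlags.Public |
                                   BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        return container.GetMethods(flags)
            .Where(m => (m.Attributes & MethodAttributes.PinvokeImpl) != 0)
            .SelectMany(m => m.GetParameters())
            .Select(p => p.ParameterType.IsByRef ? p.ParameterType.GetElementType() : p.ParameterType)
            .Where(t => t != null && t.IsValueType && !t.IsPrimitive && !t.IsEnum
                        && t.Assembly == container.Assembly) // skips framework types such as Guid
            .Distinct();
    }

    // expectedSizes stands in for the native layout information (hypothetical input).
    public static IList<string> Validate(Type container, IDictionary<string, int> expectedSizes)
    {
        var problems = new List<string>();
        foreach (Type t in FindMarshaledStructs(container))
        {
            if (t.IsAutoLayout)
            {
                problems.Add(t.Name + ": automatic layout cannot be marshaled reliably");
                continue;
            }
            int expected;
            if (!expectedSizes.TryGetValue(t.Name, out expected))
            {
                problems.Add(t.Name + ": no native layout information available");
                continue;
            }
            int actual = Marshal.SizeOf(t);
            if (actual != expected)
            {
                problems.Add(t.Name + ": marshaled size " + actual + ", native size " + expected);
            }
        }
        return problems;
    }
}
```

Passing typeof(NativeMethods) together with an entry that maps "TraceGuidRegistration" to 2 * IntPtr.Size yields an empty list for the corrected struct. The one-field version from the beginning of the post would be reported with a size mismatch.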

Conclusion

Once I refactored my FXCop rule to use convention instead of explicit attribute declaration, I started wondering whether such FXCop rules could be provided by Microsoft. So far I don’t see any obstacles to doing so. The task is trivial for all well-known data structures provided by the Windows OS. All that is needed is an internet connection to the Microsoft Symbol Server. I guess the StructImport solution could be applied to any custom data structure mappings. I hope that in future Visual Studio versions Microsoft will provide a solution for this kind of bug.
