This cool article is good case to show how you can employ static code analysis for extracting some unexported symbols from binary code - in this case we need ExNPagedLookasideLock & ExNPagedLookasideListHead
Sure the first thing you need is disassembler. If you search at GitHub "x86 disasm" you will get something about 20 repositories, but we need one that satisfies some requirements:
So just choose the one with the most comprehensible code - bcs they all contains bugs and you anyway will fix them and/or add missed instructions
Lets start with exported function ExInitializeNPagedLookasideList. Simplest cases - xp 64bit:
You can just collect all lea instructions which loading addresses from .data section, r8 will hold address of lock and rcx list. ok, this was simple, so lets see more complex case - w2k3 64bit
Ups, Bad Thing happened - you can`t disasm after int3 instruction bcs this is something that should not happen during normal execution. This is where you need graph codeflow. Obvious choice is to consider codeblocks as nodes and jumps to other code blocks as edges. Sure each node can have lots of directed edged (and some of this edges even can point to the same node - so we have directed graph with loops). The good news is that you don`t need to build whole graph - bcs you anyway discovering all your nodes and edges dynamically. It`s like Lee algo for pathfinding in maze - you just need to have set of codeblocks which you processing, another set of edges which you will process in next cycle, and some storage for already processed blocks. So algo is simple:
So in our sample from w2k3 disasm will stop at address of int3 but set of newly discovered addresses will contains new address 52A748:
exactly like in previous case. Lets see more complex case - w7 64bit:
Ups-2, there was nothing like loading some addresses. But you can collect calls and also add them to edge storage. Function ExInitializeNPagedLookasideListInternal is very big and complex so lets see something interesting at address 140166277:
Function KxWaitForSpinLockAndAcquire is not exported but you can recognize lock with instruction lock bts [address in .data section]. Now you know that you have lock address and in one of two blocks after this you will have safely access to list - so you need associate some State with all of your edges. Lets say for this simple case that state 0 means that we still search for lock and state 1 means that we got lock and now expect list address. So you can mark addresses 1401662A6 and 1401662A9 as having state 1 and then process them in some other way
There is one important question - how safe you can skip processing of nodes with state 0 when you already have some edges with state 1? Answer - it depends from your state machine. You can draw on paper all your states and if this graph has no branches - you can skip all processing in this cycle blocks with lower state:
But lets see other state graph (branches can occur when we try to find several addresses or there are several way to find the same address. And you clearly doing something wrong if your state graph has loops):
In this case you can safely skip only edges with state 0 when you in state 1. But you can`t skip edges with state 1 or 2 when you processing block with state 3
Sure the first thing you need is disassembler. If you search at GitHub "x86 disasm" you will get something about 20 repositories, but we need one that satisfies some requirements:
- disasm to some intermediate code and not in string output
- can be used in kernel mode (just in case if you want to do it) which means that it must be written in C
So just choose the one with the most comprehensible code - bcs they all contains bugs and you anyway will fix them and/or add missed instructions
Lets start with exported function ExInitializeNPagedLookasideList. Simplest cases - xp 64bit:
...
mov [rcx+54h], r10d
lea rdx, [rcx+40h]
lea r8, ExNPagedLookasideLock
lea rcx, ExNPagedLookasideListHead
add rsp, 28h
jmp ExInterlockedInsertTailList
You can just collect all lea instructions which loading addresses from .data section, r8 will hold address of lock and rcx list. ok, this was simple, so lets see more complex case - w2k3 64bit
sub rsp, 28h
test cl, 0Fh
jz short loc_52A748
mov r8, [rsp+28h]
mov rdx, rcx
lea rcx, aInitializeslis
call DbgPrint
mov ecx, 80000002h
call ExRaiseStatus
int 3 ; Trap to Debugger
Ups, Bad Thing happened - you can`t disasm after int3 instruction bcs this is something that should not happen during normal execution. This is where you need graph codeflow. Obvious choice is to consider codeblocks as nodes and jumps to other code blocks as edges. Sure each node can have lots of directed edged (and some of this edges even can point to the same node - so we have directed graph with loops). The good news is that you don`t need to build whole graph - bcs you anyway discovering all your nodes and edges dynamically. It`s like Lee algo for pathfinding in maze - you just need to have set of codeblocks which you processing, another set of edges which you will process in next cycle, and some storage for already processed blocks. So algo is simple:
- add your initial address (in this case address of ExInitializeNPagedLookasideList) in current set
- process all of codeblocks in your current set and extract links to other codeblocks in next set
- when you finished with current set - add all processed blocks to some "storage of processed" - it can be something simple like
std::list<std::pair< PBYTE, DWORD >>
or some heavy weapon like Boost interval container library (or some interval tree written in plain C) . The main idea is that this storage can tell you if your address already was processed - now you have set of newly discovered addresses, filter out already processed and put them in your current set
- if this set is empty - algo ends
- goto step 2
So in our sample from w2k3 disasm will stop at address of int3 but set of newly discovered addresses will contains new address 52A748:
loc_52A748:
...
lea r8, ExNPagedLookasideLock
lea rcx, ExNPagedLookasideListHead
add rsp, 28h
jmp ExInterlockedInsertTailList
exactly like in previous case. Lets see more complex case - w7 64bit:
ExInitializeNPagedLookasideList:
sub rsp, 48h
and [rsp+48h+var_10], 0
movzx eax, [rsp+48h+Depth]
mov [rsp+48h+var_18], ax
mov eax, [rsp+48h+Tag]
mov [rsp+48h+var_20], eax
mov rax, [rsp+48h+Size]
mov [rsp+48h+var_28], rax
call ExInitializeNPagedLookasideListInternal
add rsp, 48h
retn
Ups-2, there was nothing like loading some addresses. But you can collect calls and also add them to edge storage. Function ExInitializeNPagedLookasideListInternal is very big and complex so lets see something interesting at address 140166277:
loc_140166277:
add [rbx+4B00h], edi
lock bts cs:ExNPagedLookasideLock, 0
jnb short loc_1401662A6
lea rcx, ExNPagedLookasideLock
call KxWaitForSpinLockAndAcquire
add [rbx+4B04h], edi
add [rbx+4B08h], eax
mov r9d, eax
jmp short loc_1401662A9
Function KxWaitForSpinLockAndAcquire is not exported but you can recognize lock with instruction lock bts [address in .data section]. Now you know that you have lock address and in one of two blocks after this you will have safely access to list - so you need associate some State with all of your edges. Lets say for this simple case that state 0 means that we still search for lock and state 1 means that we got lock and now expect list address. So you can mark addresses 1401662A6 and 1401662A9 as having state 1 and then process them in some other way
There is one important question - how safe you can skip processing of nodes with state 0 when you already have some edges with state 1? Answer - it depends from your state machine. You can draw on paper all your states and if this graph has no branches - you can skip all processing in this cycle blocks with lower state:
But lets see other state graph (branches can occur when we try to find several addresses or there are several way to find the same address. And you clearly doing something wrong if your state graph has loops):
In this case you can safely skip only edges with state 0 when you in state 1. But you can`t skip edges with state 1 or 2 when you processing block with state 3