You can achieve this by code signature scanning, which is something I have done in the past. The concept mainly works by relying on the fact that functions often do not change too much between updates, but simply relocate because they were pushed forward or back by other functions being expanded or shrunk.
Let's take the example of MessageBoxA, whose disassembly looks like this for me:
765DEA11 > 8BFF MOV EDI,EDI
765DEA13 55 PUSH EBP
765DEA14 8BEC MOV EBP,ESP
765DEA16 833D 749A5E76 00 CMP DWORD PTR DS:[765E9A74],0
765DEA1D 74 24 JE SHORT USER32.765DEA43
765DEA1F 64:A1 18000000 MOV EAX,DWORD PTR FS:[18]
765DEA25 6A 00 PUSH 0
765DEA27 FF70 24 PUSH DWORD PTR DS:[EAX+24]
765DEA2A 68 A49E5E76 PUSH USER32.765E9EA4
765DEA2F FF15 34145876 CALL DWORD PTR DS:[<&KERNEL32.Interlocke>; kernel32.InterlockedCompareExchange
765DEA35 85C0 TEST EAX,EAX
765DEA37 75 0A JNZ SHORT USER32.765DEA43
765DEA39 C705 A09E5E76 01>MOV DWORD PTR DS:[765E9EA0],1
765DEA43 6A 00 PUSH 0
765DEA45 FF75 14 PUSH DWORD PTR SS:[EBP+14]
765DEA48 FF75 10 PUSH DWORD PTR SS:[EBP+10]
765DEA4B FF75 0C PUSH DWORD PTR SS:[EBP+C]
765DEA4E FF75 08 PUSH DWORD PTR SS:[EBP+8]
765DEA51 E8 73FFFFFF CALL USER32.MessageBoxExA
765DEA56 5D POP EBP
765DEA57 C2 1000 RETN 10
The trick is to pick a block of code that you think is likely to stay the same across an update and, more importantly, is unique to this function. Scanning for the prologue or epilogue is typically useless, since nearly every function begins and ends with the same few instructions. I would probably take the following block:
765DEA16 833D 749A5E76 00 CMP DWORD PTR DS:[765E9A74],0
765DEA1D 74 24 JE SHORT USER32.765DEA43
765DEA1F 64:A1 18000000 MOV EAX,DWORD PTR FS:[18]
765DEA25 6A 00 PUSH 0
765DEA27 FF70 24 PUSH DWORD PTR DS:[EAX+24]
765DEA2A 68 A49E5E76 PUSH USER32.765E9EA4
765DEA2F FF15 34145876 CALL DWORD PTR DS:[<&KERNEL32.Interlocke>;
You have to strike a balance when choosing the length of the block. The longer the block, the more likely it is to uniquely identify the function, but also the more likely it is that an update inserts or changes code in the middle of it, so the signature no longer matches. Note that the block I have chosen contains multiple memory references. We cannot rely on any data or function addresses, since these may be relocated in the next update, so we fill those bytes (and the relative jump offset) with wildcards:
765DEA16 833D XXXXXXXX 00 CMP DWORD PTR DS:[XXXXXXXX],0
765DEA1D 74 XX JE SHORT XXXXXXXX
765DEA1F 64:A1 18000000 MOV EAX,DWORD PTR FS:[18]
765DEA25 6A 00 PUSH 0
765DEA27 FF70 24 PUSH DWORD PTR DS:[EAX+24]
765DEA2A 68 XXXXXXXX PUSH XXXXXXXX
765DEA2F FF15 XXXXXXXX CALL DWORD PTR DS:[XXXXXXXX]
This means our byte signature is now:
0x83 0x3D 0x? 0x? 0x? 0x? 0x00 0x74 0x? 0x64 0xA1 0x18 0x00 0x00 0x00 0x6A
0x00 0xFF 0x70 0x24 0x68 0x? 0x? 0x? 0x? 0xFF 0x15 0x? 0x? 0x? 0x?
The 0x? bytes are wildcards: bytes we expect to change. The others are bytes we expect to stay the same across the update. To use the signature to locate the function at runtime, you need to scan for these bytes (taking the wildcards into account). The process is roughly as follows:
- Enumerate all executable pages of the process (VirtualQueryEx).
- Scan them for the byte signature (taking the wildcards into account; this is trivial to implement as a for loop that skips wildcard bytes).
- To obtain the true function address, subtract from the match address the offset of the block from the start of the original function (in this case, 0x765DEA16 - 0x765DEA11 => 0x5).
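As a rough illustration of the last two steps, here is a minimal C sketch that scans a buffer you have already read (for example, one executable region copied out of the target process). The names WILDCARD, MSGBOX_SIG, find_signature and find_function are my own placeholders, not part of any API, and I am assuming wildcards are stored as -1 in an int array. Reading the memory itself is left out.

```c
#include <assert.h>
#include <stddef.h>

#define WILDCARD (-1)   /* hypothetical marker: a byte we expect to change */

/* Signature for the MessageBoxA block above, one instruction per row. */
static const int MSGBOX_SIG[] = {
    0x83, 0x3D, WILDCARD, WILDCARD, WILDCARD, WILDCARD, 0x00, /* CMP DWORD PTR DS:[?],0 (last byte is the immediate) */
    0x74, WILDCARD,                                           /* JE SHORT ? */
    0x64, 0xA1, 0x18, 0x00, 0x00, 0x00,                       /* MOV EAX,DWORD PTR FS:[18] */
    0x6A, 0x00,                                               /* PUSH 0 */
    0xFF, 0x70, 0x24,                                         /* PUSH DWORD PTR DS:[EAX+24] */
    0x68, WILDCARD, WILDCARD, WILDCARD, WILDCARD,             /* PUSH ? */
    0xFF, 0x15, WILDCARD, WILDCARD, WILDCARD, WILDCARD        /* CALL DWORD PTR DS:[?] */
};
#define MSGBOX_SIG_LEN    (sizeof MSGBOX_SIG / sizeof MSGBOX_SIG[0])
#define MSGBOX_SIG_OFFSET ((size_t)0x5)  /* 0x765DEA16 - 0x765DEA11 */

/* Return a pointer to the first match of sig in buf, or NULL if there is none. */
const unsigned char *find_signature(const unsigned char *buf, size_t len,
                                    const int *sig, size_t sig_len)
{
    assert(buf != NULL && sig != NULL);
    if (sig_len == 0 || sig_len > len)
        return NULL;
    for (size_t i = 0; i + sig_len <= len; i++) {
        size_t j = 0;
        /* Wildcard positions match any byte; all others must be equal. */
        while (j < sig_len &&
               (sig[j] == WILDCARD || buf[i + j] == (unsigned char)sig[j]))
            j++;
        if (j == sig_len)
            return buf + i;
    }
    return NULL;
}

/* Locate the block, then step back to the start of the function. */
const unsigned char *find_function(const unsigned char *buf, size_t len)
{
    const unsigned char *hit =
        find_signature(buf, len, MSGBOX_SIG, MSGBOX_SIG_LEN);
    if (hit == NULL || (size_t)(hit - buf) < MSGBOX_SIG_OFFSET)
        return NULL;
    return hit - MSGBOX_SIG_OFFSET;
}
```

If buf was copied from address base in the target process, the function's real address is base plus (result - buf). You would call find_function once per region and stop at the first non-NULL result. It is also worth checking that there is no second match, since that would mean the signature is not unique.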
Actually, rather than enumerating all executable pages, it is often enough to find which module the function lies in (user32.dll in this case) and search within that module only.