|
- Introduction
- ============
- This project aims to give a simple overview of how well various x64 hooking
- engines (on Windows) cope with awkward code. I'll write various functions that
- are hard to patch and then see how each hooking engine does.
-
- I'll test:
-
- * [EasyHook](https://easyhook.github.io/)
- * [PolyHook](https://github.com/stevemk14ebr/PolyHook)
- * [MinHook](https://www.codeproject.com/Articles/44326/MinHook-The-Minimalistic-x-x-API-Hooking-Libra)
- * [Mhook](http://codefromthe70s.org/mhook24.aspx)
-
- (I'd like to test Detours, but I'm not willing to pay for it, so it isn't
- tested :( )
-
- There are multiple things that make hooking difficult. Maybe you want to patch
- while the application is running -- in that case you might get race conditions,
- as the application may execute your half-finished hook. Maybe the software has
- some self-protection features (or other software on the system provides them,
- e.g. Trusteer Rapport).
-
- Evaluating how the hooking engines stack up against that is not the goal here.
- Neither are non-functional criteria, like how fast it is or how much memory it
- needs for each hook. This is just about the challenges the function to be
- hooked itself poses.
-
- Namely:
-
- * Are jumps relocated?
- * What about RIP-relative addressing?
- * If there's a loop at the beginning / if it's a tail-recursive function, does
- the hooking engine handle it?
- * How good is the disassembler, how many instructions does it know?
- * Can it hook already hooked functions?
-
- At first I will give a short walk through of the architecture, then quickly go
- over the test cases. After that come the results and an evaluation for each
- engine.
-
- I think I found a flaw in all of them; I'll publish a small POC which should at
- least detect the existence of problematic code.
-
- **A word of caution**: my results are worse than expected, so do assume I have
- made a mistake in using the libraries. I went into this expecting that some
- engines at least would try to detect e.g. the loops back into the first few
- bytes. But none did? That's gotta be wrong.
-
- **Another word of caution**: parts of this are rushed and/or ugly. Please
- double check parts that seem suspicious. And I'd love to get patches, even for
- the most trivial things -- spelling mistakes? Yes please.
-
- Architecture
- ============
- This project is made up of two parts: a .DLL with the test cases and an .exe
- that hooks those, tests whether they still work and prints the results.
-
- (I could have done it all in the .exe but this makes it trivial to (at some
- point) force the function to be hooked and the target function to be further
- apart than 2GB. Just set fixed image bases in the project settings and you're
- done)
-
- My main concern was automatically identifying whether the hook worked. I
- consider a hook to work if: a) the original function can still execute
- successfully *and* b) the hook was called.
-
- Criterion a) is really similar to a unit test: verify that a function
- returns what is expected. So for a) the .exe just runs unit tests after all the
- hooks have been applied. Each failing function is reported (or the program
- crashes and I can look at the callstack) so I can correlate that with which
- hooking engine I'm currently testing and see where those fail. I've used
- Catch2 for the unit tests, because I wanted to try it anyway.
-
- From the get-go it was clear that I wanted to test multiple hooking engines.
- And they all needed to do the same steps in the same order -- so I implemented
- a basic AbstractHookingEngine with a boolean for every test case and made a
- child class for each engine. The child classes have to override `hook_all`
- and `unhook_all`. In between the calls to those, the unit tests run.
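-
- The flow described above could look roughly like the sketch below. It is
- written in Python rather than the project's C++ purely for brevity, and every
- name in it except `AbstractHookingEngine`, `hook_all` and `unhook_all` (the
- `run` helper, the `hook_called` flags, the `DummyEngine`) is made up for
- illustration.
-
```python
from abc import ABC, abstractmethod

TEST_CASES = ("small", "branch", "rip_relative", "avx", "rdrand", "loop", "tail_rec")


class AbstractHookingEngine(ABC):
    def __init__(self):
        # one boolean per test case: did the hook for it actually get called?
        self.hook_called = {name: False for name in TEST_CASES}

    @abstractmethod
    def hook_all(self): ...

    @abstractmethod
    def unhook_all(self): ...

    def run(self, unit_tests):
        """Apply all hooks, run the unit tests, always remove the hooks again."""
        self.hook_all()
        try:
            still_works = {name: test() for name, test in unit_tests.items()}
        finally:
            self.unhook_all()
        # a hook "works" if a) the original still behaves AND b) the hook was called
        return {name: still_works[name] and self.hook_called[name] for name in unit_tests}


class DummyEngine(AbstractHookingEngine):
    """Stand-in engine that patches nothing; it only tracks its own state."""

    def hook_all(self):
        self.hooked = True

    def unhook_all(self):
        self.hooked = False
```
-
- A real child class would call into EasyHook, PolyHook, MinHook or Mhook in
- `hook_all`, and each hook callback would set its `hook_called` flag.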
-
- Test case: Small
- ================
- This is just a very small function; it is smaller than the hook code will be -
- so how does the library react?
-
-
- _small:
- xor eax, eax
- ret
-
-
- Test case: Branch
- =================
- Instead of the FASM code I'll show the disassembled version, so you can see the
- instruction lengths & offsets.
-
-
- 0026 | 48 83 E0 01 | and rax,1
- 002A | 74 17 | je test_cases.0043 --+
- 002C | 48 31 C0 | xor rax,rax |
- 002F | 90 | nop |
- 0030 | 90 | nop |
- ... |
- 0041 | 90 | nop |
- 0042 | 90 | nop |
- 0043 | C3 | ret <----------------+
-
-
- This function has a branch in the first 5 bytes. Hooking it detour-style isn't
- possible without fixing that branch in the trampoline. The NOP sled is just so
- the hooking engine can't cheat and just put the whole function into the
- trampoline. Instead the jump in the trampoline needs to be modified so it jumps
- back to the original destination.
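-
- As a rough illustration of what "fixing that branch" involves, here is a
- minimal Python sketch that re-encodes the short `je` from the listing as a
- near `je rel32` at a different address. The function name and the trampoline
- address 0x1000 are assumptions for the example; none of the tested engines is
- claimed to do it exactly this way.
-
```python
import struct


def relocate_short_jcc(insn: bytes, old_addr: int, new_addr: int) -> bytes:
    """Re-encode a 2-byte `Jcc rel8` at old_addr as a 6-byte `Jcc rel32` at new_addr."""
    if len(insn) != 2 or not 0x70 <= insn[0] <= 0x7F:
        raise ValueError("not a short conditional jump")
    opcode = insn[0]
    (rel8,) = struct.unpack("b", insn[1:2])
    target = old_addr + 2 + rel8       # rel8 counts from the end of the instruction
    rel32 = target - (new_addr + 6)    # the near form is 6 bytes long
    if not -2**31 <= rel32 < 2**31:
        raise OverflowError("destination is more than 2GB away from the trampoline")
    # the near form is 0F 8x, with the same condition code x as the short 7x
    return bytes([0x0F, opcode + 0x10]) + struct.pack("<i", rel32)


# `74 17` at offset 0x2A (see the listing), moved to a made-up trampoline address
patched = relocate_short_jcc(bytes.fromhex("7417"), 0x2A, 0x1000)
```
-
- Note that the relocated instruction is longer than the original, which is one
- reason a trampoline can't simply be a byte-for-byte copy.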
-
- Test case: RIP relative
- =======================
- One of the new things in AMD64 is RIP relative addressing. I guess the reason
- to include it was to make it easier to generate PIC -- all references to data
- can now be made relative, instead of absolute. So it doesn't matter anymore
- where the program is loaded into memory and there's less need for the
- relocation table.
-
- A quick and dirty[1] test for this is re-implementing the well known C rand
- function.
-
-
- public _rip_relative
- _rip_relative:
- mov rax, qword[seed]
- mov ecx, 214013
- mul ecx
- add eax, 2531011
- mov [seed], eax
-
- shr eax, 16
- and eax, 0x7FFF
- ret
-
- seed dd 1
-
-
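- The very first instruction uses RIP-relative addressing. Its displacement is
- relative to the address of the following instruction, so once it is copied to
- the trampoline the displacement needs to be fixed up there.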
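-
- The fix-up itself is just arithmetic on the 32-bit displacement. The sketch
- below is a hedged Python illustration; the function name, the example
- displacement and both addresses are invented, and it assumes the caller already
- knows where the disp32 sits inside the instruction.
-
```python
import struct


def relocate_rip_relative(insn: bytes, disp_offset: int, old_addr: int, new_addr: int) -> bytes:
    """Return insn with its disp32 adjusted so it still references the same address."""
    (disp,) = struct.unpack_from("<i", insn, disp_offset)
    target = old_addr + len(insn) + disp         # RIP is the address of the next instruction
    new_disp = target - (new_addr + len(insn))
    if not -2**31 <= new_disp < 2**31:
        raise OverflowError("data is more than 2GB away from the trampoline")
    return insn[:disp_offset] + struct.pack("<i", new_disp) + insn[disp_offset + 4:]


# mov rax, [rip+0x20] is encoded as 48 8B 05 20 00 00 00; the addresses are made up
original = bytes.fromhex("488b0520000000")
moved = relocate_rip_relative(original, 3, old_addr=0x1000, new_addr=0x5000)
```
-
- The overflow check matters as soon as the trampoline ends up more than 2GB
- away from the original code: the displacement then no longer fits and the
- instruction would have to be rewritten rather than merely patched.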
-
- Test case: AVX & RDRAND
- =======================
-
- The AMD64 instruction set is extended with every CPU generation. Because the
- hooking engines need to know the instruction lengths and their side effects to
- properly apply their hooks, they need to keep up.
-
- The actual code in the test case is boring and doesn't matter. I'm sure there
- are disagreements on whether I've picked good candidates of "exotic" or new
- instructions, but those were the first that came to mind.
-
- (It's also doubtful whether you'll ever encounter functions where the first
- instructions are of this category, because most probably there's some setup
- needed before, e.g. checking that addresses are aligned, initializing loop
- counters, yadda, yadda)
-
- Test case: loop and TailRec
- ===========================
-
- My hypothesis before starting this evaluation was that those two cases would
- make most hooking engines fail. Back in the good ol' days of x86, detour hooking
- didn't require any special thought because the well-known prologue
- `PUSH EBP; MOV EBP, ESP` (3 bytes) together with the instruction after it
- covered the 5 bytes needed for `JMP +-2GB`[2]. That isn't so easy for AMD64:
- a) the hook sometimes needs to be *way* bigger, and b) due to changes in the
- calling convention and the general architecture of AMD64 there just isn't a
- common prologue, used for almost all functions, anymore.
-
- Those by themselves aren't a problem, since the hooking engines can fix all the
- instructions they would overwrite. However I hypothesized that only a few would
- check whether the function contained a loop that jumps back into the
- instructions that have been overwritten. Consider this:
-
- public _loop
- _loop:
- mov rax, rcx
- @loop_loop:
- mul rcx
- nop
- nop
- nop
- loop @loop_loop ; lol
- ret
-
- There are only 3 bytes that can be safely overwritten. Right after that is the
- destination of the jump backwards. This is a very simple (and kinda pointless)
- function, so detecting that the loop might lead to problems shouldn't be hard.
- But consider what happens with MHook (and all the others):
-
- _loop original:
-
- 008C | 48 89 C8 | mov rax,rcx
- 008F | 48 F7 E1 | mul rcx
- 0092 | 90 | nop
- 0093 | 90 | nop
- 0094 | 90 | nop
- 0095 | E2 F8 | loop test_cases.008F
- 0097 | C3 | ret
-
- _loop hooked:
-
- 008C | E9 0F 69 23 00 | jmp <MHook_Hooks::hookLoop>
- 0091 | E1 90 | loope test_cases.0023
- 0093 | 90 | nop
- 0094 | 90 | nop
- 0095 | E2 F8 | loop test_cases.008F
- 0097 | C3 | ret
-
- trampoline:
-
- 00007FFF7CD200C0 | 48 89 C8 | mov rax,rcx
- 00007FFF7CD200C3 | 48 F7 E1 | mul rcx
- 00007FFF7CD200C6 | E9 C7 96 DC FF | jmp test_cases.0092
-
- then executes:
-
- 0092 | 90 | nop
- 0093 | 90 | nop
- 0094 | 90 | nop
- 0095 | E2 F8 | loop test_cases.008F
-
- But `loop` jumps back to 008F, which is now the middle of the hook's `jmp`, and
- thus executes:
-
- 008F | 23 00 | and eax,dword ptr ds:[rax]
- 0091 | E1 90 | loope test_cases.0023
-
- Which isn't right and will crash horribly.
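-
- The offsets in the listings above can be double-checked with a few lines of
- arithmetic. The Python below is only a worked example using the bytes shown in
- the listings; the helper name and the constants are mine.
-
```python
HOOK_START, HOOK_LEN = 0x8C, 5   # the 5-byte `E9 rel32` written at the start of _loop


def rel8_target(addr: int, insn: bytes) -> int:
    """Destination of a 2-byte rel8 branch (LOOP, LOOPE, short Jcc) located at addr."""
    rel8 = int.from_bytes(insn[1:2], "little", signed=True)
    return addr + len(insn) + rel8


# `E2 F8` at 0095 is the untouched `loop` instruction
loop_target = rel8_target(0x95, bytes.fromhex("e2f8"))

# does it land strictly inside the overwritten bytes?
clobbered = HOOK_START < loop_target < HOOK_START + HOOK_LEN

# bytes starting at 008C after hooking: the jmp, the leftover E1 of `mul rcx`, a nop
hooked = bytes.fromhex("e90f692300" "e190")
leftover = hooked[loop_target - HOOK_START:]
```
-
- `loop_target` comes out as 0x8F, which lies inside the five overwritten bytes,
- and `leftover` starts with `23 00`, the `and eax, [rax]` from the last listing.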
-
- (Preliminary) Results
- =====================
-
- +----------+-----+------+------------+---+------+----+-------+
- | Name|Small|Branch|RIP Relative|AVX|RDRAND|Loop|TailRec|
- +----------+-----+------+------------+---+------+----+-------+
- | PolyHook| X | X | X | X | | | |
- | MinHook| X | X | X | | | | X |
- | MHook| | | X | | | | |
- +----------+-----+------+------------+---+------+----+-------+
-
- As expected nothing could correctly hook the loop. In fact I had to comment out
- those parts because even Catch2 couldn't recover from the crashes generated by
- the botched hooks. Some hooking engines are a bit lacking in their support for
- newer instruction sets, but a simple update of the disassembler library should
- fix that.
-
- I was pleasantly surprised by MinHook, both by the general API and because it
- managed to build a trampoline that worked perfectly even for the tail
- recursion case. I'd recommend it, even though it seems there's no chance that
- the disassembler will ever be updated.
-
- Detecting tail recursive functions / loops into overwritten code
- ================================================================
-
- Back in 2015 I wanted to write my own hooking engine which would be able to
- hook ALL THE FUNCTIONS! And I did actually start to write it and then
- abandoned it before I got to the interesting part. However, I've had the basic
- idea down since then:
-
- 1) Find out how long the function is
- 2) Analyze it, by checking whether some jump could jump into the overwritten
- instructions
- 3) Somehow fix that
-
- Fixing that code probably means putting the whole function in the trampoline,
- because by definition there is no space in the original function to put the
- additional/longer instructions.
-
- However I think that hooking engines should at least fail fast if they can't
- hook that function and give the user the ability to handle that error at that
- stage instead of waiting for unpredictable crashes. I'll post example code
- [here](https://git.free-hack.com/wacked/x64hook) and outline the general
- technique below.
-
- (My x64hook hooking engine doesn't work. There are literally two interesting
- functions in it, and I give pseudocode for them below)
-
- Estimate the length of a function
- ---------------------------------
-
- Note: This is an estimation of the function length. There are various ways to
- go about it; one would be to search for the pro- and epilogue, which would fail
- for all functions that -- for whatever reason -- don't have them. I'm sure this
- way isn't perfect either, but maybe it could be used as another source of
- information[5].
-
- Over the years I've seen various attempts at estimating the function length.
- One of the top hits in my Google history is a question on Stack Overflow[3]
- which uses the same technique that I've seen in various malware strains:
- checking byte for byte until the RET opcode is found. That won't work if
- either:
-
- 1) The `RET imm16` opcode is used, which is often the case for __stdcall funcs.
- 2) There are multiple returns.
- 3) The function doesn't actually return with the RET instruction. For example,
- if a function A calls another function B at its end, with A and B sharing the
- same parameters and either A or B not modifying the stack pointer, it is
- perfectly possible to just jump to function B. Execution will continue in B,
- which ends with a normal RET.
- 4) The value 0xC3 appears for some other reason in the function.
-
- 4) can be easily solved by using a length disassembler engine and just checking
- the actual instruction byte. 1) and 3) aren't that hard either, you'll just
- need to check for some additional opcodes. What about 2)?
-
- The key insight I had was why a function might have multiple returns -- because
- it needed to do additional work in some cases. Which meant that there had to be
- branching, to sometimes skip some instructions or get to them.
-
- If there is a branch backwards it's a loop. But a branch forwards means that
- the function extends at least up to there[4]. Or in pseudocode:
-
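-     offsetOfInstr = 0    // offset of the current instruction from the start
-     funcLen = 0          // estimated length so far
-     furthestJump = 0     // furthest branch destination seen so far
-     while(can disassemble next instruction)
-     {
-         op = getOpcode(instruction);
-         if(is_jump(op))
-         {
-             // destination as an offset from the start of the function
-             off = get_jump_offset(instruction);
-             if(off > furthestJump)
-                 furthestJump = off;
-         }
-
-         funcLen = offsetOfInstr + length(instruction);
-         if(is_end_of_function(op, furthestJump, offsetOfInstr))
-         {
-             break;
-         }
-         offsetOfInstr = funcLen;
-     }
-
-     bool is_end_of_function(opc, furthestJump, instrOffset)
-     {
-         // a RET only ends the function if no branch skips past it
-         if(opc == RET && furthestJump <= instrOffset)
-             return true;
-         else if(opc == UD_Ijmp)
-         {
-             // an unconditional jump is treated as a tail call (point 3)
-             if(destination is IMM || destination is register)
-                 return true;
-         }
-
-         return false;
-     }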
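-
- To make the idea testable without a disassembler, the following Python sketch
- runs the same heuristic over instructions that are already decoded. The `Insn`
- tuple and `estimate_length` are hypothetical names, and the sketch treats every
- `jmp` as a tail call, which is coarser than the pseudocode above.
-
```python
from typing import NamedTuple, Optional, Sequence


class Insn(NamedTuple):
    offset: int                    # from the start of the function
    length: int
    mnemonic: str
    target: Optional[int] = None   # direct branch destination as an offset, else None


def estimate_length(insns: Sequence[Insn]) -> int:
    furthest_jump = 0
    end = 0
    for insn in insns:
        end = insn.offset + insn.length
        if insn.mnemonic == "jmp":
            break                  # assumed to be a tail call into another function
        if insn.target is not None:
            furthest_jump = max(furthest_jump, insn.target)
        if insn.mnemonic == "ret" and furthest_jump <= insn.offset:
            break                  # no branch seen so far skips past this RET
    return end


# the Branch test case from above, with offsets relative to 0x26
branch_case = [Insn(0, 4, "and"), Insn(4, 2, "je", target=0x1D), Insn(6, 3, "xor")]
branch_case += [Insn(9 + i, 1, "nop") for i in range(20)]
branch_case.append(Insn(0x1D, 1, "ret"))

# a made-up function with two returns: the first RET is skipped by the `je`
two_returns = [Insn(0, 2, "test"), Insn(2, 2, "je", target=5), Insn(4, 1, "ret"),
               Insn(5, 3, "mov"), Insn(8, 1, "ret")]
```
-
- For `two_returns` the first RET at offset 4 is not taken as the end, because
- the `je` targets offset 5; the estimate only stops at the second RET.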
-
-
- Detecting loops to the start of a function
- ------------------------------------------
-
-     firstJumpOffset = MAX_INT
-     foreach(instruction in function)
-         if(instruction is a jump)
-             // destination, relative to the function start
-             jumpOffset = getOffset(instruction)
-
-             /* jumps to exactly the start of a function are fine, since that is
-             where our overwritten code starts. Thus it doesn't jump into the
-             middle of an instruction */
-             if(jumpOffset == 0)
-                 continue
-
-             if(jumpOffset < firstJumpOffset)
-                 firstJumpOffset = jumpOffset
-
-     return firstJumpOffset < lengthNeededForHook
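-
- The same check, as a small runnable Python sketch. It assumes the branch
- destinations have already been collected as offsets from the function start;
- the function name is made up, and unlike the pseudocode it ignores destinations
- before the function start.
-
```python
from typing import Iterable


def jumps_into_hook(branch_targets: Iterable[int], hook_len: int) -> bool:
    """True if some branch lands inside the overwritten bytes, other than at offset 0."""
    first = min((t for t in branch_targets if t > 0), default=None)
    return first is not None and first < hook_len


# _loop: `loop` at 0x95 goes back to 0x8F, i.e. offset 3 from the start at 0x8C
loop_is_unsafe = jumps_into_hook([0x8F - 0x8C], hook_len=5)

# a tail-recursive function that jumps to its own first byte is fine
tail_rec_is_unsafe = jumps_into_hook([0], hook_len=5)
```
-
- A hooking engine could run such a check before patching and fail fast with
- an error instead of installing a hook that crashes later.
-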
- ------------
-
- [1] This is one of the things that could easily be improved, but hasn't been
- because I just couldn't motivate myself. Putting the data right after the func
- meant that a section containing code needed to be writable. Which is bad. Also
- I load the seed DWORD as a QWORD -- which only works because the upper half is
- then thrown away by the multiplication. It's shitty code is what I'm saying.
-
- In retrospect I should have used a jump table like a switch-case could be
- compiled into. That would be read only data. Oh well.
-
- [2] And Microsoft decided at some point to make it even easier for their code
- with the advent of hotpatching.
-
- [3] https://stackoverflow.com/questions/8705215/get-the-size-length-of-a-c-function
-
- [4] With some caveats, e.g. one could assume that no function is longer than
- 512 bytes. And obviously keeping in mind point 3.
-
- [5] Another heuristic would be to check for the next sled of filler
- instructions, such as INT3 or NOP. Some compilers align functions on 16-byte
- boundaries and fill the gaps with those.
|