Introduction
============
This project aims to give a simple overview of how good various x64 hooking
engines (on Windows) are. I'll write various functions that are hard to patch
and then see how each hooking engine does.

I'll test:

* [EasyHook](https://easyhook.github.io/)
* [PolyHook](https://github.com/stevemk14ebr/PolyHook)
* [MinHook](https://www.codeproject.com/Articles/44326/MinHook-The-Minimalistic-x-x-API-Hooking-Libra)
* [Mhook](http://codefromthe70s.org/mhook24.aspx)

(I'd like to test Detours, but I'm not willing to pay for it. So that isn't
tested :( )

There are multiple things that make hooking difficult. Maybe you want to patch
while the application is running -- in that case you might get race conditions,
because the application may execute your half-finished hook. Maybe the software
has some self-protection features (or other software on the system provides
them, e.g. Trusteer Rapport).

Evaluating how the hooking engines stack up against that is not the goal here.
Neither are non-functional criteria, like how fast an engine is or how much
memory it needs for each hook. This is just about the challenges the function
to be hooked itself poses.

Namely:

* Are jumps relocated?
* What about RIP-relative addressing?
* If there's a loop at the beginning, or if it's a tail-recursive function,
  does the hooking engine handle it?
* How good is the disassembler, and how many instructions does it know?
* Can it hook already hooked functions?

First I'll give a short walkthrough of the architecture, then quickly go over
the test cases. After that come the results and an evaluation for each engine.

I think I found a flaw in all of them; I'll publish a small POC which should at
least detect the existence of problematic code.

**A word of caution**: my results are worse than expected, so do assume I have
made a mistake in using the libraries. I went into this expecting that at least
some engines would try to detect e.g. loops back into the first few bytes. But
none did? That's gotta be wrong.

**Another word of caution**: parts of this are rushed and/or ugly. Please
double-check parts that seem suspicious. And I'd love to get patches, even for
the most trivial things -- spelling mistakes? Yes please.

Architecture
============
This project is made up of two parts: a DLL with the test cases and an .exe
that hooks them, tests whether they still work and prints the results.

(I could have done it all in the .exe, but this makes it trivial to (at some
point) force the function to be hooked and the hook function to be more than
2GB apart. Just set fixed image bases in the project settings and you're
done.)
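
The 2GB limit comes from the 5-byte `E9 rel32` jump. Its signed 32-bit displacement is counted from the end of the jump, so it only reaches about ±2GB. Below is a minimal C++ sketch of the reach check. `fits_rel32_jmp` is a hypothetical helper and not an API of any of the tested libraries.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: can a 5-byte `E9 rel32` jump placed at `from` reach `to`?
// The signed 32-bit displacement is measured from the end of the jump (from + 5).
bool fits_rel32_jmp(std::uint64_t from, std::uint64_t to) {
    const auto disp = static_cast<std::int64_t>(to - (from + 5));
    return disp >= std::numeric_limits<std::int32_t>::min() &&
           disp <= std::numeric_limits<std::int32_t>::max();
}
```

Suppose the image bases are far apart, say 0x140000000 for the .exe and 0x7FF000000000 for the DLL. Then the check fails, and an engine has to fall back to an absolute jump or to a relay placed within ±2GB of the hooked function.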

My main concern was automatically identifying whether the hook worked. I
consider a hook to work if a) the original function can still execute
successfully *and* b) the hook was called.
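
Here is a sketch of what those two criteria mean for a single hook. It assumes the engine hands back a callable pointer to the trampoline. The names `small_stub`, `hook_small`, `g_original_small` and `g_small_hook_called` are hypothetical. `small_stub` stands in for the real `_small`, so the snippet runs without any hooking library.

```cpp
#include <cassert>

using SmallFn = int (*)();

// Stand-in for `_small` from the test DLL (xor eax, eax; ret).
int small_stub() { return 0; }

// Filled in by the engine with the trampoline address when the hook is installed.
SmallFn g_original_small = nullptr;
// Set by the detour so the test can tell it actually ran.
bool g_small_hook_called = false;

// The detour: note that it ran (criterion b), then forward to the original code.
int hook_small() {
    g_small_hook_called = true;
    return g_original_small();
}

// Calls the (hooked) entry point and checks both criteria.
bool small_hook_works(SmallFn hooked_entry) {
    g_small_hook_called = false;
    const bool result_ok = hooked_entry() == 0;  // criterion a)
    return result_ok && g_small_hook_called;     // criterion b)
}
```

In the real setup the call goes to `_small`, and the installed jump redirects it to `hook_small`. Here the detour is called directly to simulate that.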

Criterion a) is really similar to a unit test: verify that a function returns
what is expected. So for a) the .exe just runs unit tests after all the hooks
have been applied. Each failing function is reported (or the program crashes
and I can look at the callstack), so I can correlate that with the hooking
engine I'm currently testing and see where it fails. I've used Catch2 for the
unit tests, because I wanted to try it anyway.

From the get-go it was clear that I wanted to test multiple hooking engines,
and they all needed to do the same steps in the same order. So I implemented a
basic AbstractHookingEngine with a boolean for every test case and made a
child class for each engine. The child classes have to override `hook_all`
and `unhook_all`. In between the calls to those, the unit tests run.
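
A rough sketch of that class hierarchy follows. Only `AbstractHookingEngine`, `hook_all` and `unhook_all` come from the project. The flag names, `run_tests`, `evaluate` and `FakeEngine` are hypothetical, and `run_tests` is a placeholder for running the Catch2 suite.

```cpp
#include <cassert>
#include <string>

// Sketch of the base class: one flag per test case, recording whether the
// engine managed to install that particular hook.
struct AbstractHookingEngine {
    bool small = false;
    bool branch = false;
    bool rip_relative = false;
    bool avx = false;
    bool rdrand = false;
    bool loop = false;
    bool tail_rec = false;

    virtual ~AbstractHookingEngine() = default;
    virtual std::string name() const = 0;
    virtual void hook_all() = 0;    // install every hook the engine supports
    virtual void unhook_all() = 0;  // remove them again
};

// Placeholder for running the Catch2 suite; returns the number of failures.
int run_tests(const AbstractHookingEngine&) { return 0; }

// Every engine goes through the same steps in the same order.
int evaluate(AbstractHookingEngine& engine) {
    engine.hook_all();
    const int failures = run_tests(engine);
    engine.unhook_all();
    return failures;
}

// Hypothetical child class that only pretends to hook `_small`.
struct FakeEngine : AbstractHookingEngine {
    int installed = 0;
    std::string name() const override { return "Fake"; }
    void hook_all() override { small = true; ++installed; }
    void unhook_all() override { --installed; }
};
```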

Test case: Small
================
This is just a very small function; it is smaller than the hook code will be,
so how does the library react?

```asm
_small:
    xor eax, eax
    ret
```
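
How much smaller? `_small` is 3 bytes: 2 for `xor eax, eax` and 1 for `ret`. The two usual x64 jump encodings are both larger:

* 5 bytes for `E9 rel32`, which only works within ±2GB.
* 14 bytes for `FF 25 00000000` (`jmp qword [rip+0]`) followed by the 8-byte absolute target.

The sketch below builds both. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// 5-byte relative jump: E9 rel32, displacement measured from the end of the jump.
// The caller must make sure the target is within ±2GB; this only encodes.
std::vector<std::uint8_t> encode_jmp_rel32(std::uint64_t from, std::uint64_t to) {
    const auto disp = static_cast<std::int32_t>(to - (from + 5));
    std::vector<std::uint8_t> code{0xE9, 0, 0, 0, 0};
    std::memcpy(code.data() + 1, &disp, sizeof disp);  // little-endian, as on x64
    return code;
}

// 14-byte absolute jump: jmp qword [rip+0] followed by the 8-byte target address.
std::vector<std::uint8_t> encode_jmp_abs64(std::uint64_t to) {
    std::vector<std::uint8_t> code{0xFF, 0x25, 0, 0, 0, 0};
    code.resize(6 + sizeof to);
    std::memcpy(code.data() + 6, &to, sizeof to);
    return code;
}

constexpr std::size_t kSmallFunctionSize = 3;  // xor eax, eax (2 bytes) + ret (1 byte)
```

Either way the patch is longer than `_small` and spills into whatever follows the function.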

Test case: Branch
=================
Instead of the FASM code I'll show the disassembled version, so you can see the
instruction lengths and offsets.

```
0026 | 48 83 E0 01 | and rax,1
002A | 74 17       | je test_cases.0043 --+
002C | 48 31 C0    | xor rax,rax          |
002F | 90          | nop                  |
0030 | 90          | nop                  |
...                                       |
0041 | 90          | nop                  |
0042 | 90          | nop                  |
0043 | C3          | ret <----------------+
```

This function has a branch in its first 5 bytes. Hooking it detour-style isn't
possible without fixing that branch in the trampoline. The NOP sled is just
there so the hooking engine can't cheat and copy the whole function into the
trampoline. Instead, the jump in the trampoline has to be modified so it still
jumps back to the original destination (the `ret` at 0043).
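
Here is a sketch of that fix-up for the `je` above, assuming the engine copies it into a trampoline at some other address. A `74 rel8` only reaches -128 to +127 bytes. One common approach is to widen it to the 6-byte near form `0F 84 rel32` and recompute the displacement so it still lands on the `ret` at 0043. `relocate_jcc_short` is a hypothetical name. Other strategies exist, e.g. an inverted short jump over an absolute jump.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical helper: relocate a short conditional jump (opcodes 70..7F,
// `jcc rel8`) that was copied from old_ip to new_ip. It is widened to the
// 6-byte near form `0F 80+cc rel32`, which keeps the same condition code.
// Returns std::nullopt for other opcodes or if the target is out of rel32 reach.
std::optional<std::vector<std::uint8_t>> relocate_jcc_short(
        const std::uint8_t* insn, std::uint64_t old_ip, std::uint64_t new_ip) {
    if (insn[0] < 0x70 || insn[0] > 0x7F) return std::nullopt;
    const auto rel8 = static_cast<std::int8_t>(insn[1]);
    const std::uint64_t target = old_ip + 2 + rel8;  // where the original jump lands
    const auto disp = static_cast<std::int64_t>(target - (new_ip + 6));
    if (disp < INT32_MIN || disp > INT32_MAX) return std::nullopt;
    const auto disp32 = static_cast<std::int32_t>(disp);
    std::vector<std::uint8_t> out{0x0F, static_cast<std::uint8_t>(0x80 + (insn[0] - 0x70)),
                                  0, 0, 0, 0};
    std::memcpy(out.data() + 2, &disp32, sizeof disp32);  // little-endian, as on x64
    return out;
}
```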

Test case: RIP-relative
=======================
One of the new things in AMD64 is RIP-relative addressing. I guess the reason
to include it was to make it easier to generate position-independent code
(PIC): all references to data can now be relative instead of absolute. So it
no longer matters where the program is loaded into memory, and there's less
need for the relocation table.
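
Concretely, the CPU adds the signed 32-bit displacement to the address of the *next* instruction. The sketch below resolves the target of a RIP-relative operand. `rip_target` is hypothetical, and a real engine would get the instruction length and the displacement offset from its disassembler. For `mov rax, qword [rip+disp32]` (`48 8B 05` followed by disp32) the length is 7 and the offset is 3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: absolute address referenced by a RIP-relative operand.
// insn_len is the length of the whole instruction, disp_offset the position
// of its 32-bit displacement; both would come from the disassembler.
std::uint64_t rip_target(const std::uint8_t* insn, std::uint64_t ip,
                         std::size_t insn_len, std::size_t disp_offset) {
    std::int32_t disp;
    std::memcpy(&disp, insn + disp_offset, sizeof disp);
    // RIP already points past this instruction when the operand is evaluated.
    return ip + insn_len + static_cast<std::int64_t>(disp);
}
```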

A quick and dirty[1] test for this is re-implementing the well-known C `rand`
function.

```asm
public _rip_relative
_rip_relative:
    mov rax, qword[seed]
    mov ecx, 214013
    mul ecx
    add eax, 2531011
    mov [seed], eax

    shr eax, 16
    and eax, 0x7FFF
    ret

seed dd 1
```
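
The unit test for this case needs expected values. One way to get them is to run the same generator in plain C++, with MSVC's `rand` constants and the seed starting at 1. The sketch below uses a hypothetical `ReferenceRand` type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ reference for _rip_relative: a 32-bit linear congruential
// generator with MSVC's rand constants, returning bits 16..30 of the state.
struct ReferenceRand {
    std::uint32_t seed = 1;  // matches `seed dd 1`
    int next() {
        seed = seed * 214013u + 2531011u;  // wraps modulo 2^32, like `mul ecx` / `add eax`
        return static_cast<int>((seed >> 16) & 0x7FFF);
    }
};
```

The first two calls return 41 and 18467, so a test can check that the hooked `_rip_relative` produces the same sequence.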

The very first instruction of `_rip_relative` uses RIP-relative addressing, so
it has to be fixed up in the trampoline.
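
Here is a sketch of that fix-up. It assumes the engine's disassembler supplies the instruction length and the offset of the 32-bit displacement. The idea is to recompute the absolute address from the old location, then re-encode it relative to the trampoline. `relocate_rip_relative` is a hypothetical helper. If the trampoline ends up more than 2GB away from `seed`, the displacement can't be encoded and an engine needs a different strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: patch the disp32 of a RIP-relative instruction that was
// copied from old_ip to new_ip, so it still references the same absolute address.
// Returns false (and leaves the bytes alone) if the target is out of rel32 reach.
bool relocate_rip_relative(std::uint8_t* insn, std::size_t insn_len, std::size_t disp_offset,
                           std::uint64_t old_ip, std::uint64_t new_ip) {
    std::int32_t old_disp;
    std::memcpy(&old_disp, insn + disp_offset, sizeof old_disp);
    const std::uint64_t target = old_ip + insn_len + static_cast<std::int64_t>(old_disp);
    const auto new_disp = static_cast<std::int64_t>(target - (new_ip + insn_len));
    if (new_disp < INT32_MIN || new_disp > INT32_MAX) return false;
    const auto disp32 = static_cast<std::int32_t>(new_disp);
    std::memcpy(insn + disp_offset, &disp32, sizeof disp32);
    return true;
}
```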

Test case: AVX & RDRAND
=======================

The AMD64 instruction set is extended with every CPU generation. Because the
hooking engines need to know the instruction lengths and their side effects to
properly apply their hooks, they need to keep up.

The actual code in the test case is boring and doesn't matter. I'm sure there
are disagreements on whether I've picked good candidates for "exotic" or new
instructions, but those were the first that came to mind.
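
RDRAND shows why the disassembler has to know these instructions. Its encoding is `0F C7 /6` with a register operand. With a memory operand, the same opcode bytes are VMX instructions, and `/7` with a register is RDSEED. So the length decoder has to look at the ModRM byte. The sketch below recognizes just RDRAND. `rdrand_length` is hypothetical, and a real decoder handles far more. AVX has its own twist: in 64-bit mode the bytes C4 and C5 always start a VEX prefix, which a decoder written before AVX would likely misread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: length of an RDRAND instruction starting at `p`, or 0 if
// the bytes are something else. Only the prefixes RDRAND commonly uses are handled.
std::size_t rdrand_length(const std::uint8_t* p, std::size_t avail) {
    std::size_t i = 0;
    if (i < avail && p[i] == 0x66) ++i;           // operand-size prefix (rdrand ax)
    if (i < avail && (p[i] & 0xF0) == 0x40) ++i;  // optional REX prefix (rdrand rax, r8d, ...)
    if (i + 3 > avail || p[i] != 0x0F || p[i + 1] != 0xC7) return 0;
    const std::uint8_t modrm = p[i + 2];
    const bool reg_operand = (modrm >> 6) == 3;       // mod == 11b
    const bool is_rdrand = ((modrm >> 3) & 7) == 6;   // reg field == /6
    return reg_operand && is_rdrand ? i + 3 : 0;
}
```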

Test case: Loop and TailRec
===========================

My hypothesis before starting this evaluation was that these two cases would
make most hooking engines fail. Back in the good ol' days of x86, detour
hooking didn't require any special thought because the prologue was exactly as
big as the hook itself: 5 bytes for `MOV EDI, EDI; PUSH EBP; MOV EBP, ESP` and
5 bytes for `JMP +-2GB`[2]. That isn't so easy on AMD64: a) the hook sometimes
needs to be *way* bigger, and b) due to changes in the calling convention and
the general architecture of AMD64, there just isn't a common prologue used for
almost all functions anymore.
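
For comparison, here is a sketch of the 32-bit hotpatch layout that footnote [2] refers to. The scheme is Microsoft's; the function name is hypothetical.

* The 5 padding bytes in front of the function receive `E9 rel32`.
* The 2-byte `MOV EDI, EDI` at the entry is replaced with `EB F9`, a short jump 7 bytes back into the padding.

No instruction of the function has to be relocated.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper for the 32-bit hotpatch scheme. `func` is the address of
// the 2-byte MOV EDI, EDI (8B FF) at the function entry; the 5 bytes before it
// are padding. Returns the 7 bytes that overwrite [func - 5, func + 2).
std::array<std::uint8_t, 7> hotpatch_bytes(std::uint32_t func, std::uint32_t hook) {
    std::array<std::uint8_t, 7> out{};
    // E9 rel32 in the padding: it ends exactly at `func`, so rel32 = hook - func.
    const std::uint32_t rel = hook - func;
    out[0] = 0xE9;
    std::memcpy(out.data() + 1, &rel, sizeof rel);  // assumes a little-endian host
    // EB F9 replaces MOV EDI, EDI: jump short by -7, from func + 2 back to func - 5.
    out[5] = 0xEB;
    out[6] = 0xF9;
    return out;
}
```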

By themselves those aren't a problem, since the hooking engines can fix all the
instructions they overwrite. However, I hypothesized that only a few would
check whether the function contains a loop that jumps back into the
instructions that have been overwritten. Consider this:

```asm
public _loop
_loop:
    mov rax, rcx
@loop_loop:
    mul rcx
    nop
    nop
    nop
    loop @loop_loop ; lol
    ret
```

There are only 3 bytes that can be safely overwritten; right after them is the
destination of the backwards jump. This is a very simple (and kinda pointless)
function, so detecting that the loop might lead to problems shouldn't be hard.
But consider what happens with MHook (and all the others):

`_loop` original:

```
008C | 48 89 C8 | mov rax,rcx
008F | 48 F7 E1 | mul rcx
0092 | 90       | nop
0093 | 90       | nop
0094 | 90       | nop
0095 | E2 F8    | loop test_cases.008F
0097 | C3       | ret
```

`_loop` hooked:

```
008C | E9 0F 69 23 00 | jmp <MHook_Hooks::hookLoop>
0091 | E1 90          | loope test_cases.0023
0093 | 90             | nop
0094 | 90             | nop
0095 | E2 F8          | loop test_cases.008F
0097 | C3             | ret
```

trampoline:

```
00007FFF7CD200C0 | 48 89 C8       | mov rax,rcx
00007FFF7CD200C3 | 48 F7 E1       | mul rcx
00007FFF7CD200C6 | E9 C7 96 DC FF | jmp test_cases.0092
```

then executes:

```
0092 | 90    | nop
0093 | 90    | nop
0094 | 90    | nop
0095 | E2 F8 | loop test_cases.008F
```

But that `loop` jumps back to 008F, into the middle of the 5-byte `jmp`, and
thus executes:

```
008F | 23 00 | and eax,dword ptr ds:[rax]
0091 | E1 90 | loope test_cases.0023
```

That isn't right and will crash horribly.
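
The check the engines apparently skip takes two steps. First, decode the backward `loop`: `E2 F8` at 0095 gives a target of 0095 + 2 - 8 = 008F. Second, ask whether any branch target falls strictly inside the bytes the hook overwrites. The sketch below does both. The helper names are hypothetical, and a real engine would collect the targets by disassembling the whole function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Destination of a rel8 branch (short jcc, short jmp, loop*, jrcxz): the
// displacement is relative to the end of the 2-byte instruction.
std::uint64_t rel8_target(std::uint64_t ip, std::uint8_t disp) {
    return ip + 2 + static_cast<std::int8_t>(disp);
}

// Does any branch land strictly inside the patched bytes [func + 1, func + patch_len)?
// Landing exactly on `func` does not corrupt anything, it just runs the hook again.
bool branch_into_patch(std::uint64_t func, std::size_t patch_len,
                       const std::vector<std::uint64_t>& branch_targets) {
    for (const std::uint64_t t : branch_targets)
        if (t > func && t < func + patch_len) return true;
    return false;
}
```

With the 5-byte jump from the listing the check fires. Overwriting only the 3-byte `mov` would have been safe.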

(Preliminary) Results
=====================

X marks a test case where the hook worked: the function still returned the
right result *and* the hook was called.

```
+----------+-------+--------+--------------+-----+--------+------+---------+
| Name     | Small | Branch | RIP Relative | AVX | RDRAND | Loop | TailRec |
+----------+-------+--------+--------------+-----+--------+------+---------+
| PolyHook |   X   |   X    |      X       |  X  |        |      |         |
| MinHook  |   X   |   X    |      X       |     |        |      |    X    |
| MHook    |       |        |      X       |     |        |      |         |
+----------+-------+--------+--------------+-----+--------+------+---------+
```

[1] This is one of the things that could easily be improved but hasn't been,
because I just couldn't motivate myself. Putting the data right after the
function means the section containing the code has to be writable, which is
bad. Also, I load the seed DWORD as a QWORD -- which only works because the
upper half is then thrown away by the 32-bit multiplication. It's shitty code
is what I'm saying.

In retrospect I should have used a jump table, like the ones a switch-case can
be compiled into. That would be read-only data. Oh well.

[2] And Microsoft decided at some point to make it even easier for their own
code with the advent of hotpatching: hotpatchable functions start with the
2-byte no-op `MOV EDI, EDI` and are preceded by 5 bytes of padding, so a 5-byte
`JMP` fits in the padding and a 2-byte short jump over the placeholder reaches
it.