
add my code and shit

master
aaaaaa aaaaaaa 7 years ago
parent
commit
c3db2d4d91
1 changed files with 142 additions and 1 deletions
README.md

@@ -152,6 +152,11 @@ The actual code in the test case is boring and doesn't matter. I'm sure there
are disagreements on whether I've picked good candidates of "exotic" or new
instructions, but those were the first that came to mind.

(It's also doubtful whether you'll ever encounter functions where the first
instructions are of this category, because most probably there's some setup
needed before, e.g. checking that addresses are aligned, initializing loop
counters, yadda, yadda)

Test case: loop and TailRec
===========================

@@ -235,6 +240,133 @@ Which isn't right and will crash horribly.
| MHook| | | X | | | | |
+----------+-----+------+------------+---+------+----+-------+

As expected nothing could correctly hook the loop. In fact I had to comment out
those parts because even Catch2 couldn't recover from the crashes generated by
the botched hooks. Some hooking engines are a bit lacking in their support for
newer instruction sets, but a simple update of the disassembler library should
fix that.

I was pleasantly surprised by MinHook, both by the general API and because it
managed to build a trampoline that worked perfectly even for the tail
recursion case. I'd recommend it, even though it seems there's no chance that
the disassembler will ever be updated.

Detecting tail recursive functions / loops into overwritten code
================================================================

Back in 2015 I wanted to write my own hooking engine which would be able to
hook ALL THE FUNCTIONS! I did actually start to write it, and then
abandoned it before I got to the interesting part. Since then, however, I've
had the basic idea down:

1) Find out how long the function is
2) Analyze it, by checking whether some jump could jump into the overwritten
instructions
3) Somehow fix that

Fixing that code probably means putting the whole function in the trampoline,
because by definition there is no space in the original function for the
additional/longer instructions.

However I think that hooking engines should at least fail fast if they can't
hook that function and give the user the ability to handle that error at that
stage instead of waiting for unpredictable crashes. I'll post example code
[here](https://git.free-hack.com/wacked/x64hook) and outline the general
technique below.
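
As a rough sketch of what failing fast could look like from the caller's side: the check runs before a single byte is overwritten and hands back a status instead of patching blindly. All names here (`hook_status`, `check_hookable`) are made up for illustration and are not taken from x64hook, and the `HOOK_ERR_TOO_SHORT` case is an extra assumption that the text above doesn't mention.

```c
#include <stddef.h>

/* Hypothetical status codes, not part of any real hooking engine. */
enum hook_status {
    HOOK_OK,
    HOOK_ERR_TOO_SHORT,       /* assumption: function is shorter than the patch */
    HOOK_ERR_JUMP_INTO_PATCH  /* a branch lands inside the bytes to overwrite */
};

/* func_len:      estimated length of the function in bytes
 * patch_len:     number of bytes at the function start that get overwritten
 * lowest_target: smallest non-zero branch target inside the function,
 *                as an offset from the function start, or 0 if there is none */
enum hook_status check_hookable(size_t func_len, size_t patch_len,
                                size_t lowest_target)
{
    if (func_len < patch_len)
        return HOOK_ERR_TOO_SHORT;
    if (lowest_target != 0 && lowest_target < patch_len)
        return HOOK_ERR_JUMP_INTO_PATCH;
    return HOOK_OK;
}
```

A caller would only go on to write the hook when this returns `HOOK_OK`, and can otherwise report the error or fall back to another hooking strategy.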

(My x64hook hooking engine doesn't work. There are literally two interesting
functions in it, and I give pseudocode for them below)

Estimate the length of a function
---------------------------------

Note: This is an estimation of the function length. There are various ways to
go about it. One would be to search for the prologue and epilogue, which would
fail for all functions that -- for whatever reason -- don't have them. I'm sure
my way isn't perfect either, but maybe it could be used as another source of
information[5].

Over the years I've seen various attempts at estimating the function length.
One of the top hits in my Google history is a question on Stack Overflow[3]
which uses the same technique that I've seen in various malware strains:
checking byte for byte until the RET opcode (0xC3) is found. That won't work if
any of these holds:

1) The `RET imm16` opcode is used, which is often the case for `__stdcall` funcs.
2) There are multiple returns.
3) The function doesn't actually return with the RET instruction. For example,
if a function A at its end calls another function B, with A and B sharing the
same parameters and either A or B not modifying the stack pointer, it is
perfectly possible to just jump to function B. Execution will continue in B,
which ends with a normal RET.
4) The value 0xC3 appears for some other reason in the function.

4) can be easily solved by using a length disassembler engine and just checking
the actual opcode byte. 1) and 3) aren't that hard either, you'll just
need to check for some additional opcodes. What about 2)?
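
To make point 4) concrete, here is a minimal sketch of the byte-for-byte scan and a tiny input that fools it. The function name `naive_ret_scan` and the sample bytes are made up for this illustration. The scan is shown only as the thing to avoid.

```c
#include <stddef.h>

/* Naive estimate: the offset just past the first 0xC3 byte, or 0 if there
 * is none. This is the byte-for-byte scan, not a recommended method. */
size_t naive_ret_scan(const unsigned char *code, size_t max_len)
{
    for (size_t i = 0; i < max_len; i++) {
        if (code[i] == 0xC3)
            return i + 1;
    }
    return 0;
}

/* mov eax, 0xC3 ; ret
 * 0xB8 is `mov eax, imm32`, so the 0xC3 at offset 1 is part of the
 * immediate operand and not a RET. */
const unsigned char example_code[] = {
    0xB8, 0xC3, 0x00, 0x00, 0x00,   /* mov eax, 0x000000C3 */
    0xC3                            /* ret */
};
```

`naive_ret_scan(example_code, sizeof example_code)` returns 2, although the function is 6 bytes long. Walking instruction by instruction with a length disassembler skips the immediate and only looks at real opcode bytes.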

The key insight I had was why a function might have multiple returns -- because
it needed to do additional work in some cases. Which meant that there had to be
branching, to sometimes skip some instructions or get to them.

If there is a branch backwards it's a loop. But a branch forwards means that
the function extends at least up to there[4]. Or in pseudocode:

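    offsetOfInstr = 0
    funcLen = 0
    furthestJump = 0
    while(can disassemble next instruction)
    {
        op = getOpcode(instruction);
        if(is_jump(op))
        {
            // jump destination, relative to the function start
            off = get_jump_offset(instruction);
            if(off > furthestJump)
                furthestJump = off;
        }

        // the function extends at least to the end of this instruction
        funcLen = offsetOfInstr + length(instruction);

        if(is_end_of_function(op, furthestJump, offsetOfInstr))
        {
            break;
        }

        offsetOfInstr = funcLen;
    }

    bool is_end_of_function(opc, furthestJump, instrOffset)
    {
        // a RET only ends the function if no earlier jump goes past it
        if(opc == RET && furthestJump <= instrOffset)
            return true;
        // an unconditional jump is treated as a tail call (point 3)
        else if(opc == UD_Ijmp)
        {
            if(destination is IMM || destination is register)
                return true;
        }

        return false;
    }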
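
The pseudocode above can be turned into something runnable if the disassembler is taken out of the picture. The sketch below works on a list of already decoded instructions. The `struct insn` type, the `insn_kind` values and `estimate_function_length` are hypothetical names for this example; a real engine would fill the list from its disassembler. `INSN_JMP` stands for an unconditional jump with an immediate or register destination, as in the pseudocode.

```c
#include <stddef.h>

/* Hypothetical pre-decoded instruction, filled in by some disassembler. */
enum insn_kind { INSN_OTHER, INSN_JCC, INSN_JMP, INSN_RET };

struct insn {
    enum insn_kind kind;
    size_t length;   /* encoded length in bytes */
    size_t target;   /* branch destination as an offset from the function
                        start; only used for INSN_JCC and INSN_JMP */
};

/* Returns the estimated function length in bytes, or 0 if the list ran
 * out before an end of function was found. */
size_t estimate_function_length(const struct insn *insns, size_t count)
{
    size_t offset = 0;         /* offset of the current instruction */
    size_t furthest_jump = 0;  /* furthest branch target seen so far */

    for (size_t i = 0; i < count; i++) {
        const struct insn *in = &insns[i];
        size_t next = offset + in->length;

        /* A forward branch means the function extends at least to its target. */
        if ((in->kind == INSN_JCC || in->kind == INSN_JMP) &&
            in->target > furthest_jump)
            furthest_jump = in->target;

        /* A RET ends the function unless an earlier branch goes past it. */
        if (in->kind == INSN_RET && furthest_jump <= offset)
            return next;

        /* An unconditional jump is treated as a tail call. */
        if (in->kind == INSN_JMP)
            return next;

        offset = next;
    }
    return 0;
}

/* A function with two returns: the branch at offset 2 skips the first RET. */
const struct insn multi_ret_example[] = {
    { INSN_OTHER, 2, 0 },   /* offset 0 */
    { INSN_JCC,   2, 7 },   /* offset 2: conditional branch to offset 7 */
    { INSN_OTHER, 2, 0 },   /* offset 4 */
    { INSN_RET,   1, 0 },   /* offset 6: first return, not the end */
    { INSN_OTHER, 2, 0 },   /* offset 7 */
    { INSN_RET,   1, 0 },   /* offset 9: last return */
};
```

For `multi_ret_example` the estimate is 10 bytes, because the branch to offset 7 keeps the walk going past the first RET. Like the pseudocode, this sketch stops at every unconditional jump, so a forward `jmp` inside the function (the kind compilers emit for if/else) would cut the estimate short.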


Detecting loops to the start of a function
------------------------------------------

    firstJumpOffset = MAX_INT
    foreach(instruction in function)
        if(instruction is a jump)
            jumpOffset = getOffset(instruction) // relative to function start
            /* jumps to exactly the start of a function are fine, since that is
               where our overwritten code starts. Thus it doesn't jump into the
               middle of an instruction */
            if(jumpOffset == 0)
                continue
            if(jumpOffset < firstJumpOffset)
                firstJumpOffset = jumpOffset;

    return firstJumpOffset < lengthNeededForHook
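
The same check as a small C sketch, assuming the branch targets were already collected while walking the function. The name `jumps_into_patch` and the array-of-targets interface are assumptions for this example. Offsets are signed so that branches to code before the function start (for example tail calls) can be represented and ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* targets:   branch destinations as offsets from the function start
 * patch_len: number of bytes at the function start that get overwritten
 * Returns true if some branch lands strictly inside the overwritten bytes. */
bool jumps_into_patch(const ptrdiff_t *targets, size_t count, size_t patch_len)
{
    for (size_t i = 0; i < count; i++) {
        /* Offset 0 is fine: that is where the hook itself starts.
         * Negative offsets point outside the function and are skipped. */
        if (targets[i] > 0 && (size_t)targets[i] < patch_len)
            return true;
    }
    return false;
}
```

With a 5-byte patch, a loop that branches back to offset 2 makes this return true, while targets of 0 or 5 do not. Since hooking engines relocate whole instructions, `patch_len` should probably be the overwritten length rounded up to the next instruction boundary rather than the raw size of the jump.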

------------

[1] This is one of the things that could easily be improved, but hasn't been
because I just couldn't motivate myself. Putting the data right after the func
meant that a section containing code needed to be writable. Which is bad. Also
@@ -245,4 +377,13 @@ In retrospect I should have used a jump table like a switch-case could be
compiled into. That would be read only data. Oh well.

[2] And Microsoft decided at some point to make it even easier for their code
with the advent of hotpatching.

[3] https://stackoverflow.com/questions/8705215/get-the-size-length-of-a-c-function

[4] With some caveats, e.g. one could assume that no function is longer than
512 bytes. And obviously keeping in mind point 3

[5] Another heuristic would be to check for the next run of filler
instructions, such as INT3 or NOP. Some compilers align functions on 16-byte
boundaries and fill the gaps with those.
