Channel: windows deep internals

asm injection stub

Let's check what this stub should do once injected into some Linux process via __malloc_hook/__free_hook (btw this implicitly means that you cannot use this dirty hack for processes linked with musl or uClibc: they just don't have those hooks)
  • because our stub can be called from two different hooks, we should store somewhere which entry point we were called through
  • restore the old hook values
  • call dlopen/dlsym and then the target function (passing it the address of the injection stub for a delayed munmap. No, you can't free that memory directly in your target function; try to guess why)
  • pick the right old hook and jump to it if one was installed, or otherwise just return to the code somewhere in libc that called __malloc_hook
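
The four steps above can be modelled in plain C to make the control flow easier to follow. This is only a sketch: the hooks here are ordinary variables rather than glibc's real ones, a callback stands in for the dlopen/dlsym/target-function sequence, and every name (stub_state, stub_body, demo_*) is hypothetical and not taken from the real stub.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the stub's control flow, not the real asm stub. */
typedef void (*hook_fn)(void);

enum entry { VIA_MALLOC = 0, VIA_FREE = 1 };

struct stub_state {
    hook_fn *malloc_hook;      /* where the (fake) malloc hook lives */
    hook_fn  old_malloc_hook;  /* value it had before injection */
    hook_fn *free_hook;        /* where the (fake) free hook lives */
    hook_fn  old_free_hook;    /* value it had before injection */
    void   (*payload)(void *stub_addr); /* stands in for dlopen + dlsym + call */
};

/* Demo helpers, assumptions made only so the model can be exercised. */
void *seen_stub;
void demo_payload(void *stub_addr) { seen_stub = stub_addr; }
void demo_old_hook(void) {}
void demo_injected(void) {}   /* plays the role of the installed stub */

/* Returns the old hook to jump to, or NULL meaning "just return to libc". */
hook_fn stub_body(struct stub_state *st, enum entry via)
{
    assert(st && st->malloc_hook && st->free_hook);

    /* "via" is step 1: the entry point we came through was remembered. */

    /* Step 2: put both old hook values back. */
    *st->malloc_hook = st->old_malloc_hook;
    *st->free_hook   = st->old_free_hook;

    /* Step 3: run the payload and hand it the stub address
       (the state pointer stands in for it here) for a delayed munmap. */
    if (st->payload)
        st->payload((void *)st);

    /* Step 4: choose the old hook that matches our entry point. */
    return via == VIA_MALLOC ? st->old_malloc_hook : st->old_free_hook;
}
```

In the real stub the returned pointer would be the target of a tail jump; a NULL result corresponds to a plain return into libc.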

So I collected all the parameters needed for the job in a table, dtab, consisting of 6 pointers:

  1. __malloc_hook address
  2. old value of __malloc_hook
  3. __free_hook address
  4. old value of __free_hook
  5. pointer to dlopen
  6. pointer to dlsym
After this table we also have a couple of string constants: the full path to injected.so and the function name. Also, because we must set up 2 entry points, I decided to put 1 byte holding the distance between the first and the second (to make the injection logic more universal) right after dtab. Sounds easy, so let's check how this logic can be implemented on some still-living processors (given that alpha, sparc, hp-pa etc. are RIP)
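
A rough C view of that data area may help. Only the order of the six pointers and the position of the distance byte come from the description above; the field names, the NUL-terminated encoding of the strings, their relative order (path first, then function name) and the helper pack_stub_data are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field names are hypothetical; only the order follows the numbered list. */
struct dtab {
    void *malloc_hook_addr;  /* 1. __malloc_hook address       */
    void *old_malloc_hook;   /* 2. old value of __malloc_hook  */
    void *free_hook_addr;    /* 3. __free_hook address         */
    void *old_free_hook;     /* 4. old value of __free_hook    */
    void *dlopen_ptr;        /* 5. pointer to dlopen           */
    void *dlsym_ptr;         /* 6. pointer to dlsym            */
};

/* Writes dtab, then the 1-byte entry-point distance, then the two strings
   (assumed NUL-terminated, path first). Returns the number of bytes
   written, or 0 if the output buffer is too small. */
size_t pack_stub_data(uint8_t *out, size_t cap, const struct dtab *t,
                      uint8_t entry_delta,
                      const char *so_path, const char *fn_name)
{
    size_t path_len = strlen(so_path) + 1;   /* include the NUL */
    size_t name_len = strlen(fn_name) + 1;
    size_t need = sizeof *t + 1 + path_len + name_len;
    uint8_t *p = out;

    if (need > cap)
        return 0;

    memcpy(p, t, sizeof *t);       p += sizeof *t;
    *p++ = entry_delta;            /* distance between the two entry points */
    memcpy(p, so_path, path_len);  p += path_len;
    memcpy(p, fn_name, name_len);  p += name_len;
    return (size_t)(p - out);
}
```

Because the table holds nothing but pointers, it has no internal padding, so the distance byte sits at offset 6 * sizeof(void *) on both 32-bit and 64-bit targets.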

arm64

for some unknown reason they call it aarch64
Source, size 209 bytes with BTI c in prologues
arm64 has lots of really amazing features:
  • pre- and post-indexed addressing: in stp x29, x30, [sp, -16]! sp is first decreased by 16 bytes and then 2 registers are stored on the stack at once
  • cbz/cbnz instructions compare a value with zero and branch in one operation
  • support for PC-relative addressing and even better: if the desired address is located within ±1 MB of the current instruction you can use just one instruction, adr, and this leads to code size reduction/better cache utilization
Unfortunately the last feature is almost ignored by GCC, because for starters it just can't place constants into the .text section. I made a very primitive patch against it, but this is just the beginning of the story. Let's check when GCC decides to use the short adr form. Constraint "Usa" means logical AND of
In practice this means almost never
PS: Visual Studio can take advantage of short PC-relative addressing by placing all constants inside a pool after the function
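
To make the single-instruction form concrete, here is a small C sketch that checks whether a target is reachable by adr and builds the instruction word. It relies on the A64 encoding of ADR (a 21-bit signed byte offset split into immlo and immhi); the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* ADR holds a signed 21-bit byte offset, i.e. [-2^20, 2^20). */
int adr_in_range(int64_t delta)
{
    return delta >= -(1 << 20) && delta < (1 << 20);
}

/* Builds the 32-bit word for "adr xRd, <pc + delta>". */
uint32_t encode_adr(unsigned rd, int64_t delta)
{
    uint32_t imm, immlo, immhi;

    assert(rd < 32 && adr_in_range(delta));

    imm   = (uint32_t)delta & 0x1FFFFF;  /* 21-bit two's complement */
    immlo = imm & 3;                     /* goes to bits 30:29 */
    immhi = imm >> 2;                    /* goes to bits 23:5  */
    return 0x10000000u | (immlo << 29) | (immhi << 5) | rd;
}
```

For example, encode_adr(1, 4) gives 0x10000021, the word for adr x1 pointing at the next instruction.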

 

mips32

This port was inspired by this cool X post. I dislike mips asm because of its strange execution order around branches: you must think hard about which instruction you can place after literally each j/jal/bXX opcode (the delay slot), and this makes my brain (raised on z80 and i386) seethe. Also, mips doesn't have direct access to the PC, so I used the old-school trick with jal and arithmetic on the $ra register
Source, size 234 bytes
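
The arithmetic behind the $ra trick can be sketched in C. On classic MIPS a linking jump stores the address of the instruction after its delay slot, so $ra ends up 8 bytes past the jump itself, and any nearby data can then be reached with a constant offset from $ra. The helper names below are hypothetical, and the check against addiu's signed 16-bit immediate assumes that is the instruction used for the adjustment.

```c
#include <assert.h>
#include <stdint.h>

/* A linking jump at jal_addr leaves jal_addr + 8 in $ra:
   4 bytes for the jump plus 4 bytes for its delay slot. */
uint32_t mips_ra_after_link(uint32_t jal_addr)
{
    return jal_addr + 8;
}

/* Constant to add to $ra in order to reach target. */
int32_t mips_ra_offset(uint32_t jal_addr, uint32_t target)
{
    return (int32_t)(target - mips_ra_after_link(jal_addr));
}

/* addiu takes a signed 16-bit immediate. */
int fits_addiu(int32_t off)
{
    return off >= -32768 && off <= 32767;
}
```

With a stub of a couple of hundred bytes every such offset fits easily into one addiu.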


loongarch

I just couldn’t ignore it, since a couple of years ago I made an IDA Pro processor module for loongarch. It suffers from strange opcodes for PC-relative addressing: you have to use the pcalau12i/addi pair even if the desired address is located within 12-bit range. And it's better for you not to even know how it loads a full 64-bit address
Source, size 242 bytes
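
How the two immediates of such a pair are derived can be sketched in C. The sketch assumes the usual semantics: pcalau12i adds a sign-extended 20-bit page delta to the PC and clears the low 12 bits, and addi.d then adds a sign-extended 12-bit value. The function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Immediate for pcalau12i: the page delta, rounded up in advance when the
   low part will be seen as negative by the sign-extending addi.d. */
int64_t la_hi20(uint64_t pc, uint64_t target)
{
    return (int64_t)((target + 0x800) >> 12) - (int64_t)(pc >> 12);
}

/* Low 12 bits of the target as the signed immediate addi.d will use. */
int64_t la_lo12(uint64_t target)
{
    int64_t lo = (int64_t)(target & 0xFFF);
    return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* Emulates: pcalau12i rd, hi20 ; addi.d rd, rd, lo12 */
uint64_t la_emulate_pair(uint64_t pc, int64_t hi20, int64_t lo12)
{
    uint64_t rd = (pc + ((uint64_t)hi20 << 12)) & ~(uint64_t)0xFFF;
    return rd + (uint64_t)lo12;
}
```

Even when pc and target share a 4 KB page the low 12 bits of the PC are discarded first, which is why the second instruction cannot be dropped.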
