Channel: windows deep internals

using FSM to recover struct fields offsets


In the previous post I described a declarative way to find non-exported data and functions using an FSM. But often you also need to know the offsets of some fields in structures - they can change between versions of Windows. So let's see if this can be done in the same declarative manner

Perhaps the safest way is to track the registers that hold arguments to some function (btw not necessarily an exported one). So I added two more states to the FSM

  • ldrx register_index. Can have prefix stg N to remember this address
  • addx register_index. Can have prefix stg N to remember this address
Amazingly, that is all we need to start recovering offsets!

Let's see an example - I wrote simple rules to extract the offsets of some fields in ETW-related structures. It starts with the exported function EtwRegister, which calls a couple of non-exported functions: PsGetCurrentServerSiloGlobals (which you can use, for example, to extract the address of PspHostSiloGlobals - I'll leave this as a simple exercise for the reader) and EtwpRegisterProvider. The latter expects ETW_SILODRIVERSTATE as its first parameter, so we can put ldrx0 here and get the ESERVERSILO_GLOBALS.EtwSiloState offset

Then process EtwpRegisterProvider - it contains calls to EtwpFindGuidEntryByGuid, EtwpAddGuidEntry and ExAcquirePushLockExclusiveEx, so from x0 we can also get the ETW_GUID_ENTRY.Lock offset

Finally process EtwpFindGuidEntryByGuid to extract the ETW_GUID_ENTRY.Guid offset.
Run on kernel 20251:
afsm.exe -a D:\src\armpatched\fsm\etw.fsm d:\work\kernel\w10\20251\arm\ntoskrnl.exe
 1 - 27A2F8
 2 - 360
 3 - 686490
 4 - 681558
 5 - 67D008
 6 - 198
 7 - 28


ESERVERSILO_GLOBALS.EtwSiloState was stored at index 2, ETW_GUID_ENTRY.Lock at 6, ETW_GUID_ENTRY.Guid at 7. And now apply the same rules to build 18362:

afsm.exe -a D:\src\armpatched\fsm\etw.fsm d:\work\kernel\w10\18362\arm\ntoskrnl.exe
 1 - 120BE8
 2 - 360
 3 - 5D42F8
 4 - 574DE0
 5 - 59F188
 6 - 180
 7 - 18

You can check in the pdb that ETW_GUID_ENTRY.Lock really has offset 0x180 in this build and ETW_GUID_ENTRY.Guid has offset 0x18
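
To compare the two runs mechanically, here is a minimal Python sketch that parses the " N - HEX" result lines printed above and reports which field offsets moved between builds. It is not part of afsm; the function names and the FIELDS table are made up for illustration, and only the numbers come from the output shown in this post.

```python
import re

def parse_afsm_output(text):
    """Parse ' N - HEX' result lines into {index: value}; other lines are ignored."""
    values = {}
    for line in text.splitlines():
        m = re.fullmatch(r"\s*(\d+)\s+-\s+([0-9A-Fa-f]+)\s*", line)
        if m:
            values[int(m.group(1))] = int(m.group(2), 16)
    return values

# Index -> field name, as assigned by the rules described in this post.
FIELDS = {
    2: "ESERVERSILO_GLOBALS.EtwSiloState",
    6: "ETW_GUID_ENTRY.Lock",
    7: "ETW_GUID_ENTRY.Guid",
}

def diff_offsets(old, new):
    """Return {field: (old_offset, new_offset)} for fields whose offset changed."""
    return {name: (old.get(i), new.get(i))
            for i, name in FIELDS.items()
            if old.get(i) != new.get(i)}

BUILD_20251 = """
 1 - 27A2F8
 2 - 360
 3 - 686490
 4 - 681558
 5 - 67D008
 6 - 198
 7 - 28
"""

BUILD_18362 = """
 1 - 120BE8
 2 - 360
 3 - 5D42F8
 4 - 574DE0
 5 - 59F188
 6 - 180
 7 - 18
"""

if __name__ == "__main__":
    changed = diff_offsets(parse_afsm_output(BUILD_18362),
                           parse_afsm_output(BUILD_20251))
    for name, (old, new) in changed.items():
        print(f"{name}: {old:#x} -> {new:#x}")
```

Indexes 1 and 3-5 are addresses, so they differ in every build and are deliberately left out of the comparison.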

fsm rules for rpcrt4!GlobalRpcServer


I already described how you can extract the address of GlobalRpcServer and the offsets of some RPC_SERVER_T fields. Let's do it for arm64 in a declarative manner using the FSM

Start again with I_RpcServerRegisterForwardFunction function - we can get address of RpcHasBeenInitialized (will be stored with index 1), GlobalRpcServer (with index 2) and RPC_SERVER_T.pRpcForwardFunction offset (with index 3):

section .data
func I_RpcServerRegisterForwardFunction
# 1 - RpcHasBeenInitialized
stg1 load
# 2 - GlobalRpcServer
stg2 load
# 3 - ForwardFunction offset
stg3 strx

Next we can get the size of RPC_SERVER_T - from the function InitializeRpcServer, as the argument to AllocWrapper. But InitializeRpcServer is surprisingly hard to find - it is not exported and is called only once, from InitializeServerDLL (which is also non-exported). It uses lots of unicode strings, but unfortunately they all have the common prefix "NT AUTHORITY", which makes them indistinguishable for a 16-byte signature. But you can notice that this function registers some RPC_SERVER_INTERFACE - so we can use its content as a GUID:

section .data
fsection .text
# size of RPC_SERVER_T - 1st arg to AllocWrapper, stored with index 4
stg4 movx0
# AllocWrapper
call
# store to GlobalRpcServer
gstore 2

Next in the function you can see a call to RtlInitializeCriticalSectionAndSpinCount, which expects a pointer to CRITICAL_SECTION as its first argument (passed in register x0) - store this offset under index 5:

# x0 - offset to critical section, stored with index 5
stg5 addx0
call_imp RtlInitializeCriticalSectionAndSpinCount

And now we must insert the loading of our pseudo-GUID to be able to find the function InitializeRpcServer:

guid 60 00 00 00 80 BD A8 AF  8A 7D C9 11 BE F4 08 00
call .text

We can also find the offset to the stop event - from the exported function RpcMgmtStopServerListening, where the event at this offset is passed as the first argument to SetEvent:

# stop event offset - from RpcMgmtStopServerListening
section .data
func RpcMgmtStopServerListening
gload 1
gload 2
# 7 - offset to stop event, 1st arg to SetEvent
stg7 ldrx0
call_imp SetEvent

Run our rules on a couple of files:

afsm.exe -a D:\src\armpatched\fsm\tmp.fsm d:\work\kernel\w10\18363\arm\rpcrt4.dll D:\work\kernel\w10\rtm\2004\arm\rpcrt4.dll
[0] d:\work\kernel\w10\18363\arm\rpcrt4.dll: found at 0
 1 - 114D3C
 2 - 114D28
 3 - 178
 4 - 1F0
 5 - 60
 7 - 108
[1] D:\work\kernel\w10\rtm\2004\arm\rpcrt4.dll: found at 0
 1 - 10ED44
 2 - 10ED30
 3 - 178
 4 - 1F0
 5 - 60
 7 - 108

With only 3 simple rules we can extract:
  • address of RpcHasBeenInitialized - at index 1
  • address of GlobalRpcServer - at index 2
  • offset of RPC_SERVER_T.pRpcForwardFunction - at index 3
  • size of RPC_SERVER - at index 4
  • offset to some CRITICAL_SECTION in RPC_SERVER - at index 5
  • offset to stop event in RPC_SERVER - at index 7
This data is enough to select the right version of the RPC_SERVER struct (or at least to say that this is some new and unknown version)
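
A minimal Python sketch of that selection step, assuming the extracted values are already available as an {index: value} dictionary. The table and the layout label are hypothetical; only the single fingerprint in it comes from the two runs above.

```python
# Fingerprint = (pRpcForwardFunction offset, struct size,
#                CRITICAL_SECTION offset, stop event offset),
# i.e. the values stored under indexes 3, 4, 5 and 7.
# Hypothetical table: the one entry is taken from the 18363 and 2004 runs above,
# the label is made up for illustration.
KNOWN_LAYOUTS = {
    (0x178, 0x1F0, 0x60, 0x108): "layout seen in 18363 and 2004",
}

def select_layout(values):
    """Return the name of the matching known layout, or 'unknown'."""
    key = (values.get(3), values.get(4), values.get(5), values.get(7))
    return KNOWN_LAYOUTS.get(key, "unknown")

if __name__ == "__main__":
    run_18363 = {1: 0x114D3C, 2: 0x114D28, 3: 0x178, 4: 0x1F0, 5: 0x60, 7: 0x108}
    print(select_layout(run_18363))
```

Indexes 1 and 2 are addresses and change with every build, so they are not part of the fingerprint.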

poorgcc: IDA Pro plugin to fix poor gcc code on arm64


Let's see what gcc generates for arm64 - for example gcc 7.5 and the linux kernel.
Function do_sysinstr:

ADRP            X0, #__func__.48604@PAGE ; "arm64_show_signal"
ADD             X0, X0, #__func__.48604@PAGEOFF
ADRP            X3, #ctr_read_handler@PAGE
ADD             X0, X0, #0x218
ADD             X3, X3, #ctr_read_handler@PAGEOFF

Wtf happened here? Instead of loading x0 with the address of sys64_hooks directly, we have two consecutive ADDs to x0, and the intermediate value of x0 is not used between them. You can peek at some random functions - this is a very common pattern; I personally think this is a bug in the gcc arm64 codegen. Anyway, it prevents IDA from seeing the right xrefs, so I wrote a simple plugin for IDA Pro to fix this

The plugin just tries to find "add reg, reg, imm" instructions without a data xref and backtracks to check whether this register was loaded somewhere above - sure, the code is no model of elegance. You can add a line like this to plugins.cfg

process_all_poor_gcc_functions    poorgcc64     0      1

to process all functions

Some results - after applying the plugin to the function do_sysinstr the code looks like:

ADRP            X0, #__func__.48604@PAGE ; "arm64_show_signal"
ADD             X0, X0, #__func__.48604@PAGEOFF
ADRP            X3, #ctr_read_handler@PAGE
ADD             X0, X0, #0x218 ; FFFFFFC010C116C8
ADD             X3, X3, #ctr_read_handler@PAGEOFF


FFFFFFC010C116C8 is the address of sys64_hooks and now it has the right xref
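
The backtracking idea can be sketched outside IDA as a tiny register tracker. This is not the plugin's code, just a Python illustration over pre-decoded (mnemonic, dst, src, imm) tuples. The address of __func__.48604 is derived from the listing (FFFFFFC010C116C8 minus 0x218); the page and offset used for ctr_read_handler are placeholders, because the post does not give them.

```python
def resolve_adds(instructions):
    """Follow ADRP/ADD chains.

    Returns {instruction_index: address} for every 'ADD dst, src, #imm'
    whose source register holds an address we could track.
    """
    regs = {}      # register -> currently known address
    resolved = {}
    for idx, (mnem, dst, src, imm) in enumerate(instructions):
        if mnem == "ADRP":
            regs[dst] = imm                 # page address
        elif mnem == "ADD" and src in regs:
            regs[dst] = regs[src] + imm     # keep tracking through the ADD
            resolved[idx] = regs[dst]
        else:
            regs.pop(dst, None)             # unknown write: forget the value
    return resolved

DO_SYSINSTR = [
    ("ADRP", "X0", None, 0xFFFFFFC010C11000),  # page of __func__.48604 (derived)
    ("ADD",  "X0", "X0", 0x4B0),               # page offset of __func__.48604 (derived)
    ("ADRP", "X3", None, 0xFFFFFFC010000000),  # placeholder page for ctr_read_handler
    ("ADD",  "X0", "X0", 0x218),               # the ADD that had no xref
    ("ADD",  "X3", "X3", 0x100),               # placeholder page offset
]

if __name__ == "__main__":
    for idx, addr in resolve_adds(DO_SYSINSTR).items():
        print(idx, f"{addr:X}")
```

A real implementation also has to stop at branches, calls and function boundaries, which this sketch ignores.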

ecdsa in driver


Let's assume that we have a buggy and dangerous driver (which "rely on many unexported functions and select them via pattern scans which are regularly revalidated against windows insider builds", he-he). Sure we want to restrict access to it, for example like ProcessHacker does

Unfortunately the latter uses CNG and cannot work on xp/w2k3. So I made a fork of libecc to use this library with WDK7. A test driver and client are also included

How to build user-mode part

I committed VS2017 project files for the library, ec_utils and the test client - they are located in the directory vs.
Next you must sign your client:

Generate your keys (the constants BRAINPOOLP512R1, ECRDSA and SHA3_512 are hardcoded in the driver - sure you can use whatever you want):
ec_utils.exe gen_keys BRAINPOOLP512R1 ECRDSA mykeypair

and sign your client 
ec_utils.exe sign BRAINPOOLP512R1 ECRDSA SHA3_512 testclnt.exe mykeypair_private_key.bin testclnt.sig

now copy the file mykeypair_public_key.h to the directory drv
You also need to convert the file testclnt.sig to 1.inc in the driver source code - I am too lazy to read signatures from the registry so they are hardcoded in the driver body

How to build driver

Launch the right "Build Environment" from WDK7. The Makefile for the library is located in the directory src and the Makefile for the driver in the directory drv. I hope you know what to do with them

Run

You will need admin privileges. First install the driver
testclnt.exe full_path2_ecdsadrv.sys
and just run
testclnt.exe

If you were careful enough with the signatures you can see something like:
IOCTL_TEST_IOCTL return 1

This means that the driver checked the EC DSA signature of your testclnt.exe and now agrees to work with it. Sure you can have several trusted clients - just change ALLOWED_CLIENTS in vrfy.c and init each client with the right signature

And finally when you have played enough you can uninstall the driver:

testclnt.exe -u

codewars heisenbug


 I got the following crash when I tried to solve some trivial task:


UndefinedBehaviorSanitizer:DEADLYSIGNAL
==1==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000020 (pc 0x0000004271a4 bp 0x7ffcf51d9a28 sp 0x7ffcf51d9070 T1)
==1==The signal is caused by a READ memory access.
==1==Hint: address points to the zero page.
==1==WARNING: invalid path to external symbolizer!
==1==WARNING: Failed to use and restart external symbolizer!
    #0 0x4271a3 (/workspace/test+0x4271a3)
    #1 0x4276d0 (/workspace/test+0x4276d0)
    #2 0x4273f0 (/workspace/test+0x4273f0)
    #3 0x427eed (/workspace/test+0x427eed)
    #4 0x4282b9 (/workspace/test+0x4282b9)
    #5 0x42abe3 (/workspace/test+0x42abe3)
    #6 0x4295ce (/workspace/test+0x4295ce)
    #7 0x429129 (/workspace/test+0x429129)
    #8 0x428d1b (/workspace/test+0x428d1b)
    #9 0x43b625 (/workspace/test+0x43b625)
    #10 0x42810d (/workspace/test+0x42810d)
    #11 0x7fdeedfdabf6 (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)
    #12 0x405339 (/workspace/test+0x405339)

ok, nothing special, just dereferencing a null ptr. But where? The code looks like:
double eval(const std::shared_ptr<ASTNode> &tree) {
    if ( !tree )
      return 0.0;
    switch(tree->token.type) {

At least we have the address of the crash - let's dump the first 0x40 bytes of the eval function:

   unsigned char *c = (unsigned char *)&eval;
   for ( int i = 0; i < 0x40; i++ )
     printf("%2.2X ", *(c + i));

And then put them into a 64-bit disassembler:

000000000000000d 55               push rbp
000000000000000e 4157             push r15
0000000000000010 4156             push r14
0000000000000012 53               push rbx
0000000000000013 4883ec18         sub rsp, 0x18
0000000000000017 488b37           mov rsi, [rdi]
000000000000001a 660f57c0         xorpd xmm0, xmm0
000000000000001e 4885f6           test rsi, rsi
0000000000000021 0f849d020000     jz dword 0x2c4
0000000000000027 8b4620           mov eax, [rsi+0x20] ; crash is here

WHAT? How can rsi be zero if it passed the check test rsi, rsi? Is it buggy qemu, docker, some speculative read-ahead, or what? I doubt that I wish to continue using this service

dumper of linux kernel notification chains


There seems to be one little-known thing in the linux kernel - notification chains. So they have a literal analogue of PsSetLoadImageNotifyRoutine - the function register_module_notifier. And similarly they don't have a function to enumerate registered notifications - I don't know why. Maybe they were bitten by Microsoft. Or maybe I want too much from people whose own "The Linux Kernel Module Programming Guide" contains an error in a code example. Anyway I decided to write my own (btw the last time I wrote drivers for Linux was something around 20 years ago)

How to run

git clone https://github.com/redplait/lkcd.git
cd lkcd
make
sudo insmod ./lkcd.ko
cd test
make
sudo ./dtest

Sample of output (from fresh Ubuntu):
backlight_notifier: 0xffffffff8bd86ba0
backlight_notifier cnt: 1
 0xffffffffc00c8260 - video
reboot_notifier_list: 0xffffffff8b663140
reboot_notifier_list cnt: 6
 0xffffffff8a8e0860 - kernel
 0xffffffff8a074f80 - kernel
 0xffffffff8a929ea0 - kernel
 0xffffffff8a633410 - kernel
 0xffffffff8a76e1e0 - kernel
 0xffffffff8a213350 - kernel
module_notify_list: 0xffffffff8b6c3420
module_notify_list cnt: 10
 0xffffffff8a22b4b0 - kernel
 0xffffffff8a1cec70 - kernel
 0xffffffff8a1b9860 - kernel
 0xffffffff8a1acda0 - kernel
 0xffffffff8a17f830 - kernel
 0xffffffff8a1a11c0 - kernel
 0xffffffff8a1cc280 - kernel
 0xffffffff8a191ad0 - kernel
 0xffffffff8a11e500 - kernel
 0xffffffff8a5a0d60 - kernel
pm_chain_head: 0xffffffff8b66a2a0
pm_chain_head cnt: 9
 0xffffffff8a11fd30 - kernel
 0xffffffff8a09b490 - kernel
 0xffffffff8a769a80 - kernel
 0xffffffff8a8cb230 - kernel
 0xffffffff8a18ae70 - kernel
 0xffffffff8a76e650 - kernel
 0xffffffffc030cfb0 - vmwgfx
 0xffffffffc05286b0 - intel_rapl_common
 0xffffffff8ab13bd0 - kernel

PS: if you know a more canonical way to find the address range of the kernel - drop me a comment
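
One user-mode possibility, given here only as a hedged sketch and not as what lkcd does: take the _stext and _etext symbols from /proc/kallsyms (or System.map) and treat everything between them as kernel text. The sample addresses below are hypothetical. Note that /proc/kallsyms may show zeroed addresses to unprivileged users, and System.map addresses do not match the running kernel when KASLR is enabled.

```python
def kernel_text_range(symtab_text):
    """Return (start, end) of kernel text from 'addr type name' lines."""
    wanted = {"_stext": None, "_etext": None}
    for line in symtab_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[2] in wanted:
            wanted[parts[2]] = int(parts[0], 16)
    if None in wanted.values():
        raise ValueError("_stext/_etext not found")
    return wanted["_stext"], wanted["_etext"]

def owner(addr, text_range):
    """Classify an address as core kernel text or something else."""
    start, end = text_range
    return "kernel" if start <= addr < end else "module or unknown"

# Hypothetical symbol table fragment in System.map / kallsyms format.
SAMPLE = """\
ffffffff8a000000 T _stext
ffffffff8a000000 T _text
ffffffff8b000000 T _etext
"""

if __name__ == "__main__":
    rng = kernel_text_range(SAMPLE)
    print(owner(0xffffffff8a8e0860, rng))
    print(owner(0xffffffffc00c8260, rng))
```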

functions pointers in linux kernel data sections


I wrote a simple program to estimate the size of the problem. Yes, I know about CFI, but it seems that even on kernel 5.11 on fresh Ubuntu this mechanism is not implemented and indirect calls look like:

  mov     rax, cs:XXX
  call    __x86_indirect_thunk_rax

__x86_indirect_thunk_rax proc near: 
  jmp     rax

The first approach is just to scan the .data section - you can do this by running

./lkmem path-to-unpacked-kernel path-to-System.map

Some results:
  • arm64 5.11.0: 9893
  • x64 5.8-53: 10698
  • x64 5.11.0: 13414
  • x64 4.18: 16224
Ok, how about not yet initialized pointers (or pointers in the .bss section)? We need to use a disassembler - just disasm all functions in .text and find indirect calls and calls to __x86_indirect_thunk_XXX. Results (with the -d option):
  • x64 4.18: +42
  • x64 5.8-53: +52
  • x64 5.11.0: +45
and with the .bss section (option -b):
  • x64 4.18: +99
  • x64 5.8-53: +120
  • x64 5.11.0: +109
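
The first approach (scanning .data) can be illustrated with a few lines of Python. This is a simplified sketch and not lkmem's code: it only counts aligned 8-byte little-endian values that fall inside a given text range, and the filtering that lkmem really applies (for example matching against System.map) is not reproduced here. All names and addresses are made up.

```python
import struct

def count_text_pointers(data, text_start, text_end):
    """Count aligned 8-byte little-endian values that point into [text_start, text_end)."""
    count = 0
    for off in range(0, len(data) - 7, 8):
        (value,) = struct.unpack_from("<Q", data, off)
        if text_start <= value < text_end:
            count += 1
    return count

if __name__ == "__main__":
    # Hypothetical text range and a fake 4-slot .data section.
    text_start, text_end = 0xffffffff81000000, 0xffffffff81e00000
    data = struct.pack("<4Q", 0, text_start + 0x10, 0xdead, text_end - 8)
    print(count_text_pointers(data, text_start, text_end))
```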

arm64 disasm for linux kernel


Today I added a disassembler for the arm64 linux kernel to search for pointers. It turned out to be surprisingly difficult to do for several reasons (the disasm for x64 is only 383 LOC vs 618 for arm64)

One of them is the poor code produced by some gcc versions

But the main problem is the arm64 opcodes. Let's see a simple indirect call:
  ADRP            X27, #mh_filter@PAGE
  CMP             W22, #0x3A ; ':'
  B.EQ            loc_FFFFFFC010CC7140
  CMP             W22, #0x87
  B.NE            loc_FFFFFFC010CC7188
  LDR             X2, [X27,#mh_filter@PAGEOFF]
  CBZ             X2, loc_FFFFFFC010CC7188
  MOV             X1, skb
  MOV             X0, X28
  BLR             X2
    

compare this with the code that calls a list of funcs from tracepoints:
  ADRP            __data, #__tracepoint_cpu_idle@PAGE
  ADD             X0, X0, #__tracepoint_cpu_idle@PAGEOFF
  MOV             X29, SP
  STR             X19, [SP,#var_s10]
  LDR             X19, [X0,#(__tracepoint_powernv_throttle.funcs - 0xFFFFFFC011A562C0)]
 ...
loc_FFFFFFC01011FC60:
  LDR             X4, [X19]
  MOV             W3, W20
  LDR             X0, [X19,#8]
  MOV             X2, X21
  MOV             W1, W22
  BLR             X4
  LDR             X0, [X19,#0x18]!
  CBNZ            X0, loc_FFFFFFC01011FC60

In the second case register X4 was loaded from X19, which in turn was loaded from some memory, so I need to track how many times the content of a register was loaded

Anyway the result is +34 newly discovered function pointers
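
The "how many times was it loaded" bookkeeping can be sketched in Python like this. It is an illustration over hand-decoded (mnemonic, dst, src) tuples that mirror the two listings above, not the real disassembler; the names are made up, and branches and writeback addressing are ignored.

```python
def call_load_depth(instructions):
    """For each BLR return (register, number of memory loads behind its value).

    The depth is None when the origin of the register is unknown.
    """
    depth = {}     # register -> load depth (0 = plain address from ADRP)
    calls = []
    for mnem, dst, src in instructions:
        if mnem == "ADRP":
            depth[dst] = 0
        elif mnem in ("ADD", "MOV"):
            depth[dst] = depth.get(src)          # same depth as the source
        elif mnem == "LDR":
            base = depth.get(src)
            depth[dst] = None if base is None else base + 1
        elif mnem == "BLR":
            calls.append((dst, depth.get(dst)))
    return calls

SIMPLE_CALL = [
    ("ADRP", "X27", None),
    ("LDR",  "X2",  "X27"),   # X2 = [mh_filter]
    ("MOV",  "X1",  "skb"),
    ("MOV",  "X0",  "X28"),
    ("BLR",  "X2",  None),
]

TRACEPOINT_CALL = [
    ("ADRP", "X0",  None),
    ("ADD",  "X0",  "X0"),
    ("LDR",  "X19", "X0"),    # X19 = tracepoint.funcs
    ("LDR",  "X4",  "X19"),   # X4 = funcs[i].func
    ("LDR",  "X0",  "X19"),
    ("BLR",  "X4",  None),
]

if __name__ == "__main__":
    print(call_load_depth(SIMPLE_CALL))
    print(call_load_depth(TRACEPOINT_CALL))
```

A depth of 1 means the called pointer lives directly in the referenced variable; a depth of 2 means the variable holds a pointer to a table of function pointers.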


linux kernel tracing


It's hard to believe but the linux kernel has an almost exact copy of windows ETW - event tracing. It is just as difficult to make it work, it is poorly documented, very complex and fragile. And yes, as you can guess - it also can't show who uses it and which parts of it are in use. So I wrote some code to dump registered funcs in tracepoints and to check file ops for files in /sys/kernel/tracing/events

Let's start with tracepoints. As you see, this structure has a strange-looking list of functions in the field funcs, and the calling happens in functions like event_triggers_call. How can we find these tracepoints? Well, they are stored in trace_event_call->tp, and the array of pointers to trace_event_call is located between the symbols __start_ftrace_events and __stop_ftrace_events. Unfortunately all these treasures are located in the discardable section .init.data. But because they were all declared in the same manner we can find them by name - all symbols with the prefix __tracepoint_ are what we need. So some examples (you can run lkmem -c -t vmlinux system.map to get this):

 __tracepoint_sys_enter at 0xffffffff8b82e340: enabled 0 cnt 0
  regfunc 0xffffffff8a192330 - kernel!syscall_regfunc
  unregfunc 0xffffffff8a1923f0 - kernel!syscall_unregfunc


Well, no clients right now - cnt 0
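
Finding the tracepoints by name is easy to sketch in Python against a System.map-style file. The function name and the sample lines are made up for illustration (the sys_enter address is simply the one from the output above). A real System.map may contain other symbols starting with the same prefix that need extra filtering.

```python
def tracepoint_symbols(system_map_text, prefix="__tracepoint_"):
    """Return {tracepoint_name: address} for symbols starting with prefix."""
    found = {}
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].startswith(prefix):
            found[parts[2][len(prefix):]] = int(parts[0], 16)
    return found

# Illustrative System.map fragment; addresses other than sys_enter are invented.
SAMPLE_MAP = """\
ffffffff8a192330 T syscall_regfunc
ffffffff8b82e340 D __tracepoint_sys_enter
ffffffff8b82e388 D __tracepoint_sys_exit
"""

if __name__ == "__main__":
    for name, addr in tracepoint_symbols(SAMPLE_MAP).items():
        print(f"{name}: {addr:#x}")
```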

Next about the /sys/kernel/tracing/events files (this is a perverted inhuman interface to manage trace events). I just dump file->f_path.dentry->d_inode->i_fop for each such file. Sample of output (you can get this with lkmem -s vmlinux system.map path_to_some_sys_kernel_tracing_file):


res /sys/kernel/tracing/events/alarmtimer/alarmtimer_cancel/enable: (nil)
 inode: 0xffffa07f9ae86be0
 s_op: 0xffffffff8b0a3000 - kernel!tracefs_super_operations
 inode->i_fop: 0xffffffff8b06b4c0 - kernel!ftrace_enable_fops
res /sys/kernel/tracing/events/alarmtimer/alarmtimer_cancel/filter: (nil)
 inode: 0xffffa07f9ae877c0
 s_op: 0xffffffff8b0a3000 - kernel!tracefs_super_operations
 inode->i_fop: 0xffffffff8b06b160 - kernel!ftrace_event_filter_fops
res /sys/kernel/tracing/events/alarmtimer/alarmtimer_cancel/format: (nil)
 inode: 0xffffa07f9ae86980
 s_op: 0xffffffff8b0a3000 - kernel!tracefs_super_operations
 inode->i_fop: 0xffffffff8b06b3a0 - kernel!ftrace_event_format_fops

Why is it very important to check this table? Let's assume that some bad guy wants to hide something from you. One way this can be done is to replace the pointer to file_operations with their own; now if somebody wants to enable a trace event with
 echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
their version of event_enable_write will be called. And in the opposite direction, when somebody wants to check whether tracing is enabled, their version of event_enable_read will be called. For obvious reasons, I will not provide code examples for such patches

linux-kernel per-cpu vars


It's hard to believe but linux has a degraded version of the windows KPCR - so-called "per-cpu variables". This is some isolated memory assigned to each CPU (reachable through the gs segment register on x64 and through system register c13 on arm64) and it can contain some interesting fields. Why is it important to know the offsets of some of these variables? Well, I suspect that the linux kernel contains much more code for espionage than windows (for example trace events, tracepoints, kprobes, usb_mon_register etc etc). One such piece of code is the function user_return_notifier_register, with which you can register your own notifications. Unfortunately this list of notifications is stored in the per-cpu variable return_notifier_list

And as usual there is no include file with definitions of all these per-cpu fields. Moreover these offsets depend on the kernel build config and differ in each build. Sounds like a nightmare, a reason to turn off the computer and go drink vodka looking at the autumn rain.

Or not? Let's look at the disasm of some function using this var - like fire_user_return_notifiers:
fire_user_return_notifiers proc near
 call    __fentry__ ; another entry for spy code
 mov     rax, offset unk_29450
 add     rax, gs:this_cpu_off ; .data..percpu:0000000000011368
 mov     rdi, [rax]

In this build return_notifier_list happens to have offset 0x29450 and this_cpu_off 0x11368. 
Well, we can use disasm to get the offsets of both return_notifier_list & this_cpu_off and then write code like:
; rdi - this_cpu_off
; rsi - offset
get_this_gs:
mov rax, [gs:rdi]
add rax, rsi
ret

Patch on github to extract this_cpu_off & return_notifier_list with some disasm magic
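
The pattern the disasm has to recognize can be sketched in Python as follows. This is a hedged illustration over hand-decoded (mnemonic, dst, operand_kind, value) tuples and not the code from the patch; the tuple format and function names are made up, and the two offsets are the ones from the listing above.

```python
def find_percpu_access(instructions):
    """Find 'mov reg, offset VAR' followed by 'add reg, gs:THIS_CPU_OFF'.

    Returns (var_offset, this_cpu_off) or None if the pattern is absent.
    """
    pending = {}   # register -> immediate loaded into it
    for mnem, dst, kind, value in instructions:
        if mnem == "mov" and kind == "imm":
            pending[dst] = value
        elif mnem == "add" and kind == "gs" and dst in pending:
            return pending[dst], value
        else:
            pending.pop(dst, None)   # register overwritten by something else
    return None

def percpu_address(cpu_base, var_offset):
    """Address of a per-cpu variable once this CPU's base (the value read via gs) is known."""
    return cpu_base + var_offset

FIRE_USER_RETURN_NOTIFIERS = [
    ("call", None,  "rel", 0),          # call __fentry__
    ("mov",  "rax", "imm", 0x29450),    # offset of return_notifier_list
    ("add",  "rax", "gs",  0x11368),    # gs:this_cpu_off
    ("mov",  "rdi", "mem", 0),          # mov rdi, [rax]
]

if __name__ == "__main__":
    print(find_percpu_access(FIRE_USER_RETURN_NOTIFIERS))
```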

linux kernel kprobes


without a doubt the most crazy and insane spying mechanism in the linux kernel is kprobes

  1. It's expensive - each time an int3 occurs the typical call stack looks like:
    xen_asm_exc_int3
    asm_exc_int3
    irq_entries_start
    exc_int3
    do_int3
    kprobe_int3_handler
  2. It makes working with kgdb (which itself is far behind windbg) a nightmare - the function do_int3 first calls kgdb_ll_trap
  3. There is no mechanism to predict which functions cannot be kprobed. Let's assume that your handler uses a simple printk - then you can't set a kprobe on the whole graph of functions called from printk (like vprintk_func, vprintk_default, vprintk_emit, __msecs_to_jiffies, arch_touch_nmi_watchdog, touch_softlockup_watchdog, __printk_safe_enter, _raw_spin_lock, vprintk_store, vscnprintf, cont_flush etc etc) and as far as I know there is no way to even find them all
  4. Sure you have the /sys/kernel/debug/kprobes/list file so you can see which functions were hooked. But there is no way to know by whom
So I wrote a dumper of installed kprobes. Sample of output:

sudo ./lkmem -k -c ~/krnl/curr ~/krnl/System.map-5.11.0-34-generic
kprobes[47]: 1
 kprobe at 0xffffffffc0605080 flags 8
  addr: 0xffffffffa4a9f040 - kernel!__do_sys_fork
  pre_handler: 0xffffffffc0603548 - lkcd
  post_handler: 0xffffffffc0603526 - lkcd

linux kernel uprobes


Let's consider another spying mechanism in the linux kernel - uprobes. They also insert int3, but this time in user mode, and can be used for example to steal TLS traffic. I made simple code to set up a uprobe for /usr/bin/ls on the PLT thunk of getenv:

objdump -d /usr/bin/ls
...
0000000000004710 <getenv@plt>:
    4710:f3 0f 1e fa          endbr64 
    4714:f2 ff 25 5d e5 01 00 bnd jmpq *0x1e55d(%rip)        # 22c78 <getenv@GLIBC_2.2.5>
    471b:0f 1f 44 00 00       nopl   0x0(%rax,%rax,1)

now run ls
ls -i /usr/bin/ls
1043126 /usr/bin/ls 
dmesg | tail
[258600.533089] uprobe ret_handler is executed, ip = 55EAECA62B54
[258600.533090] uprobe handler in PID 43831 executed, ip = 55eaeca56710
[258600.533093] uprobe ret_handler is executed, ip = 55EAECA62B6C
[258600.533095] uprobe handler in PID 43831 executed, ip = 55eaeca56710
[258600.533098] uprobe ret_handler is executed, ip = 55EAECA5861C
[258600.533099] uprobe handler in PID 43831 executed, ip = 55eaeca56710
[258600.533102] uprobe ret_handler is executed, ip = 55EAECA57F60
[258600.533111] uprobe handler in PID 43831 executed, ip = 55eaeca56710
[258600.533114] uprobe ret_handler is executed, ip = 55EAECA57A77

And you can't see which uprobes are installed - the file /sys/kernel/debug/tracing/uprobe_events is empty. NSA can hide their anal catheters even in open source, yeah. So I wrote code to dump all uprobes (stored in uprobes_tree) and the consumers of each uprobe
sudo ./lkmem -k -c ~/krnl/curr ~/krnl/System.map-5.11.0-34-generic
uprobes: 1
[0] addr 0xffffa008c309bc00 inode 0xffffa008c12d61a0 ino 1043126 clnts 1 offset 4710 flags 0 
 consumer[0] at 0xffffffffc0605100
   handler: 0xffffffffc0603b13 - lkcd
   ret_handler: 0xffffffffc0603af3 - lkcd

There is one problem - you can't get the filename from an inode, only i_ino. So you can then use find -inum ino to find the file on which the uprobe was installed
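
If you want to do the same lookup from a script, here is a minimal Python equivalent of find -inum (a sketch; the function name is made up). Keep in mind that inode numbers are only unique within one filesystem, so a walk that crosses mount points can return unrelated files.

```python
import os

def find_by_inode(root, ino):
    """Yield paths under root whose inode number equals ino (like 'find root -inum ino')."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_ino == ino:
                    yield path
            except OSError:
                continue   # entry vanished or is not accessible

if __name__ == "__main__":
    # 1043126 is the inode number reported for /usr/bin/ls in the dump above.
    for path in find_by_inode("/usr/bin", 1043126):
        print(path)
```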

filesystem notifications in linux kernel


disclaimer

Filesystems are the most complex part of any OS. I am not a specialist in linux filesystems and I don't even commit code to the linux kernel. So none of the information here can be considered reliable; the code has tons of bugs and can damage your machine and ruin the rest of your life

Usermode notifications

linux has 3 mechanisms for passing filesystem notifications to user mode:

  1. dnotify
  2. inotify
  3. new-fashioned fanotify
They are all exposed to user mode as files, so you can use lsof (or even just something like "ls /proc/*/fdinfo/* | xargs grep notify") to find them and the processes they belong to. Unfortunately (as usual) this information is not enough. Let's see for example the function fanotify_fdinfo. We can notice that there are 3 possible sources of notifications:
  1. just a simple inode - for these inode->i_ino & the superblock s_dev are dumped - I don't know how you can find the mountpoint for this superblock from user mode
  2. mount point (btw struct mount is not even described in linux/include). At least knowing mnt_id you can find the name in the /proc/pid/mountinfo file
  3. superblock - again s_dev is dumped
Can you get real-time notifications about setting a new xxnotify? Yes, via security_path_notify
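
One possible way to narrow down an s_dev value from user mode, given only as a hedged sketch: the kernel-internal dev_t keeps the major number in the upper 12 bits and the minor number in the lower 20 bits, and the third field of each /proc/pid/mountinfo line is major:minor. So a value such as 8388610 (0x800002) decodes to 8:2 and can be matched against mountinfo. The function names and the sample mountinfo text are made up, and I assume the filesystem reports the same device number in both places, which may not hold for every filesystem.

```python
def decode_kernel_dev(s_dev):
    """Split a kernel-internal dev_t into (major, minor): 12-bit major, 20-bit minor."""
    return s_dev >> 20, s_dev & 0xFFFFF

def mountpoints_for_dev(mountinfo_text, s_dev):
    """Return [(mnt_id, mount_point)] for mountinfo lines whose major:minor matches s_dev."""
    major, minor = decode_kernel_dev(s_dev)
    wanted = f"{major}:{minor}"
    result = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # mountinfo fields: mnt_id parent_id major:minor root mount_point options ...
        if len(fields) > 4 and fields[2] == wanted:
            result.append((int(fields[0]), fields[4]))
    return result

# Illustrative /proc/self/mountinfo fragment (parent ids are invented).
SAMPLE_MOUNTINFO = """\
22 28 0:21 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
28 1 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro
"""

if __name__ == "__main__":
    print(decode_kernel_dev(8388610))
    print(mountpoints_for_dev(SAMPLE_MOUNTINFO, 8388610))
```

The sdev value in fdinfo is printed in hex, so parse it with int(text, 16) before passing it in.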

Kernel mode notifications
Can you have the same functionality as provided by xxnotify in kernel mode? Definitely yes - kernel audit uses it. I could not find any sample code to do this in your own driver so I wrote one. This is not very complex (although the function fsnotify_destroy_group is not exported so you need some sort of kallsyms lookup). You can add to tracked_inode everything you want - like the full filename, stat etc

And now the main question
Can you find all sources of filesystem notifications?
Volatility cannot
I am not aware of any tools to do this so I wrote one myself. Let's think about how we can implement this:
  1. we need to enum superblocks first. The function iterate_supers is not exported (whom can that stop?)
  2. then you can enum all mount points (mount_lock is not exported either)
  3. and enum the inodes in the current superblock and then enum the installed fsnotify_marks (btw as you can guess the functions fsnotify_first_mark/fsnotify_next_mark are not exported either. My inner paranoia says that there are too many deliberately hidden functions)
Sample of output:

sudo ./lkmem -F -c ~/krnl/curr ~/krnl/System.map-5.11.0-34-generic
...
superblock[24] at 0xffff8a044be2a000 dev 8388610 flags 70018000 inodes 15923 sda2 mnt_count 43 root 0xffff8a0448f0ecc0 
 s_type: 0xffffffffa580eaa0 - kernel!ext4_fs_type
 s_op: 0xffffffffa5050080 - kernel!ext4_sops
 dq_op: 0xffffffffa50501c0 - kernel!ext4_quota_operations
 s_qcop: 0xffffffffa5050160 - kernel!ext4_qctl_operations
 s_export_op: 0xffffffffa5050020 - kernel!ext4_export_ops
 mnt[0] 0xffff8a04420fec80 mark_cnt 0 mnt_id 28 / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro
  inode[1873] 0xffff8a0365f1e1a0 i_no 398369 i_flags 1000 FILE
    i_fsnotify_mask: 1FFE i_fsnotify_marks 0xffff8a036edb5de0 count 1
    fsnotify[0] 0xffff8a036ccb8720 mask 1FFE ignored_mask 0 flags 6
     group: 0xffff8a036ec09500
      ops: 0xffffffffc08f0160 - lkntfy


As you can see we successfully found the inode on which my test notification was installed from the driver, the event mask (0x1ffe - as in the test case), the superblock and the mount point - in this case it is root

P.S.: I don't know why some superblocks have a huge number of mount points (43 in the above sample). It is quite possible that this is another memory leak

PoC to hide kprobes list


as you may know the list of kprobes is mapped into /sys as the file /sys/kernel/debug/kprobes/list. And now that I have working filesystem notifications it is extremely tempting to try hiding the content of this file. Let's see what this inode contains:


sudo ./lkmem -s -c ~/krnl/curr ~/krnl/System.map-5.11.0-34-generic /sys/kernel/debug/kprobes/list
res /sys/kernel/debug/kprobes/list: (nil)
 inode: 0xffff8a0448d1ae40
 s_op: 0xffffffffa5067f80 - kernel!debugfs_super_operations
 inode->i_fop: 0xffffffffa506b000 - kernel!debugfs_full_proxy_file_operations
 debugfs_real_fops: 0xffffffffa5028ce0 - kernel!kprobes_fops
 private_data: 0xffffffffa5028e00 - kernel!kprobes_sops

kprobes_sops is just a struct seq_operations and the function we need is show. So the idea is simple
  • set a notification for the file /sys/kernel/debug/kprobes/list
  • in the fsnotify_handle_event callback check the inode and mask
  • if this is the first open of this file - patch kprobes_sops->show to our own function (be cautious with WP in cr0)
  • if this is the last close of this file - restore the original handler in kprobes_sops->show
  • also restore the original handler when the driver is unloading
You may ask - why is it so complicated? It's much easier just to patch kprobes_sops->show, right? The answer is that you minimize the risk of being discovered when patching only for some short period

what linux hiding

disclaimer
there is no doubt that the list below is incomplete, inaccurate etc - it's just what a very average programmer can find during two months of browsing linux source code

observability criteria
what do I mean by "hiding"? It means that there is
  • no kernel API to enumerate some structure
  • no real-time notifications about setting some hook
  • no mapping on /proc or /sys (however this method is not reliable)
  • no 3rd party tools to show this. As an example I chose volatility - just bcs I have read their tome "The Art of Memory Forensics"
So you are unable to see them

notification chains
it is very ironic that they have an API like register_XXX_notifier/unregister_XXX_notifier but there is no function like enum_XXX_notifier
no mapping on /proc or /sys
volatility checks only very limited set - vt_notifier_list & keyboard_notifier_list

tracepoints
no API to enum clients
no notification about turning on some tracepoint
has mapping to /sys/kernel/tracing/events but it can't show the clients of a tracepoint
volatility - no

kprobes
no API to enum consumers of some installed KPROBE
no notification about installing a new kprobe. This is an extremely sad fact - for example tools like LKRG don't know that some memory was patched
has mapping to /sys/kernel/debug/kprobes/
volatility - no

uprobes
no API to enum consumers of some installed UPROBE
no notification about installing new uprobe
has mapping to /sys/kernel/debug/tracing/uprobe_events. The craziest thing is that uprobes installed from the kernel are not shown
volatility - no

filesystem notifications
no API to enum all installed marks
for usermode events there is a notification via security_path_notify, for kernelmode - absolutely none
has very limited mapping to /proc/*/fdinfo/*. Again marks installed from the kernel are not shown
volatility - no

security hooks in linux kernel


This mechanism was inspired by NSA. As described, all hooks are stored in the huge struct security_hook_heads, but its format is different in each version. We can determine which list belongs to which hook with disasm magic. Let's see a function that calls security hooks - for example security_path_chown:

.text:FFFFFFC010496448 security_path_chown        ; CODE XREF: chown_common+104↑p
.text:FFFFFFC010496448   STP             X29, X30, [SP,#-0x18+var_18]!
.text:FFFFFFC01049644C   MOV             X29, SP
.text:FFFFFFC010496450   STP             X20, X21, [SP,#0x18+var_s0]
.text:FFFFFFC010496454   STR             X22, [SP,#0x18+var_s10]
.text:FFFFFFC010496458   MOV             X20, path
.text:FFFFFFC01049645C   MOV             W21, W1
.text:FFFFFFC010496460   MOV             path, X30
.text:FFFFFFC010496464   MOV             W22, W2
.text:FFFFFFC010496468 loc_FFFFFFC010496468  ; DATA XREF: .init.data:FFFFFFC0111474C0↓o
.text:FFFFFFC010496468   BL              _mcount
.text:FFFFFFC01049646C   LDR             X0, [path,#8]
.text:FFFFFFC010496470   LDR             X0, [X0,#0x30]
.text:FFFFFFC010496474   LDR             W0, [X0,#0xC]
.text:FFFFFFC010496478   TBNZ            W0, #9, loc_FFFFFFC0104964B4
.text:FFFFFFC01049647C   ADRP            X0, #security_hook_heads_0.path_chown@PAGE
.text:FFFFFFC010496480   STR             X19, [X29,#0x18+var_8]
.text:FFFFFFC010496484   LDR             X19, [X0,#security_hook_heads_0.path_chown@PAGEOFF]


In the disasm we just search for the first reference to memory near the address of security_hook_heads. Some results:
sudo ./lkmem -d -c -S ~/krnl/curr ~/krnl/System.map-5.11.0-37-generic
ptrace_access_check: 3
 0xffffffff964ae400 - kernel!cap_ptrace_access_check
 0xffffffff96515a40 - kernel!yama_ptrace_access_check
 0xffffffff96506770 - kernel!apparmor_ptrace_access_check
ptrace_traceme: 3
 0xffffffff964ae380 - kernel!cap_ptrace_traceme
 0xffffffff965159a0 - kernel!yama_ptrace_traceme
 0xffffffff965065e0 - kernel!apparmor_ptrace_traceme
capget: 2
 0xffffffff964ad960 - kernel!cap_capget
 0xffffffff96505a90 - kernel!apparmor_capget
capset: 1
 0xffffffff964ae490 - kernel!cap_capset
capable: 2
 0xffffffff964ada50 - kernel!cap_capable
 0xffffffff96505780 - kernel!apparmor_capable
bprm_creds_for_exec: 1
 0xffffffff964fcc60 - kernel!apparmor_bprm_creds_for_exec
bprm_creds_from_file: 1
 0xffffffff964ae8e0 - kernel!cap_bprm_creds_from_file
bprm_committing_creds: 1
 0xffffffff96504bc0 - kernel!apparmor_bprm_committing_creds
bprm_committed_creds: 1
 0xffffffff965054d0 - kernel!apparmor_bprm_committed_creds
sb_mount: 1
 0xffffffff96506ab0 - kernel!apparmor_sb_mount
sb_umount: 1
 0xffffffff96505c10 - kernel!apparmor_sb_umount
sb_pivotroot: 1
 0xffffffff965070d0 - kernel!apparmor_sb_pivotroot
path_unlink: 1
 0xffffffff965065a0 - kernel!apparmor_path_unlink
path_mkdir: 1
 0xffffffff965064c0 - kernel!apparmor_path_mkdir
path_rmdir: 1
 0xffffffff965065c0 - kernel!apparmor_path_rmdir
path_mknod: 1
 0xffffffff965064f0 - kernel!apparmor_path_mknod
path_truncate: 1
 0xffffffff965063b0 - kernel!apparmor_path_truncate
path_symlink: 1
 0xffffffff96506490 - kernel!apparmor_path_symlink
path_link: 1
 0xffffffff96508060 - kernel!apparmor_path_link
path_rename: 1
 0xffffffff96508210 - kernel!apparmor_path_rename
path_chmod: 1
 0xffffffff965063f0 - kernel!apparmor_path_chmod
path_chown: 1
 0xffffffff965063d0 - kernel!apparmor_path_chown
inode_getattr: 1
 0xffffffff96506390 - kernel!apparmor_inode_getattr
inode_need_killpriv: 1
 0xffffffff964ad990 - kernel!cap_inode_need_killpriv
inode_killpriv: 1
 0xffffffff964ad9c0 - kernel!cap_inode_killpriv
inode_getsecurity: 1
 0xffffffff964ae010 - kernel!cap_inode_getsecurity
file_permission: 1
 0xffffffff96506120 - kernel!apparmor_file_permission
mmap_addr: 1
 0xffffffff964ade10 - kernel!cap_mmap_addr
mmap_file: 2
 0xffffffff964ad930 - kernel!cap_mmap_file
 0xffffffff965060e0 - kernel!apparmor_mmap_file
file_mprotect: 1
 0xffffffff96506090 - kernel!apparmor_file_mprotect
file_lock: 1
 0xffffffff96506020 - kernel!apparmor_file_lock
file_receive: 1
 0xffffffff96506140 - kernel!apparmor_file_receive
file_open: 1
 0xffffffff965078c0 - kernel!apparmor_file_open
task_alloc: 1
 0xffffffff96505110 - kernel!apparmor_task_alloc
task_free: 2
 0xffffffff96515320 - kernel!yama_task_free
 0xffffffff965056c0 - kernel!apparmor_task_free
cred_alloc_blank: 1
 0xffffffff965046b0 - kernel!apparmor_cred_alloc_blank
cred_free: 1
 0xffffffff965055c0 - kernel!apparmor_cred_free
task_fix_setuid: 1
 0xffffffff964ade60 - kernel!cap_task_fix_setuid
task_getsecid: 1
 0xffffffff96505570 - kernel!apparmor_task_getsecid
task_setnice: 1
 0xffffffff964ae370 - kernel!cap_task_setnice
task_setioprio: 1
 0xffffffff964ae360 - kernel!cap_task_setioprio
task_setrlimit: 1
 0xffffffff96505d50 - kernel!apparmor_task_setrlimit
task_setscheduler: 1
 0xffffffff964ae350 - kernel!cap_task_setscheduler
task_kill: 1
 0xffffffff96507b40 - kernel!apparmor_task_kill
task_prctl: 2
 0xffffffff964adb10 - kernel!cap_task_prctl
 0xffffffff965155d0 - kernel!yama_task_prctl
setprocattr: 1
 0xffffffff96508c50 - kernel!apparmor_setprocattr
secctx_to_secid: 1
 0xffffffff96509a70 - kernel!apparmor_secctx_to_secid
release_secctx: 1
 0xffffffff96509ac0 - kernel!apparmor_release_secctx
unix_stream_connect: 1
 0xffffffff96507260 - kernel!apparmor_unix_stream_connect
unix_may_send: 1
 0xffffffff96506900 - kernel!apparmor_unix_may_send
socket_create: 1
 0xffffffff96507e50 - kernel!apparmor_socket_create
socket_post_create: 1
 0xffffffff96508490 - kernel!apparmor_socket_post_create
socket_bind: 1
 0xffffffff96504e10 - kernel!apparmor_socket_bind
socket_connect: 1
 0xffffffff96504dd0 - kernel!apparmor_socket_connect
socket_listen: 1
 0xffffffff96504da0 - kernel!apparmor_socket_listen
socket_accept: 1
 0xffffffff96504d70 - kernel!apparmor_socket_accept
socket_sendmsg: 1
 0xffffffff96505070 - kernel!apparmor_socket_sendmsg
socket_recvmsg: 1
 0xffffffff96504d20 - kernel!apparmor_socket_recvmsg
socket_getsockname: 1
 0xffffffff96504cb0 - kernel!apparmor_socket_getsockname
socket_getpeername: 1
 0xffffffff96504c90 - kernel!apparmor_socket_getpeername
socket_getsockopt: 1
 0xffffffff965050c0 - kernel!apparmor_socket_getsockopt
socket_setsockopt: 1
 0xffffffff96504cd0 - kernel!apparmor_socket_setsockopt
socket_shutdown: 1
 0xffffffff96504c70 - kernel!apparmor_socket_shutdown
socket_getpeersec_stream: 1
 0xffffffff96506d00 - kernel!apparmor_socket_getpeersec_stream
sock_graft: 1
 0xffffffff96505200 - kernel!apparmor_sock_graft
inet_conn_request: 1
 0xffffffff96504b00 - kernel!apparmor_inet_conn_request
audit_rule_init: 1
 0xffffffff964f60a0 - kernel!aa_audit_rule_init
audit_rule_known: 1
 0xffffffff964f6150 - kernel!aa_audit_rule_known
audit_rule_match: 1
 0xffffffff964f6190 - kernel!aa_audit_rule_match
audit_rule_free: 1
 0xffffffff964f6040 - kernel!aa_audit_rule_free
locked_down: 1
 0xffffffff96516700 - kernel!lockdown_is_locked_down
settime64: 1
 0xffffffff964ad940 - kernel!cap_settime
vm_enough_memory_mm: 1
 0xffffffff964adad0 - kernel!cap_vm_enough_memory
file_alloc: 1
 0xffffffff965076d0 - kernel!apparmor_file_alloc_security
prepare_creds: 1
 0xffffffff96505330 - kernel!apparmor_cred_prepare
sock_rcv_skb: 1
 0xffffffff96504b50 - kernel!apparmor_socket_sock_rcv_skb

Needless to say, you can't get this info with standard system tools.

BPF iterators


Of course I could not get past the hype topic of BPF (an overvalued mechanism that just lets you run your buggy code in the kernel with low performance and lots of overhead). For access to some kernel data they added so-called iterators, and maybe you can even write your own and register it with bpf_iter_reg_target (spoiler: you can't, because this function is not exported. Welcome to the wonderful world of open source with unexplained and unreasonable restrictions). I was curious which BPF iterators exist in the system. They are stored in the list targets, synchronized with the mutex targets_mutex. It would seem, what could go wrong?

grep " targets" System.map-5.11.0-37-generic
ffffffff820ff8e0 r targets
ffffffff826e1240 d targets_mutex
ffffffff826e1260 d targets
ffffffff8277a5c0 d targets
ffffffff8286b2e8 d targets_supported

In this case we are dealing with another mechanism for hiding information in the Linux kernel: the use of non-unique names. I was not too lazy to write a script to count such names, and it found 998 of them. Top 5:

_acpi_module_name: 155
cpumask_weight.constprop.0: 47
kzalloc.constprop.0: 39
get_order: 32
kmalloc_array.constprop.0: 28
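
Counting such names is easy to reproduce. Below is a minimal Python sketch (not the script mentioned above, just one possible way to do it), assuming the usual System.map line format of address, type and name. The function name count_duplicate_symbols is made up for illustration, and the sample lines are the grep output shown earlier.

```python
from collections import Counter


def count_duplicate_symbols(lines):
    """Return a Counter of symbol names that occur more than once.

    Each System.map line is expected to look like "<address> <type> <name>".
    """
    names = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            names[parts[2]] += 1
    # keep only the names that are not unique
    return Counter({name: n for name, n in names.items() if n > 1})


# sample taken from the grep output above
sample = """\
ffffffff820ff8e0 r targets
ffffffff826e1240 d targets_mutex
ffffffff826e1260 d targets
ffffffff8277a5c0 d targets
ffffffff8286b2e8 d targets_supported
""".splitlines()

dups = count_duplicate_symbols(sample)
for name, n in dups.most_common(5):
    print(f"{name}: {n}")
```

To run it on a real file, pass an open file object instead of the sample list, since iterating a file yields its lines.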

As usual, the disassembler rushes to the rescue.
We can make a simple two-state FSM:
  1. wait for a mutex_lock call
  2. the first access to memory in .data after it gives us the right address
Results:

sudo ./lkmem -d -c -t ~/krnl/curr ~/krnl/System.map-5.11.0-37-generic
bpf_iter_reg at 0xffffffff859be700: 11
 [0] feature 0 at 0xffffffff85af86e0 - kernel!bpf_sk_storage_map_reg_info
   attach_target: 0xffffffff84c2a430 - kernel!bpf_iter_attach_map
   detach_target: 0xffffffff84c2a410 - kernel!bpf_iter_detach_map
   show_fdinfo: 0xffffffff8440dee0 - kernel!bpf_iter_map_show_fdinfo
   fill_link_info: 0xffffffff8440dec0 - kernel!bpf_iter_map_fill_link_info
 [1] feature 0 at 0xffffffff85af7620 - kernel!sock_map_iter_reg
   attach_target: 0xffffffff84c14fc0 - kernel!sock_map_iter_attach_target
   detach_target: 0xffffffff84c14fa0 - kernel!sock_map_iter_detach_target
   show_fdinfo: 0xffffffff8440dee0 - kernel!bpf_iter_map_show_fdinfo
   fill_link_info: 0xffffffff8440dec0 - kernel!bpf_iter_map_fill_link_info
 [2] feature 0 at 0xffffffff859be840 - kernel!bpf_prog_reg_info
 [3] feature 1 at 0xffffffff859be780 - kernel!task_file_reg_info
 [4] feature 1 at 0xffffffff859be7e0 - kernel!task_reg_info
 [5] feature 0 at 0xffffffff85236420 - kernel!bpf_map_elem_reg_info
   attach_target: 0xffffffff8440e060 - kernel!bpf_iter_attach_map
   detach_target: 0xffffffff8440df10 - kernel!bpf_iter_detach_map
   show_fdinfo: 0xffffffff8440dee0 - kernel!bpf_iter_map_show_fdinfo
   fill_link_info: 0xffffffff8440dec0 - kernel!bpf_iter_map_fill_link_info
 [6] feature 0 at 0xffffffff859be720 - kernel!bpf_map_reg_info
 [7] feature 0 at 0xffffffff85afe7e0 - kernel!ipv6_route_reg_info
 [8] feature 0 at 0xffffffff85afa300 - kernel!udp_reg_info
 [9] feature 0 at 0xffffffff85af9cc0 - kernel!tcp_reg_info
 [10] feature 0 at 0xffffffff85af8bc0 - kernel!netlink_reg_info
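
The two-state FSM described above can be sketched in a few lines of Python. This is only a model under stated assumptions: the instruction tuples (mnemonic, call target, referenced data address) are a hypothetical output of some disassembler front end, and the function name and the addresses in the demo are invented for illustration. It is not how lkmem is implemented.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical decoded instruction:
# (mnemonic, name of call target or None, referenced data address or None)
Insn = Tuple[str, Optional[str], Optional[int]]


def find_data_after_lock(insns: Iterable[Insn], data_start: int, data_end: int,
                         lock_fn: str = "mutex_lock") -> Optional[int]:
    """State 0: wait for a call to lock_fn.
    State 1: return the first referenced address inside [data_start, data_end)."""
    state = 0
    for mnemonic, target, mem in insns:
        if state == 0:
            if mnemonic == "call" and target == lock_fn:
                state = 1
        elif mem is not None and data_start <= mem < data_end:
            return mem
    return None


# made-up instruction stream: the mutex itself is referenced before the call
# and must be ignored, the list head is referenced after it
demo = [
    ("lea", None, 0x1240),
    ("call", "mutex_lock", None),
    ("mov", None, None),
    ("mov", None, 0x1260),
    ("call", "mutex_unlock", None),
]
found = find_data_after_lock(demo, 0x1000, 0x2000)
print(hex(found) if found is not None else "not found")
```

The point of state 0 is that the reference to the mutex itself, which is loaded as the argument before the call, is skipped, so a non-unique symbol name no longer matters.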

blinding sysmon for linux


Let's see which tracepoints it uses:


sudo ./lkmem -d -c -t ~/krnl/curr ~/krnl/System.map-5.11.0-37-generic
 __tracepoint_sched_process_exit at 0xffffffffa47140c0: enabled 1 cnt 1
  [0] 0xffffffffa2ed3b40 - kernel!perf_trace_sched_process_template
 __tracepoint_sys_exit at 0xffffffffa4714ae0: enabled 1 cnt 1
  regfunc: 0xffffffffa2fa3350 - kernel!syscall_regfunc
  unregfunc: 0xffffffffa2fa3410 - kernel!syscall_unregfunc
  [0] 0xffffffffa2f37f90 - kernel!__bpf_trace_sys_exit
 __tracepoint_sys_enter at 0xffffffffa4714b40: enabled 1 cnt 1
  regfunc: 0xffffffffa2fa3350 - kernel!syscall_regfunc
  unregfunc: 0xffffffffa2fa3410 - kernel!syscall_unregfunc
  [0] 0xffffffffa2f37e30 - kernel!__bpf_trace_sys_enter

  1. my favorite 1-bit patch: zero tracepoint->key.enabled
  2. remove the BPF client from the funcs list
  3. find trace_event_call and install your own event_filter

slides from our talk at Black Hat EU 2021


link

and some afterword

All the presented attacks are caused by misuse of the Windows logging mechanism for ETW-based EDRs. And I see a bad sign when the same thing happens with eBPF on Linux. So who knows, maybe my next paper will be called "blinding eBPF-based EDRs on Linux" :-)

eBPF on cgroups

Long story short: eBPF programs attached to cgroups are stored in the array effective and in the list progs in cgroup->bpf.
Below I will try to explain the boring and dirty details.

cgroups

This article says:
hierarchy: a set of cgroups arranged in a tree
so we need to find the roots and then just traverse these trees. Roots have type cgroup_root and are stored in cgroup_hierarchy_idr (synced with the mutex cgroup_mutex). As usual, Linux lies. Let's compare the content of /proc/cgroups:
#subsys_name  hierarchy  num_cgroups  enabled
cpuset        6          1            1
cpu           5          1            1
cpuacct       5          1            1
blkio         4          1            1
memory        2          148          1
devices       9          99           1
freezer       10         1            1
net_cls       7          1            1
perf_event    8          1            1
net_prio      7          1            1
hugetlb       3          1            1
pids          11         103          1
rdma          12         1            1

with the cgroup roots that actually exist on this machine:

[0]  at 0xffffffff8e9a2200 flags 8 hierarchy_id 0 nr_cgrps 145 real_cnt 144
[1] systemd at 0xffff8fd6816ea000 flags 4 hierarchy_id 1 nr_cgrps 145 real_cnt 144
[2]  at 0xffff8fd68297a000 flags 0 hierarchy_id 2 nr_cgrps 148 real_cnt 147
[3]  at 0xffff8fd68297c000 flags 0 hierarchy_id 3 nr_cgrps 1 real_cnt 0
[4]  at 0xffff8fd682978000 flags 0 hierarchy_id 4 nr_cgrps 1 real_cnt 0
[5]  at 0xffff8fd68297e000 flags 0 hierarchy_id 5 nr_cgrps 1 real_cnt 0
[6]  at 0xffff8fd6854c8000 flags 0 hierarchy_id 6 nr_cgrps 1 real_cnt 0
[7]  at 0xffff8fd6854ce000 flags 0 hierarchy_id 7 nr_cgrps 1 real_cnt 0
[8]  at 0xffff8fd6854ca000 flags 0 hierarchy_id 8 nr_cgrps 1 real_cnt 0
[9]  at 0xffff8fd6854cc000 flags 0 hierarchy_id 9 nr_cgrps 99 real_cnt 98
[10]  at 0xffff8fd685e16000 flags 0 hierarchy_id 10 nr_cgrps 1 real_cnt 0
[11]  at 0xffff8fd685e12000 flags 0 hierarchy_id 11 nr_cgrps 103 real_cnt 102
[12]  at 0xffff8fd685e14000 flags 0 hierarchy_id 12 nr_cgrps 1 real_cnt 0

Can you find the roots with hierarchy ID 0 and 1 in /proc/cgroups?
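
The comparison can also be done mechanically. Here is a small Python sketch that parses /proc/cgroups text and reports which hierarchy IDs from the root list never show up in it. The function names are invented for illustration, and the sample text is the listing above (on a live system you would read the file instead).

```python
def parse_proc_cgroups(text):
    """Map subsystem name -> (hierarchy id, number of cgroups, enabled flag)."""
    rows = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        name, hierarchy, num_cgroups, enabled = line.split()
        rows[name] = (int(hierarchy), int(num_cgroups), int(enabled))
    return rows


def missing_hierarchy_ids(rows, root_ids):
    """Hierarchy IDs of real roots that /proc/cgroups never mentions."""
    listed = {hierarchy for hierarchy, _, _ in rows.values()}
    return sorted(set(root_ids) - listed)


SAMPLE = """\
#subsys_name hierarchy num_cgroups enabled
cpuset 6 1 1
cpu 5 1 1
cpuacct 5 1 1
blkio 4 1 1
memory 2 148 1
devices 9 99 1
freezer 10 1 1
net_cls 7 1 1
perf_event 8 1 1
net_prio 7 1 1
hugetlb 3 1 1
pids 11 103 1
rdma 12 1 1
"""

rows = parse_proc_cgroups(SAMPLE)
# the dump of cgroup roots above has hierarchy_id 0 through 12
print(missing_hierarchy_ids(rows, range(13)))
```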

How to traverse this tree? It starts at the field cgrp->self, and we can use the functions css_next_descendant_pre/css_next_descendant_post etc. Strictly speaking they return a pointer to cgroup_subsys_state, but this is the first field, self, in cgroup, so the cast is safe.

eBPF

eBPF programs are stored in prog_idr (synced with the spinlock_t prog_idr_lock). Let's see what we have:

sudo ./lkmem -d -c -B ~/krnl/curr ~/krnl/System.map-5.11.0-40-generic
prog_idr at 0xffffffff8e9be540: 23
 [0] prog 0xffff9f8e809a7000 id 31 len 123 jited_len 555
  type: 17 BPF_PROG_TYPE_RAW_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05c2254
 [1] prog 0xffff9f8e809bf000 id 32 len 1824 jited_len 8195
  type: 17 BPF_PROG_TYPE_RAW_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05ba9e0
 [2] prog 0xffff9f8e80905000 id 33 len 1343 jited_len 6186
  type: 17 BPF_PROG_TYPE_RAW_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc00612b0
 [3] prog 0xffff9f8e809c7000 id 34 len 1682 jited_len 7822
  type: 17 BPF_PROG_TYPE_RAW_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc02d6040
 [4] prog 0xffff9f8e809a9000 id 35 len 1209 jited_len 5510
  type: 17 BPF_PROG_TYPE_RAW_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05be370
 [5] prog 0xffff9f8e809cf000 id 36 len 1397 jited_len 6396
  type: 17 BPF_PROG_TYPE_RAW_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05c46d8
 [6] prog 0xffff9f8e809d7000 id 37 len 1223 jited_len 5578
  type: 17 BPF_PROG_TYPE_RAW_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05c7108
 [7] prog 0xffff9f8e80055000 id 38 len 267 jited_len 1237
  type: 5 BPF_PROG_TYPE_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc0059254
 [8] prog 0xffff9f8e8005d000 id 39 len 247 jited_len 1116
  type: 17 BPF_PROG_TYPE_RAW_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05ca90c
 [9] prog 0xffff9f8e80115000 id 40 len 217 jited_len 994
  type: 5 BPF_PROG_TYPE_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05cc2c0
 [10] prog 0xffff9f8e8011d000 id 41 len 744 jited_len 3405
  type: 17 BPF_PROG_TYPE_RAW_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05ce098
 [11] prog 0xffff9f8e809df000 id 42 len 633 jited_len 2701
  type: 5 BPF_PROG_TYPE_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05d0234
 [12] prog 0xffff9f8e808f4000 id 43 len 492 jited_len 2233
  type: 17 BPF_PROG_TYPE_RAW_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05d262c
 [13] prog 0xffff9f8e809bb000 id 44 len 68 jited_len 312
  type: 17 BPF_PROG_TYPE_RAW_TRACEPOINT
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05d41f4
 [14] prog 0xffff9f8e809f1000 id 55 len 2 jited_len 15
  type: 1 BPF_PROG_TYPE_SOCKET_FILTER
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05e49f4
 [15] prog 0xffff9f8e8003b000 id 89 len 8 jited_len 54
  type: 8 BPF_PROG_TYPE_CGROUP_SKB
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc04fab24
 [16] prog 0xffff9f8e80037000 id 90 len 8 jited_len 54
  type: 8 BPF_PROG_TYPE_CGROUP_SKB
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc0561468
 [17] prog 0xffff9f8e80048000 id 91 len 8 jited_len 54
  type: 8 BPF_PROG_TYPE_CGROUP_SKB
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc0563828
 [18] prog 0xffff9f8e8004f000 id 92 len 8 jited_len 54
  type: 8 BPF_PROG_TYPE_CGROUP_SKB
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc0565594
 [19] prog 0xffff9f8e80051000 id 93 len 8 jited_len 54
  type: 8 BPF_PROG_TYPE_CGROUP_SKB
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc056747c
 [20] prog 0xffff9f8e80053000 id 94 len 8 jited_len 54
  type: 8 BPF_PROG_TYPE_CGROUP_SKB
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc056923c
 [21] prog 0xffff9f8e8012f000 id 95 len 8 jited_len 54
  type: 8 BPF_PROG_TYPE_CGROUP_SKB
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc058e308
 [22] prog 0xffff9f8e80131000 id 96 len 8 jited_len 54
  type: 8 BPF_PROG_TYPE_CGROUP_SKB
  expected_attach_type: 0 BPF_CGROUP_INET_INGRESS
   bpf_func: 0xffffffffc05900dc


Each eBPF program is stored in a struct bpf_prog. But wait, we can have several eBPF programs connected to the same event, right? Yeah, so for such cases there is another struct: bpf_prog_array. The kernel devs were so generous that they even gave us the function bpf_prog_array_length. Unfortunately it is not exported.
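
Since bpf_prog_array_length is not exported, the count has to be done by hand. Below is a rough Python model of that walk over raw bytes of the items array. Everything about the layout here is an assumption: as far as I can tell from the 5.11 sources, each item on x86_64 is a prog pointer followed by two cgroup storage pointers (24 bytes), the array ends at an item with a NULL prog, and the kernel skips a dummy placeholder prog when counting. The function name and the way the bytes were obtained are hypothetical.

```python
import struct


def prog_array_pointers(items_blob, item_size=24, dummy_prog=None):
    """Collect bpf_prog pointers from raw bytes of bpf_prog_array.items.

    item_size is an assumed layout (8-byte prog pointer + 16 bytes of storage
    pointers); dummy_prog is the address of a placeholder prog to skip, if known.
    """
    progs = []
    for off in range(0, len(items_blob) - 7, item_size):
        (prog,) = struct.unpack_from("<Q", items_blob, off)
        if prog == 0:
            break  # NULL prog terminates the array
        if prog != dummy_prog:
            progs.append(prog)
    return progs


# fake memory: one item followed by the NULL terminator item
blob = struct.pack("<QQQ", 0xFFFF9F8E80037000, 0, 0) + struct.pack("<QQQ", 0, 0, 0)
for prog in prog_array_pointers(blob):
    print(hex(prog))
```

On other kernel versions the item layout may differ, so item_size should be checked against the actual struct before trusting the result.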

putting it all together 

Now we know
  • how to find and traverse cgroups
  • where eBPF programs are stored in each cgroup
  • how to enumerate eBPF programs from bpf_prog_array
Let's see what we have:

sudo ./lkmem -d -c -g ~/krnl/curr ~/krnl/System.map-5.11.0-40-generic
[0]  at 0xffffffff8e9a2200 flags 8 hierarchy_id 0 nr_cgrps 145 real_cnt 144
 child 9:
 cgroup at 0xffff8fd6824aa000 serial_nr 80 flags 0 level 2
 cgroup BPF:
  BPF_CGROUP_INET_INGRESS: 0xffff8fd68d361300 cnt 1 flags 0
  [0] prog 0xffff9f8e80037000 id 90 type 8 len 8 jited_len 54
   bpf_func: 0xffffffffc0561468
  BPF_CGROUP_INET_EGRESS: 0xffff8fd687cf5f00 cnt 1 flags 0
  [0] prog 0xffff9f8e8003b000 id 89 type 8 len 8 jited_len 54
   bpf_func: 0xffffffffc04fab24
 child 24:
 cgroup at 0xffff8fd68864b000 serial_nr 286 flags 0 level 2
 cgroup BPF:
  BPF_CGROUP_INET_INGRESS: 0xffff8fd6858fea40 cnt 1 flags 0
  [0] prog 0xffff9f8e80053000 id 94 type 8 len 8 jited_len 54
   bpf_func: 0xffffffffc056923c
  BPF_CGROUP_INET_EGRESS: 0xffff8fd5d1fb6fc0 cnt 1 flags 0
  [0] prog 0xffff9f8e80051000 id 93 type 8 len 8 jited_len 54
   bpf_func: 0xffffffffc056747c
 child 44:
 cgroup at 0xffff8fd68310c000 serial_nr 596 flags 0 level 2
  BPF_CGROUP_INET_INGRESS: 0xffff8fd5c5be97c0 cnt 1 flags 0
  [0] prog 0xffff9f8e80131000 id 96 type 8 len 8 jited_len 54
   bpf_func: 0xffffffffc05900dc
  BPF_CGROUP_INET_EGRESS: 0xffff8fd687cf7300 cnt 1 flags 0
  [0] prog 0xffff9f8e8012f000 id 95 type 8 len 8 jited_len 54
   bpf_func: 0xffffffffc058e308
 child 94:
 cgroup at 0xffff8fd5834d7000 serial_nr 1136 flags 0 level 2
 cgroup BPF:
  BPF_CGROUP_INET_INGRESS: 0xffff8fd5c7cd1f00 cnt 1 flags 0
  [0] prog 0xffff9f8e8004f000 id 92 type 8 len 8 jited_len 54
   bpf_func: 0xffffffffc0565594
  BPF_CGROUP_INET_EGRESS: 0xffff8fd68555fac0 cnt 1 flags 0
  [0] prog 0xffff9f8e80048000 id 91 type 8 len 8 jited_len 54
   bpf_func: 0xffffffffc0563828


What can this useful information give us? Well, there is the function cgroup_bpf_prog_detach. And suddenly your eBPF program may stop receiving messages.

Link to source code