The Linux kernel allows loadable kernel modules (LKMs) to have discardable sections, and this creates the problem of links between two kinds of memory. As you can guess, keeping a pointer to an already freed area can be very dangerous, so I made a simple tool, kotest, to check for such links. It divides the sections of an ELF file into two categories and checks all relocations; relocations between areas of the same type are considered OK. To track whether a symbol from a persistent area is used only from discardable sections, I also keep a couple of reference counts.
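The idea can be sketched in a few lines of Python. This is only an illustration, not the actual kotest code (which is built on elfio): the ".init" prefix rule, the function names, the counter names and the sample relocations are all assumptions made for the example.

```python
from collections import Counter

def is_discardable(section_name):
    # Assumption: only the default ".init" prefix rule is modelled here.
    return section_name.startswith(".init")

def classify(relocs, symbol_section):
    """relocs: (source_section, target_symbol) pairs.
    symbol_section: maps a symbol name to the section that holds it."""
    from_init = Counter()        # refs to persistent symbols from discardable code
    from_persistent = Counter()  # refs to persistent symbols from persistent code
    dangerous = []               # persistent code pointing into discardable memory
    for src, sym in relocs:
        dst = symbol_section[sym]
        if is_discardable(dst):
            if not is_discardable(src):
                dangerous.append((src, sym))
            continue
        if is_discardable(src):
            from_init[sym] += 1
        else:
            from_persistent[sym] += 1
    # Candidates for moving: persistent symbols used only from discardable sections.
    init_only = sorted(s for s in from_init if from_persistent[s] == 0)
    return dangerous, init_only

# Hypothetical input, loosely modelled on the ip_vs.ko case discussed later.
symbol_section = {
    "ip_vs_genl_ops": ".rodata",
    "shared_table": ".data",
    "init_helper": ".init.text",
}
relocs = [
    (".init.text", "ip_vs_genl_ops"),
    (".init.text", "shared_table"),
    (".text", "shared_table"),
    (".text", "init_helper"),
]
dangerous, init_only = classify(relocs, symbol_section)
print(dangerous)
print(init_only)
```

Here shared_table is not reported because persistent code also uses it, while the reference from .text to init_helper is exactly the kind of link that becomes a dangling pointer after init memory is freed.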
command line options
- -b take into account variables in .bss
- -h make hexdump of found vars
- -v verbose mode
find path_to_kernel_root -type f -name "*.ko" | xargs kotest
To get a summary you can run awk -f total.awk on the output of the previous command
Is it reliable to analyze only fixups?
No - there are false positives. Consider this excerpt from ip_vs.ko, function ip_vs_register_nl_ioctl:
.init.text:0000000000016155 mov rdi, offset ip_vs_genl_family
.init.text:000000000001615C mov cs:ip_vs_genl_family.module, offset __this_module
.init.text:0000000000016167 mov cs:ip_vs_genl_family.ops, offset ip_vs_genl_ops
.init.text:0000000000016172 mov cs:ip_vs_genl_family.mcgrps, 0
.init.text:000000000001617D mov qword ptr cs:ip_vs_genl_family.n_ops, 10h
.init.text:0000000000016188 call __genl_register_family
It turns out that ip_vs_genl_ops (located in the .rodata section) is referenced only from the function ip_vs_register_nl_ioctl in .init.text, but it cannot actually be moved to a discardable area because it was registered with genl_register_family. Kotest cannot analyze how addresses are used and so gives a false positive (FP):
.rodata + 5A0 (ip_vs_genl_ops) rref 1 xref 0 add size 768
Another issue is string merging by ld. Let's assume that we have a couple of strings: "foobar", referenced from some function(s) in the .text section, and "bar", referenced from code in .init.text. The linker can (and usually does) put only the string "foobar" into .rodata, and the fixup for the string "bar" will point into the middle of this single string "foobar".
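A toy model of this tail merging, written in Python just to show where the offsets end up. It is not how ld is implemented; merge_tails is a hypothetical helper.

```python
def merge_tails(strings):
    """Lay out NUL-terminated strings in one blob, reusing existing suffixes."""
    blob = b""
    offsets = {}
    # Longest strings first, so that shorter ones can land inside them.
    for s in sorted(set(strings), key=lambda x: (-len(x), x)):
        needle = s.encode() + b"\0"
        pos = blob.find(needle)
        if pos < 0:
            # No existing string ends with this one: append it.
            pos = len(blob)
            blob += needle
        offsets[s] = pos
    return blob, offsets

blob, offsets = merge_tails(["foobar", "bar"])
print(blob)     # one string in the blob
print(offsets)  # "bar" gets an offset inside "foobar"
```

Since "bar" has no storage of its own, a reference to it from .init.text looks like one more reference into the data of "foobar", and counting it separately would be misleading.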
So treat the output of kotest as an estimated upper bound on the memory that could potentially be saved by moving data into a discardable area.
Why not use the famous objtool?
Because of NIH syndrome. Also, objtool employs a disassembler and, as a consequence, is slow and supports only a few architectures. Kotest is based on elfio and can process both 32-bit and 64-bit ELF files for any architecture (and it is very fast).
LKM loading
Loading starts in the function load_module. It is a surprisingly huge amount of buggy code, so I briefly describe only the most important parts:
- layout_and_allocate collects sections and allocates the module's persistent and discardable memory in the function layout_sections
- find_module_sections is the most important because it fills the module structure with lots of pointers to section contents for further processing
- post_relocation, from which the arch-specific module_finalize is called
- do_init_module calls the module's init function and frees discardable memory by inserting a new entry into init_free_list
There is a nasty bug: sysfs shows all sections (including freed ones). So sometimes my lkcd shows amazing results like:
Mod[60] 0xffffffffc0454300 base 0xffffffffc0451000 serio_raw
init: 0xffffffffc037e000 - nls_iso8859_1!uni2char
exit: 0xffffffffc0451b8a - serio_raw!serio_raw_drv_exit
The init field now points somewhere into the middle of module nls_iso8859_1. This happened because the .init section of serio_raw was freed and its memory is now occupied by another module. Despite this, according to the kernel, it is still listed as part of serio_raw:
ls -1a /sys/module/serio_raw/sections | grep init
.init.text
This bug is caused by the function mod_sysfs_setup, which knows nothing about discardable sections (and perhaps should call within_module_init to filter out some sections, which would also save some memory on several module_sect_attr items).
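The suggested filtering amounts to a simple range check. A rough Python sketch follows; it is not kernel code, the helper names are hypothetical, and the addresses and the init region size are made up for illustration (borrowed from the serio_raw dump above).

```python
def within_region(addr, base, size):
    # Same idea as an "is this address inside the init area" check.
    return base <= addr < base + size

def sysfs_visible_sections(sections, init_base, init_size):
    """Keep only sections that will still exist after init memory is freed."""
    return [name for name, addr in sections
            if not within_region(addr, init_base, init_size)]

# Illustrative layout: section start addresses and init size are assumptions.
sections = [
    (".text", 0xffffffffc0451000),
    (".exit.text", 0xffffffffc0451b8a),
    (".init.text", 0xffffffffc037e000),
]
visible = sysfs_visible_sections(sections, init_base=0xffffffffc037e000,
                                 init_size=0x1000)
print(visible)
```

With such a filter .init.text would never be listed under /sys/module/serio_raw/sections, and no attribute would have to be allocated for it.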
Which sections does the kernel consider discardable?
Simple answer: those whose names start with ".init". More detailed answer: each architecture can have its own version of the function module_init_section.
For example, see the ARM-specific sections.
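Modelled in Python, the rule looks roughly like this. The ARM prefixes are written from my memory of arch/arm/kernel/module.c, so treat them as an assumption and check your kernel sources; the function names are hypothetical.

```python
def module_init_section_default(name):
    # Generic rule: anything whose name starts with ".init" is discardable.
    return name.startswith(".init")

def module_init_section_arm(name):
    # Assumption: ARM additionally treats the unwind tables of init code
    # as discardable (prefixes recalled from memory, not verified here).
    return name.startswith((".init", ".ARM.extab.init", ".ARM.exidx.init"))

names = [".init.text", ".init.data", ".ARM.exidx.init.text",
         ".altinstructions", ".ctors"]
generic = [n for n in names if module_init_section_default(n)]
arm = [n for n in names if module_init_section_arm(n)]
print(generic)
print(arm)
```

Note that neither variant catches sections like ".altinstructions".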
The problem is that this list is not exhaustive: some sections could be moved to the discardable area because their content is not used after module initialization. Just to name a few:
- ".altinstructions" - processed inside apply_alternatives, called from module_finalize <- post_relocation <- load_module before do_init_module
- under RISC-V, for an unknown reason, it is named ".alternative"
- from the same function module_finalize, also ".retpoline_sites", ".return_sites", etc.
However, this is not all. Let's check the function do_mod_ctors. The field module->ctors can point to the section ".init.array" (which is considered discardable) or to ".ctors" (which is not). Logic? Haven't heard of it.
Which data from discardable sections is the kernel able to clean up?
The last one has a very weird comment (a glaring example of the fact that sometimes no comment is much better):
"If the exception table is sorted, any referring to the module init will be at the beginning or the end."
The problem is that the exception table (stored in module->extable) is always sorted in post_relocation.
So, as you can guess, the content of the "__ex_table" section is cleaned up before the init sections are freed.
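The trimming that the comment describes can be modelled like this. It is a Python sketch of the logic only; the kernel routine that carries the comment (trim_init_extable, as far as I remember) works on struct exception_table_entry, and all addresses below are made up.

```python
def trim_init_entries(entries, init_base, init_size):
    """entries: sorted instruction addresses covered by the exception table.
    Drops entries at either end that fall inside the init area."""
    def in_init(addr):
        return init_base <= addr < init_base + init_size

    lo, hi = 0, len(entries)
    while lo < hi and in_init(entries[lo]):       # trim the beginning
        lo += 1
    while hi > lo and in_init(entries[hi - 1]):   # trim the end
        hi -= 1
    return entries[lo:hi]

# Hypothetical layout: init area at 0x1000..0x10ff, core code around 0x5000.
table = [0x1000, 0x1010, 0x5000, 0x5020]
trimmed = trim_init_entries(table, init_base=0x1000, init_size=0x100)
print([hex(a) for a in trimmed])
```

Because the init area is one contiguous range, its entries in a sorted table form a block at one end, so trimming both ends is enough.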
As for the first: ftrace entries are initially loaded from the section named FTRACE_CALLSITE_SECTION. I have always believed that having ftrace entries for init functions is an idea of questionable usefulness. Sure, you can manually mark each init function with __attribute__((__no_instrument_function__)). If you are as lazy as me, entrust this task to gcc with my patch.
Results
On 6.8.8 for aarch64 we have 37140 bytes of data referenced only from discardable sections (remember the false positives) and 245988 bytes in sections which could be moved to the discardable area.