Let's apply the priceless knowledge from the previous part, for example to extract string literals and insert polymorphic decryption.
A typical call to printf/printk in RTL usually looks like this:
(insn 57 56 58 9 (set (reg:DI 5 di)
(symbol_ref/f:DI ("*.LC0") [flags 0x2] <var_decl 0x7f480e9edea0 *.LC0>)) "swtest.c":17:7 80 {*movdi_internal}
(call_insn 59 58 191 9 (set (reg:SI 0 ax)
(call (mem:QI (symbol_ref:DI ("printf") [flags 0x41] <function_decl 0x7f480e8c7000 printf>) [0 __builtin_printf S1 A8])
(const_int 0 [0]))) "swtest.c":17:7 909 {*call_value}
(nil)
(expr_list (use (reg:QI 0 ax))
(expr_list:DI (use (reg:DI 5 di))
(nil))))
Translation for mere mortals:
The first instruction loads register 5 (di) with the address (via symbol_ref) of a known symbol with the cool name "*.LC0".
The second instruction calls another known symbol, "printf", and stores the result in register 0 (ax). The arguments for the call are kept in a nested expression list: a use of register 0 (al, which on x86-64 carries the vector register count for variadic calls) and a use of register 5 (di), loaded by the previous instruction.
So we can record two items in the cross-reference database for this function: the load of a symbol and the call. Sadly, the name "*.LC0" is probably totally useless. Let's check whether we can go deeper.
We can extract the TREE item for a symbol_ref RTL with SYMBOL_REF_DECL and check its type (tree code): for a variable it is VAR_DECL.
Then we can check whether this variable is initialized via DECL_INITIAL.
A string literal has type STRING_CST.
And finally we can get the content of this literal with TREE_STRING_POINTER and its length with TREE_STRING_LENGTH.
I implemented this logic in the method is_cliteral.
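The chain of checks above can be sketched roughly like this. This is not the actual is_cliteral, just a minimal illustration: the helper name string_literal_of is hypothetical, the extra CONSTANT_POOL_ADDRESS_P guard is my assumption (SYMBOL_REF_DECL asserts on constant pool symbols), and the fragment only builds against the GCC plugin headers, so the exact include list may differ between GCC versions.

```cpp
// Build against the plugin headers, e.g. with
//   -I"$(gcc -print-file-name=plugin)/include" -fno-rtti
#include "gcc-plugin.h"
#include "tree.h"
#include "rtl.h"

// Hypothetical helper, not the author's is_cliteral.
// Returns the bytes of the string literal behind a symbol_ref and stores
// their length in *len, or returns NULL if x is anything else.
static const char *string_literal_of(rtx x, int *len)
{
  if (GET_CODE(x) != SYMBOL_REF)
    return NULL;
  // Assumption: skip constant pool symbols, they carry no decl.
  if (CONSTANT_POOL_ADDRESS_P(x))
    return NULL;
  tree decl = SYMBOL_REF_DECL(x);        // may be NULL
  if (!decl || TREE_CODE(decl) != VAR_DECL)
    return NULL;
  tree init = DECL_INITIAL(decl);        // NULL when not initialized
  if (!init || TREE_CODE(init) != STRING_CST)
    return NULL;
  *len = TREE_STRING_LENGTH(init);       // for C literals this usually counts the trailing NUL
  return TREE_STRING_POINTER(init);
}
```

For the symbol_ref from insn 57 above, such a helper would be expected to hand back the format string hidden behind "*.LC0".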
You may ask: is it possible to extract integer constants like sizeof? Unfortunately no, mainly because they were converted to RTL const_int during expression evaluation, and RTL has no mechanism for tracking why a const_int has a particular value. A typical use of sizeof may look like this:
... =(some_struct *)kmalloc(sizeof(some_struct) + strlen(s) + 1, GFP_KERNEL);
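To see why the origin of such a constant is lost, here is a small standalone sketch. The struct layout and the names kFixedPart and alloc_size are made up for illustration. Since sizeof is an integer constant expression, the front end folds sizeof(some_struct) + 1 into a single number, and by the time RTL is generated only that number can appear as a const_int.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical layout; only its size matters here.
struct some_struct {
    int  id;
    char tag[12];
};

// sizeof is an integer constant expression, so the front end folds
// sizeof(some_struct) + 1 into one number at compile time.
constexpr std::size_t kFixedPart = sizeof(some_struct) + 1;

// Mirrors the size computation in the kmalloc call; only strlen(s)
// survives as real work at run time.
std::size_t alloc_size(const char *s) {
    return sizeof(some_struct) + std::strlen(s) + 1;
}
```

Nothing in the resulting constant says "this came from sizeof(some_struct)", which is exactly the information a cross-reference database would need.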