gcc plugin to collect cross-references, part 6

Parts 1, 2, 3, 4 & 5
Finally I was able to compile and collect cross-references for sufficiently large open-source projects such as the Linux kernel and Botan:
wc -l botan.db
2108274 botan.db
grep Err: botan.db | wc -l
540

So let's check how we can extract accesses to record fields. If you take a quick look at tree.def, you will notice a very prominent tree code, COMPONENT_REF:
Value is structure or union component.
 Operand 0 is the structure or union (an expression).
 Operand 1 is the field (a node of type FIELD_DECL).
 Operand 2, if present, is the value of DECL_FIELD_OFFSET

 

Sounds easy? "In theory there is no difference between theory and practice". In practice you can encounter many other tree codes in any combination, as in this relatively simple RTL:
(call_insn:TI 1482 1481 2856 35 (call (mem:QI (mem/f:DI (plus:DI (reg/f:DI 0 ax [orig:340 MEM[(struct Server_Hello_13 *)_325].D.264452.D.264115._vptr.Handshake_Message ] [340])
                    (const_int 24 [0x18])) [744 MEM[(int (*) () *)_199 + 24B]+0 S8 A64]) [0 *OBJ_TYPE_REF(_200;&MEM[(struct _Uninitialized *)&D.349029].D.305525._M_storage->3B) S1 A8])
        (const_int 0 [0])) "/usr/local/include/c++/12.2.1/bits/stl_construct.h":88:18 898 {*call}
     (expr_list:REG_CALL_ARG_LOCATION (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
                (reg/f:DI 41 r13 [386]))
            (nil))
        (expr_list:REG_DEAD (reg:DI 5 di)
            (expr_list:REG_DEAD (reg/f:DI 0 ax [orig:340 MEM[(struct Server_Hello_13 *)_325].D.264452.D.264115._vptr.Handshake_Message ] [340])
                (expr_list:REG_EH_REGION (const_int 0 [0])
                    (expr_list:REG_CALL_DECL (nil)
                        (nil))))))
    (expr_list:DI (use (reg:DI 5 di))
        (nil)))

So I'll briefly describe some TREE codes and how to deal with them to extract something useful

 

COMPONENT_REF

Usually several component references form a chain of fields. For SomeStruct.f1.f2.f3 there will be three (operands are numbered as in tree.def, so Op0 is the containing object and Op1 is the field):
  1. COMPONENT_REF: Op1 contains the FIELD_DECL for field f3, and Op0 refers to
  2. COMPONENT_REF: Op1 contains the FIELD_DECL for field f2, and Op0 refers to
  3. COMPONENT_REF: Op1 contains the FIELD_DECL for field f1, and Op0 finally refers to the object whose type is the RECORD_TYPE/UNION_TYPE for SomeStruct

Pretty easy? Actually not. There are at least two problems:

  • Both Op0 & Op1 can contain other node types, for example SSA_NAME
  • Any record in the chain can be nameless. For C++ you can find the enclosing class with the function get_containing_scope, but in C all nested nameless structures actually have the scope TRANSLATION_UNIT_DECL. In that case there is a chance that the chain will be broken

Dirty hack: you don't even need the RECORD_TYPE for each field, because you can extract it from the field with DECL_CONTEXT
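
The chain walk can be sketched with a small toy model. This is not GCC's real tree API: Node, Code and collect_field_refs are hypothetical stand-ins invented for illustration, and the context string only plays the role that DECL_CONTEXT plays for a real FIELD_DECL. In an actual plugin the corresponding accessors would be TREE_CODE, TREE_OPERAND and DECL_CONTEXT.

```cpp
#include <string>
#include <vector>

// Toy stand-ins for GCC tree nodes; none of these are real GCC types.
enum class Code { ComponentRef, FieldDecl, VarDecl, SsaName };

struct Node {
    Code code;
    std::string name;     // field or variable name, empty if nameless
    std::string context;  // FieldDecl only: enclosing record name (DECL_CONTEXT analogue)
    const Node* op0;      // ComponentRef only: the containing object
    const Node* op1;      // ComponentRef only: the field being accessed
};

// Follow Operand 0 from the outermost ComponentRef inwards and collect
// "Record.field" pairs, ordered from the first access in the chain to the last.
std::vector<std::string> collect_field_refs(const Node* expr) {
    std::vector<std::string> refs;
    while (expr != nullptr && expr->code == Code::ComponentRef) {
        const Node* field = expr->op1;
        // Skip operands that are not plain fields instead of guessing.
        if (field != nullptr && field->code == Code::FieldDecl)
            refs.insert(refs.begin(), field->context + "." + field->name);
        expr = expr->op0;  // step to the containing object
    }
    return refs;
}
```

For a chain modelling s.f1.f2.f3, where f1 lives in SomeStruct and f2, f3 live in two nested records called A and B here, the function returns SomeStruct.f1, A.f2, B.f3. The record name comes from the field itself, so the walk does not need to resolve the type of every intermediate object.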

 

SSA_NAME

Just a reference to some other TREE, which can be extracted with TREE_TYPE. See the function dump_ssa_name


MEM_REF

The type of the MEM_REF is the type the bytes at the memory location are interpreted as.
   MEM_REF <p, c> is equivalent to ((typeof(c))p)->x... where x... is a
   chain of component references offsetting p by c

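The base pointer (and through it the type) can be extracted with TMR_BASE and the offset with TMR_OFFSET. Note that tree.h declares these macros for TARGET_MEM_REF; for a plain MEM_REF the same operands are TREE_OPERAND(expr, 0) and TREE_OPERAND(expr, 1).

Well, it would be good to find the field at this offset, right? The first field can be extracted with TYPE_FIELDS and each next one with TREE_CHAIN. See the function dump_mem_ref for details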
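
The lookup of a field by offset can be sketched as a walk over a linked list. This is a toy model, not the code of dump_mem_ref: Field and field_at_offset are hypothetical names, and the offsets and sizes are given directly in bytes. In a real plugin the list would come from TYPE_FIELDS, and it may also contain declarations that are not FIELD_DECL, so those would have to be filtered out first.

```cpp
#include <cstdint>
#include <string>

// Toy stand-in for a FIELD_DECL in a record's field list.
struct Field {
    std::string name;
    std::uint64_t offset;  // byte offset from the start of the record
    std::uint64_t size;    // size of the field in bytes
    const Field* next;     // next field in the record (TREE_CHAIN analogue)
};

// Return the field whose byte range [offset, offset + size) covers `off`,
// or nullptr if the offset falls into padding or past the last field.
const Field* field_at_offset(const Field* first, std::uint64_t off) {
    for (const Field* f = first; f != nullptr; f = f->next) {
        if (off >= f->offset && off - f->offset < f->size)
            return f;
    }
    return nullptr;
}
```

A range check is used instead of an exact match on the field start because an offset can point inside a field, for example into a nested structure or an array member.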


ADDR_EXPR

& in C.  Value is the address at which the operand's value resides
The operand can be extracted with TREE_OPERAND(expr, 0) and again can be any of the TREE codes. See the function dump_addr_expr for details

 

OBJ_TYPE_REF

Used to represent lookup in a virtual method table which is dependent on
   the runtime type of an object.  Operands are:
   OBJ_TYPE_REF_EXPR: An expression that evaluates the value to use.
   OBJ_TYPE_REF_OBJECT: Is the object on whose behalf the lookup is
   being performed.  Through this the optimizers may be able to statically
   determine the dynamic type of the object.
   OBJ_TYPE_REF_TOKEN: An integer index to the virtual method table.
   The integer index should have as type the original type of
   OBJ_TYPE_REF_OBJECT
 

This is the main source for collecting virtual method calls
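
As a small worked example, in the RTL dump at the start of this post the call goes through _199 + 24B and the OBJ_TYPE_REF ends in ->3B. Assuming 8-byte vtable slots (x86-64) and assuming that the trailing 3 is the OBJ_TYPE_REF_TOKEN, the two numbers agree: 24 / 8 = 3. The helper below is a hypothetical sketch of that arithmetic only, not something taken from the plugin.

```cpp
#include <cstdint>

// Hypothetical helper: convert a byte offset into a vtable to a slot index,
// assuming every slot is exactly one pointer wide.
// Returns -1 when the offset is not aligned to a slot boundary.
constexpr std::int64_t vtable_slot(std::int64_t byte_offset,
                                   std::int64_t pointer_size = 8) {
    return (byte_offset % pointer_size == 0) ? byte_offset / pointer_size : -1;
}
```

Such a check can be used to cross-validate a token taken from OBJ_TYPE_REF against the offset seen in the surrounding MEM_REF.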

 

So now we can collect all kinds of access to class/structure fields and methods. The only uncovered case is a pointer to method: can it be tracked? Unfortunately not, neither where the offset to the method is assigned nor where it is called. I wrote a simple test, and the method get_ref looks like this in the disassembly:

      mov     eax, 33 ; just some const
      mov     edx, 0
      pop     rbp

and in RTL:
(insn 6 3 7 2 (set (reg:DI 0 ax [orig:82 D.3252 ] [82])
        (const_int 33 [0x21])) "vtest.cc":40:24 80 {*movdi_internal}
     (nil))


For some unknown reason there are no OFFSET_REF & PTRMEM_CST in RTL
