Channel: windows deep internals

custom attributes in gcc and dwarf


Let's check whether we can add our own attributes (if Google can afford it, why should it be forbidden to mere mortals?). For example, I want gcc and DWARF to carry a flag describing the direction of function/method parameters - whether a parameter is IN or OUT. I chose 0x28ff as the value of this DWARF attribute.

It's fairly obvious that we can add our own custom attribute in gcc - they even have an example of how to do this. But what about the DWARF producer? Long story short - it seems you cannot do it from a plugin. The only DWARF-related pass available to plugins is pass_dwarf2_frame. So we need to patch gcc. But before that we need to

build gcc from sources

At the moment of writing the latest stable version of gcc was 12, so run

git clone --branch releases/gcc-12 https://github.com/gcc-mirror/gcc.git

and then follow the build instructions

patch gcc

Let's see how gcc produces DWARF output. All debug-info formatters implement gcc_debug_hooks, and currently gcc has three (btw there are patches for mingw to produce PDB, so in theory you could have vmlinux.pdb):

So let's add a function add_param_direction in dwarf2out.cc:
bool add_param_direction(tree decl, dw_die_ref parm_die)
{
  bool pa1 = lookup_attribute ("param_in", DECL_ATTRIBUTES (decl));
  bool pa2 = lookup_attribute ("param_out", DECL_ATTRIBUTES (decl));
  if ( !(pa1 ^ pa2) )
    return false;
  unsigned char pa_value = 0;
  // seems that you can't have a flag with value 1 - see gcc_assert at line 9599
  if ( pa1 )
    pa_value = 2;
  if ( pa2 )
    pa_value = 3;
  add_AT_flag(parm_die, (dwarf_attribute)0x28ff, pa_value);
  return true;
}
It first checks whether the parameter has the attribute param_in or param_out (but not both at the same time, because that would be senseless) and adds the custom DWARF flag attribute via an add_AT_flag call. Then we just need to call this function from gen_formal_parameter_die.
 
Now we can register our custom attributes - this can be done via a plugin, but I preferred to patch c-family/c-attribs.cc:
 
tree handle_param_in_attribute (tree *node, tree name, tree ARG_UNUSED (args),
                         int ARG_UNUSED(flags), bool *no_add_attrs)
{
  if ( !DECL_P (*node) )
  {
    warning (OPT_Wattributes, "%qE attribute can apply to params declarations only", name);
    *no_add_attrs = true;
    return NULL_TREE;
  }
  tree decl = *node;
  if (TREE_CODE (decl) != PARM_DECL)
  {
    warning (OPT_Wattributes, "%qE attribute can apply to params only", name);
    *no_add_attrs = true;
  } else {
    // check presence of param_out
    if ( lookup_attribute ("param_out", DECL_ATTRIBUTES (decl)) )
    {
      warning (OPT_Wattributes, "%qE attribute useless when param_out was used", name);
      *no_add_attrs = true;
      DECL_ATTRIBUTES (decl) = remove_attribute("param_out", DECL_ATTRIBUTES (decl));
    }
  }
  return NULL_TREE;
}

The function handle_param_in_attribute checks that this attribute is attached to a function/method parameter. Then it checks that the same parameter doesn't already have the param_out attribute - if it does, both attributes are removed.
 
All patches are located here

results

Let's define a couple of new keywords:

#define IN    __attribute__((param_in))
#define OUT    __attribute__((param_out))

and mark some arguments as IN or OUT - for example for the method bool PlainRender::dump_type(uint64_t key, OUT std::string &res, named *n, int level)

After rebuilding a debug version with our patched gcc we can see in the objdump output something like this:

<1><10d8ac>: Abbrev Number: 18 (DW_TAG_subprogram)
    <10d8ad>   DW_AT_specification: <0x103579>
    <10d8b1>   DW_AT_object_pointer: <0x10d8cc>
    <10d8b5>   DW_AT_low_pc      : 0x4301a6
    <10d8bd>   DW_AT_high_pc     : 0x4317e7
    <10d8c5>   DW_AT_frame_base  : 1 byte block: 9c     (DW_OP_call_frame_cfa)
    <10d8c7>   DW_AT_GNU_all_tail_call_sites: 1
    <10d8c8>   DW_AT_sibling     : <0x10db29>
 <2><10d8cc>: Abbrev Number: 10 (DW_TAG_formal_parameter)
    <10d8cd>   DW_AT_name        : (indirect string, offset: 0x107d3): this
    <10d8d1>   DW_AT_type        : <0x1035e5>
    <10d8d5>   DW_AT_artificial  : 1
    <10d8d6>   DW_AT_location    : 3 byte block: 91 98 7c       (DW_OP_fbreg: -488)
 <2><10d8da>: Abbrev Number: 45 (DW_TAG_formal_parameter)
    <10d8db>   DW_AT_name        : key
    <10d8df>   DW_AT_decl_file   : 1
    <10d8e0>   DW_AT_decl_line   : 123
    <10d8e1>   DW_AT_decl_column : 38
    <10d8e2>   DW_AT_type        : <0xfb4be>
    <10d8e6>   DW_AT_location    : 3 byte block: 91 90 7c       (DW_OP_fbreg: -496)
 <2><10d8ea>: Abbrev Number: 234 (DW_TAG_formal_parameter)
    <10d8ec>   Unknown AT value: 28ff: 3
    <10d8ed>   DW_AT_name        : res


estimation of maximum clique size


Definition 1.1 from the really cool book "The Design of Approximation Algorithms":

An α-approximation algorithm for an optimization problem is a polynomial-time algorithm that for all instances of the problem produces a solution whose value is within a factor of α of the value of an optimal solution

So you first need to estimate at least the size of the possible optimal solution, right?

Surprisingly, I was unable to find one for maximal clique. stackexchange offers a very simple formula (spoiler: the actual size is a couple of orders of magnitude smaller). Python networkX offers a method with complexity O(1.4422^n) - but only to find the maximal clique itself. Cool. Let's invent such an algorithm ourselves.

From wikipedia:

A clique, C, in an undirected graph G = (V, E) is a subset of the vertices, C ⊆ V, such that every two distinct vertices are adjacent

In other words, this means that a graph with a maximal clique of size K must contain at least K vertices with degree K - 1 or bigger. So we can sort vertices by degree and find some degree S where the number of vertices with degree S or bigger is >= S. But this is a very rough estimate, and it can be refined with the following observation - we can remove all edges to vertices not belonging to this subgraph. So the algorithm is:

  1. calculate the degrees of all vertices and sort them in descending order
  2. find the first degree S where the number of vertices with degree S or bigger is >= S
  3. put all such vertices in a subgraph SD
  4. remove from SD all edges to vertices not belonging to SD
  5. recalculate the degrees of all vertices in SD
  6. find another degree S in SD where the number of vertices with degree S or bigger is >= S; this is the result R

Next we can repeat steps 2-6 until we have enumerated all degrees or some degree becomes less than the previously found result R.

Complexity

Let N be the number of vertices and M the number of edges. The cycle can run at most N times, and in each cycle we can remove fewer than M edges (on average M/2), so the worst-case complexity is O(MN/2).

Results 

On a graph with N = 20000 and M = 170000:

19 on degree 19, 7216 vertices in SD
predicted with stackexchange: 590.205223 

On a graph with N = 200000 and M = 1700000:

20 on degree 19, 69489 vertices in SD
predicted with stackexchange: 1846.546113
 

The strangest thing here is that the number of vertices in SD shrank by a factor of ~3 compared to the whole original graph - like the degree in the Bron–Kerbosch algorithm.

yet another maximal clique algorithm


It seems that most known algorithms for maximal clique try to add as many vertices as possible, evolving towards ever more complex heuristics for vertex ordering. But there is an opposite way - we can remove some vertices from the neighbor set, right?

Let's assume we have sorted all vertices of a graph with M vertices and N edges by degree in descending order and want to check whether some vertex with degree K can contain a clique. We can check whether all of its neighbors are mutually connected and find one or several most loosely connected vertices - let's name such a vertex L. This check requires K - 1 accesses to the adjacency matrix for the first vertex, K - 2 for the second, etc. - on average (K^2) / 2. If no unconnected vertices were found, all surviving neighbors form a clique. See a sample implementation in the function naive_remove.

Now we should decide what we can do with L, and there are only 2 variants:

  1. we can remove it from the set of neighbors
  2. we can keep it and remove from the set of neighbors all vertices not connected with L

Notice that in both cases the number of neighbors decreases by at least 1. Now we can recursively repeat this process, with L removed and with L kept, at most K times, so the complexity is O((K^2) / 2 * 2^K).

We can repeat this process for all vertices with degree bigger than the maximal size of a previously found clique - in the worst case M times, so the overall complexity of this algorithm is O(M * (K^2) / 2 * 2^K).

On average, K = N / M.

Well, not a very good result, but the processing of each vertex can be done in parallel.

We can share the adjacency matrix (or even make it read-only) between all worker threads, and this recursive function requires the following memory at each step:

  1. a bitset of surviving neighbors - K / 8 bytes, where bit i is 1 if vertex i still belongs to the neighbor set and 0 if it was removed
  2. an array of unconnected-vertex counts of size K

Given that the recursion depth does not exceed K, the overall stack space used is

S = K * (K / 8 + K * sizeof(index))

Now let's check whether we can run this algorithm on a

gpu

Disclaimer: I read a book about CUDA programming almost 10 years ago, so I may be wrong.
Let's look at some modern GPUs and find the maximal size of their per-thread local memory:
 
NVidia GTX TITAN
6 GB of memory / 2688 cores = 2.2 MB per core
If an index fits in a short (2 bytes), we can process vertices with degree up to 1028
$1000 / 1028 = ~$1 per unit of vertex degree
 
40 GB / 4608 cores = 10.4 MB per core
vertices can have degree up to 2237
$10000 / 2237 = ~$4.5 per unit of vertex degree

ctf-like task based on maximal clique problem


Sources

There is an undirected graph with 1024 vertices and 100909 edges (so the average degree is 98.5). It is known that the graph contains a clique of size 16. You can pass the indexes of the clique's vertices on the command line like

./ctf 171 345

./ctf 171 346
too short clique 

These clique vertices are then used to derive an AES key and decrypt a short string.

Can you solve this?

gcc plugin to collect cross-references, part 1


Every user of IDA Pro likes cross-references - they are very useful, but applicable to objects in global memory only. What if I want cross-references for virtual methods and class/record fields - like which functions some specific virtual method was called from? Unfortunately IDA Pro cannot show this - partially because this information is not stored in debug info, and also due to its weak algorithm for type propagation. A call to a virtual method typically looks similar to

 mov     rax, [rbp+var_8] ; this
 mov     rax, [rax]       ; this._vptr
 add     rax, 10h
 mov     rcx, [rax]       ; load method from vtable, why not mov rcx, [rax+0x10]?
 call    rcx              ; or even better just call [rax+0x10]?

Let's think about where we can get this kind of cross-reference - surely the compiler must have it somewhere inside in order to generate native code, right? So generally speaking the compiler is your next best friend (right after the disassembler and debugger).

Run gcc with the -c -fdump-final-insns options on a simple C++ test file and check what a virtual method call looks like:
(call_insn # 0 0 2 (set (reg:DI 0 ax)
        (call (mem:QI (reg/f:DI 1 dx [orig:85 _4 ] [85]) [ *OBJ_TYPE_REF(_4;this_7(D)->3B) S1 A8])
            (const_int 0 [0]))) "vtest.cc":31:21# {*call_value}

What? What is _4, what type does this have, and what does ->3B mean instead of a method name? Looking ahead, I can say that all the needed information really is stored in RTL, though the function dump_generic_node (from tree-pretty-print.cc) is just too lazy to show it properly. It seems we can develop a gcc plugin to extract these cross-references (in fact, for the first couple of months of development this was not at all obvious).

why gcc?

Because gcc is the de-facto standard and you can expect to be able to build almost any sources with it (usually after numerous loud curses you finally read the documentation), even on Windows. Nevertheless, let's consider other popular alternatives.

visual c++

Unlike .NET with its excellent Roslyn, Microsoft doesn't allow you to make plugins for their C++ compiler, nor does it even describe its internal IR.

clang

Don't write a clang plugin (c)

In fact you probably could do this. If I remember correctly, LLVM IR has full support for virtual methods. Another question is to what extent the final native code matches the IR - some pieces of code could be removed by dead-code elimination, merged in a GCSE pass, and so on.

Besides, llvm still has some problems with producing a bootable linux kernel.

Anyway, I am too stupid to understand the zoo of llvm classes with their hellish templates, and too impatient to wait several hours to recompile clang every time I insert a couple of debug prints.

gcc plugin basics

There are several good examples of gcc plugins: 1 & 2.
The first has a recurring bug when dumping plugin parameters - plugin_info->argv[i].value should be checked against NULL, because according to the documentation of the gcc command-line option
-fplugin-arg-NAME-<key>[=<value>]
the value is not mandatory.
So I just wrote a class derived from rtl_opt_pass, registered it for the pass "final" (sounds fatal and extremely intimidating), and now in the virtual method execute we can open the gates to hell.

walking the RTL tree

RTL has quite a simple structure compared to GENERIC/GIMPLE - for example, the maximal value of the rtx_code enum is 0x98 (LAST_AND_UNUSED_RTX_CODE) vs tree_code's 0x175 (MAX_TREE_CODES).
 
gcc doesn't provide a ready-to-use iterator for RTL, but you can cut&paste some code from the function rtx_writer::print_rtx. As you can see in rtx_writer::print_rtx_operand, you can extract the format string with GET_RTX_FORMAT (GET_CODE (in_rtx)) and then call a format-specific dumper for each operand. Some of these dumpers in turn call print_rtx again, so the process is recursive, and I had to add a stack to track all parent RTL nodes (so I can know, for example, that the current expression is part of an assignment to some memory if I see set mem:0 in this stack).
 
But the most important things here are the calls to print_mem_expr, scattered in arbitrary places - a thin wrapper around print_generic_expr. This is where you finally get access to the real gcc internals - the tree_node union contains everything, starting from the structure tree_base.
This is a huge topic and I hope it will be described in the next part.

gcc plugin to collect cross-references, part 2

Because I am still fighting with endless variants of unnamed types while processing the linux kernel, let's talk about persistence.
 
The final goal of this plugin is to build, from the sources, a database mapping functions to the methods they call and the fields they use. After that you can find the set of functions referring to some field/method and investigate them later with a disassembler, for example.
 
So of course the plugin must store its results somewhere.
A graph database is probably better suited for such data - you could store symbols as vertices and references as edges; then all references to some symbol are just its incoming edges. But I am too lazy to install a JVM and Neo4j, so I used SQLite (and simple YAML-like files for debugging). You can connect your own storage by implementing the interface FPersistence.
Details of the most important methods from this interface:

int connect(const char *path, const char *username, const char *password)

Called during plugin initialization to connect to the database.
path is passed via the plugin parameter -fplugin-arg-gptest-db
username is passed via the plugin parameter -fplugin-arg-gptest-user
password is passed via the plugin parameter -fplugin-arg-gptest-password
The method must return 0 if the connection was successfully established.

void cu_start(const char *filename) 

Called from the PLUGIN_START_UNIT callback, so the passed filename can be relative. It's probably better to use the function realpath to get the full name of the file.

int func_start(const char *funcname)

Called when the plugin starts processing the next function; funcname is the mangled name of the function. If you don't want to process this function, the method can just return a non-zero value.

void bb_start(int idx)

Basic blocks in gcc differ from what you can see in IDA Pro - they end at the first jump; if this is a conditional jump, the block will have two outgoing edges - one to some other block and a so-called FALLTHRU edge to the next block.

idx is just the index of the basic block.

void add_xref(xref_kind, const char *symname)

The main method, called when the plugin finds in the currently processed function a cross-reference to something (usually even with a symbol name).

void report_error(const char *err_msg)

I decided it would be convenient to put errors in the log/database, so this method is called when the plugin was unable to find something (like the name of a nameless structure when an access to its field occurred, and so on).

gcc plugin to collect cross-references, part 3

Part 1 & 2
Let's start climbing the TREEs. The main sources for reference are tree.h, tree-core.h & print-tree.cc.
 
Caution: because we are traveling during an RTL pass, some of the tree types have already been removed, so you are unlikely to meet TYPE_BINFO/BINFO_VIRTUALS etc.

The main structure is tree_base; it is included as the first field in all other types - for example, tree_type_non_common has a first field of type tree_type_with_lang_specific,
which has a field common of type tree_type_common,
which again has a field common of type tree_common,
which has a field typed of type tree_typed,
which has a field base of type tree_base.
A kind of ancient inheritance in pure C.

Caution 2: many fields have totally different meanings for concrete types, so GCC strongly encourages you to use the macros from tree.h to access all fields.

The type of a TREE can be obtained with TREE_CODE and the name of the code with the function get_tree_code_name.
TREE_CODE returns an enum tree_code, and it has a lot of values - MAX_TREE_CODES equals 0x175. So let's check only an important subset.

 

XXX_CST

represents some constant
  • STRING_CST - a literal constant; the length can be obtained with TREE_STRING_LENGTH and the content with TREE_STRING_POINTER. Warning - types and declaration names are not literal constants!
  • INTEGER_CST - an integer constant; the value can be obtained with a wi::to_wide call
  • POLY_INT_CST - also an integer constant, but split into several elements. The count of such values can be extracted with NUM_POLY_INT_COEFFS and each value via POLY_INT_CST_COEFF

 

BLOCK

represents some block (usually containing variable & label declarations and inlined functions). Variables can be obtained with BLOCK_VARS and BLOCK_NONLOCALIZED_VAR (don't ask me why there are two kinds of vars), embedded blocks with BLOCK_SUBBLOCKS and the next block with BLOCK_CHAIN (there should be some joke about cryptocurrencies here).
See a sample of walking a function tree in dump_func_tree.

 

XXX_DECL

represents the declaration of a structure, union, var etc. It often has a name - you should use DECL_NAME, which returns another tree with code IDENTIFIER_NODE, or NULL_TREE. In the first case you can get the name via IDENTIFIER_POINTER and check whether the name is really present via DECL_NAMELESS.
 
Variables can have initial values - you can check this with DECL_INITIAL.
The type of a declaration can be extracted with TREE_TYPE.
 
For functions you can access the first argument with DECL_ARGUMENTS (and the remaining ones with DECL_CHAIN) and the result with DECL_RESULT.
 
Fields of a record can be extracted with TYPE_FIELDS.
The offset of a field can be extracted with DECL_FIELD_OFFSET.
 
One remarkable declaration is

 

LABEL_DECL

has type tree_label_decl and contains the rtx for some code label, which can be extracted with DECL_RTL_IF_SET.

 

XXX_TYPE

represents some type. This is a huge topic, so I cover only the basics.
You can get a type's name with TYPE_NAME. It again returns another tree with code IDENTIFIER_NODE, or NULL_TREE.
 
You can quickly check whether some type is a function/record/union with the XXX_TYPE_P macros like RECORD_OR_UNION_TYPE_P, FUNC_OR_METHOD_TYPE_P, POINTER_TYPE_P etc.
 
The size of a type can be extracted with TYPE_SIZE - it returns a tree, usually with code INTEGER_CST.
 
Enum values can be extracted with TYPE_VALUES.
Fields of a record/union can be extracted with TYPE_FIELDS.
For pointers and references you can extract the base type with TREE_TYPE.

gcc plugin to collect cross-references, part 4

Let's apply the priceless knowledge from the previous part - for example, to extract string literals and insert polymorphic decryption.
A typical call to printf/printk in RTL usually looks like

(insn 57 56 58 9 (set (reg:DI 5 di)
        (symbol_ref/f:DI ("*.LC0") [flags 0x2] <var_decl 0x7f480e9edea0 *.LC0>)) "swtest.c":17:7 80 {*movdi_internal} 

(call_insn 59 58 191 9 (set (reg:SI 0 ax)
        (call (mem:QI (symbol_ref:DI ("printf") [flags 0x41] <function_decl 0x7f480e8c7000 printf>) [0 __builtin_printf S1 A8])
            (const_int 0 [0]))) "swtest.c":17:7 909 {*call_value}
     (nil)
    (expr_list (use (reg:QI 0 ax))
        (expr_list:DI (use (reg:DI 5 di))
            (nil))))

Translation for mere mortals:
The first instruction sets register 5 to the address (via symbol_ref) of some known symbol with the cool name "*.LC0".
The second instruction calls another known symbol, "printf"; the arguments for this call are stored in an expression list - it uses register 0 as the result, and the argument, in another nested expression list, is the earlier-loaded register 5.
 
So we can record 2 items for this function in the cross-references database - the loading of some symbol and the call. Sadly the name "*.LC0" is probably totally useless. Let's check if we can go deeper.

We can extract the TREE item for a symbol_ref rtl with SYMBOL_REF_DECL and check its code - for a variable it is VAR_DECL.
Then we can check whether this variable is initialized via DECL_INITIAL.
A string literal has code STRING_CST.
And finally we can get the content of this literal with TREE_STRING_POINTER and its length with TREE_STRING_LENGTH.
 
I implemented this logic in the method is_cliteral.
Also, for storing literals I added a new virtual method to the persistence interface.
 
You may ask - is it possible to extract integer constants like sizeof? Unfortunately no, mainly because they were converted to RTL const_int during expression evaluation, and RTL has no mechanism for tracking why some const_int has its value. A typical use of sizeof may look like:

... =(some_struct *)kmalloc(sizeof(some_struct) + strlen(s) + 1, GFP_KERNEL);

In RTL this code is converted to a call to strlen, then an addition of some const_int to the result register (its value is the evaluated expression 1 + sizeof(some_struct)), and then passing it in a register or on the stack into kmalloc via an expr_list. Btw, the second argument is also passed via a const_int, so it's impossible to recover the value back to the GFP_KERNEL constant.

gcc plugin to collect cross-references, part 5


Part 1, 2, 3 & 4

Let's check how RTL describes jump tables. I made a simple test, and the output of gcc -fdump-final-insns looks like:

(jump_insn # 0 0 8 (parallel [
            (set (pc)
                (reg:DI 0 ax [93]))
            (use (label_ref #))
        ]) "swtest.c":14:3# {*tablejump_1}
     (nil)
 -> 8)
(barrier # 0 0)
(code_label # 0 0 8 (nil) [2 uses])
(jump_table_data # 0 0 (addr_vec:DI [
            (label_ref:DI #)
            (label_ref:DI #)
...
        ]))

As you can see, jump_insn uses the opcode tablejump_1 referring to label 8. Right after this label is located an RTL node with code jump_table_data - it's probably a bad idea to assume this will always be true, so it's better to use the function jump_table_for_label. Also, for some unknown reason the option -fdump-final-insns does not show the content of jump tables. So let's at least try to find the jump_table_data nodes from a plugin.

Surprisingly, you cannot find them when iterating over instructions within each block (using the FOR_ALL_BB_FN/FOR_BB_INSNS macros). I suspect this is due to the fact that both the label and the jump_table belong to the block with index 0. So I used another loop:
for ( insn = get_insns(); insn; insn = NEXT_INSN(insn) )
Then we can check whether the current RTL instruction is a jump table with JUMP_TABLE_DATA_P. Jump tables have an addr_vec in the element with index 3, and each element is a label_ref. The length of the vector can be obtained from the field num_elem. Pretty easy, so what can we do with this knowledge?
Well, we could at least put jump tables with their sizes into the debug info.

Let's check how this can be done.
All debug info in gcc is produced via the structure gcc_debug_hooks, and it even has a field with the neat name label. The problem is that in dwarf2out.cc this field points to debug_nothing_rtx_code_label, which, as you can guess, really does nothing. Oops.
 
Actually, labels are inserted into the dwarf2 output somewhere inside the function gen_label_die, called from gen_decl_die. So we need to
  1. find the jump tables
  2. check that they are unnamed - the corresponding code_label does not have a LABEL_DECL
  3. add a LABEL_DECL with some name
  4. and link it to the RTL with SET_DECL_RTL
And yes - we can now see our newly inserted labels in the debug info. Without a size, for sure, because the code in gen_label_die needs to be patched to do this. Again.
 
And finally run objdump -g to see
 <2><11a>: Abbrev Number: 14 (DW_TAG_label)
    <11b>   DW_AT_name        : (indirect string, offset: 0x5f): main_jt52
    <11f>   DW_AT_decl_file   : 1
    <120>   DW_AT_decl_line   : 4
    <121>   DW_AT_decl_column : 5
    <122>   DW_AT_byte_size   : 10
    <123>   DW_AT_low_pc      : 0x402050

gcc plugin to collect cross-references, part 6

Part 1, 2, 3, 4 & 5
Finally I was able to compile and collect cross-references for big enough open-source projects like the linux kernel and botan:
wc -l botan.db
2108274 botan.db
grep Err: botan.db | wc -l
540

So let's check how we can extract accesses to record fields. If you take a quick look at tree.def you will notice the very prominent type COMPONENT_REF:
Value is structure or union component.
 Operand 0 is the structure or union (an expression).
 Operand 1 is the field (a node of type FIELD_DECL).
 Operand 2, if present, is the value of DECL_FIELD_OFFSET

 

Sounds easy? "In theory there is no difference between theory and practice." In practice you can encounter many other types in arbitrary combinations, like in this relatively simple RTL:
(call_insn:TI 1482 1481 2856 35 (call (mem:QI (mem/f:DI (plus:DI (reg/f:DI 0 ax [orig:340 MEM[(struct Server_Hello_13 *)_325].D.264452.D.264115._vptr.Handshake_Message ] [340])
                    (const_int 24 [0x18])) [744 MEM[(int (*) () *)_199 + 24B]+0 S8 A64]) [0 *OBJ_TYPE_REF(_200;&MEM[(struct _Uninitialized *)&D.349029].D.305525._M_storage->3B) S1 A8])
        (const_int 0 [0])) "/usr/local/include/c++/12.2.1/bits/stl_construct.h":88:18 898 {*call}
     (expr_list:REG_CALL_ARG_LOCATION (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
                (reg/f:DI 41 r13 [386]))
            (nil))
        (expr_list:REG_DEAD (reg:DI 5 di)
            (expr_list:REG_DEAD (reg/f:DI 0 ax [orig:340 MEM[(struct Server_Hello_13 *)_325].D.264452.D.264115._vptr.Handshake_Message ] [340])
                (expr_list:REG_EH_REGION (const_int 0 [0])
                    (expr_list:REG_CALL_DECL (nil)
                        (nil))))))
    (expr_list:DI (use (reg:DI 5 di))
        (nil)))

So I'll briefly describe some TREE types and how to deal with them to extract something useful.

 

COMPONENT_REF

Usually several component references form a chain of fields - for SomeStruct.f1.f2.f3 there will be 3:
  1. a COMPONENT_REF whose Op1 contains the FIELD_DECL for field f3 and whose Op0 references
  2. a COMPONENT_REF whose Op1 contains the FIELD_DECL for field f2 and whose Op0 references
  3. a COMPONENT_REF whose Op1 contains the FIELD_DECL for field f1 and whose Op0 finally references the RECORD_TYPE/UNION_TYPE for SomeStruct

Pretty easy? Actually not - there are at least two problems:

  • Both Op0 & Op1 can contain any other types - for example SSA_NAME
  • The record in each chain can be nameless. For C++ you can find the enclosing class with the function get_containing_scope, but in C all nested nameless structures actually have the scope TRANSLATION_UNIT_DECL - in that case there is a chance that the chain will be broken

A dirty hack - you don't even need the RECORD_TYPE for each field, because you can extract it with DECL_CONTEXT.

 

SSA_NAME

Just a reference to some other TREE - it can be extracted with TREE_TYPE. See the function dump_ssa_name.


MEM_REF

The type of the MEM_REF is the type the bytes at the memory location are interpreted as.
   MEM_REF <p, c> is equivalent to ((typeof(c))p)->x... where x... is a
   chain of component references offsetting p by c

Type can be extracted with TMR_BASE and offset with TMR_OFFSET.

Well, it would be good to find the field at this offset, right? The first field can be extracted with TYPE_FIELDS and the next with TREE_CHAIN. See the function dump_mem_ref for details.


ADDR_EXPR

& in C.  Value is the address at which the operand's value resides
The value can be extracted with TREE_OPERAND(expr, 0) and again can be any of the TREE types. See the function dump_addr_expr for details.

 

OBJ_TYPE_REF

Used to represent lookup in a virtual method table which is dependent on
   the runtime type of an object.  Operands are:
   OBJ_TYPE_REF_EXPR: An expression that evaluates the value to use.
   OBJ_TYPE_REF_OBJECT: Is the object on whose behalf the lookup is
   being performed.  Through this the optimizers may be able to statically
   determine the dynamic type of the object.
   OBJ_TYPE_REF_TOKEN: An integer index to the virtual method table.
   The integer index should have as type the original type of
   OBJ_TYPE_REF_OBJECT
 

This is the main source for collecting virtual method calls.

 

So now we can collect all kinds of accesses to class/structure fields and methods. The only uncovered case is a pointer to method - can it be tracked? Unfortunately no - neither where the offset to the method is assigned, nor where it is called. I wrote a simple test, and the method get_ref looks in disasm like:

      mov     eax, 33 ; just some const
      mov     edx, 0
      pop     rbp

and in RTL:
(insn 6 3 7 2 (set (reg:DI 0 ax [orig:82 D.3252 ] [82])
        (const_int 33 [0x21])) "vtest.cc":40:24 80 {*movdi_internal}
     (nil))


For some unknown reason there are no OFFSET_REF & PTRMEM_CST in RTL.

dwarf5 from clang 14


It seems that clang version 14 utilizes more advanced features from DWARF5, so I added support for them to my dwarfdump. IMHO the most exciting features are:

Section .debug_line_str

In old versions of DWARF, filenames were duplicated in each compilation unit. Since DWARF version 5 they are stored in a separate section and thus shared, saving some space. Obviously this space reduction is negligible compared to the overhead from type duplication.

Section .debug_str_offsets

Also to reduce space, each compilation unit has a so-called base index for strings, passed via DW_AT_str_offsets_base. But there is a problem - some attributes can already have names before DW_AT_str_offsets_base occurs:


  <0><c>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <d>   DW_AT_producer    : (indexed string: 0): clang version 14.0.6 (git@github.com:github/semmle-code 5c87e7737f331823ed8ed280883888566f08cdea)
    <e>   DW_AT_language    : 33        (C++14)
    <10>   DW_AT_name        : (indexed string: 0x1): c/extractor/src/extractor.cpp
    <11>   DW_AT_str_offsets_base: 0x8


As you can see, here 2 attributes have names before we have the value of the string base. Much harder to parse in one pass now.

New locations format

I think this is the coolest and most useful feature - now each variable and parameter has a set of locations linked to address ranges (which is often the case for highly optimized code). Sample:

   Offset Entry 2077
    0024ef56 00000000000006b4 (index into .debug_addr) 004fb3c500000000 (base address)
    0024ef59 0000000000000000 000000000000001c DW_OP_reg5 (rdi)

This cryptic message means that starting from address 0x4fb3c5 (note - most tools like objdump or llvm-dwarfdump cannot correctly show these new locations; in this case objdump showed the address in a bad format) some local variable is located in register rdi until the next address range. It seems that neither IDA Pro nor Binary Ninja can use this debug information:
.text:00004FB3C5     mov     rdi, cs:compilation_tf
.text:00004FB3CC     cmp     dword ptr [rdi+0Ch], 0

Global var compilation_tf has type a_trap_file_ptr - a pointer to a_trap_file. IDA Pro has this type information from the debug info but still cannot show the access to the field of a_trap_file at offset 0xC in the next instruction

 
As a result of all my patches I can now, for example, inspect IL structures from the Microsoft CodeQL C++ extractor:
// Size 0x28
// FileName: c/extractor/edg/src/il_def.h
struct a_name_reference {
// Offset 0x0
a_name_reference_ptr next;
// Offset 0x8
a_name_qualifier_ptr qualifier;
// Offset 0x10
union {
    // Offset 0x0
    a_type_ptr destructor_type;
    // Offset 0x0
    a_property_or_event_descr_ptr property_or_event_descr;
  } variant;
// Offset 0x18
long num_template_arguments;
// Offset 0x20
enum a_special_function_kind special_kind;
// Offset 0x20
a_bit_field is_global_qualified_name:23:1;
// Offset 0x20
a_bit_field is_template_id:22:1;
// Offset 0x20
a_bit_field is_super_qualified:21:1;
// Offset 0x20
a_bit_field is_decltype_qualified:20:1;
// Offset 0x20
a_bit_field used_in_primary_declarator:19:1;
// Offset 0x20
a_bit_field from_prototype_instantiation:18:1;
};

gcc plugin to collect cross-references, part 7


Part 1, 2, 3, 4, 5 & 6

Lets check if we can extract another kind of constants - numerical. Theoretically there are no problems - they have types INTEGER_CST, REAL_CST, COMPLEX_CST and so on. And you can even meet them - mostly in programs written in fortran
In most code they are usually replaced with RTX equivalents like
  • INTEGER_CST - const_int (or const_wide_int)
  • REAL_CST - const_double

const_double is the easy case, but const_ints are really ubiquitous - they can appear in RTX even when they do not occur in operands of the assembler code. So the main task is to select only a small subset of them. Let's consider what we can filter out

fields offsets

Luckily this hard part has already been solved in the previous part

local variable offsets on the stack

RTX has field frame_related:
1 in an INSN or a SET if this rtx is related to the call frame, either changing how we compute the frame address or saving and restoring registers in the prologue and epilogue

this flag affects both parts of a set; loading something from the stack looks like:
set (reg:DI 0 ax [83])
        (mem/f/c:DI (plus:DI (reg/f:DI 6 bp)
                (const_int -8 [0xfffffffffffffff8]))

and storing to the stack like:
set (mem/f/c:DI (plus:DI (reg/f:DI 6 bp)
                (const_int -8 [0xfffffffffffffff8])) [4 this+0 S8 A64])
        (reg:DI 5 di [ _0 ]))

conditions

Yes, if_then_else almost always follows a compare:
(set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 2 cx [orig:83 _2 ] [83])
            (const_int 0 [0]))) "vtest.cc":44:19 5 {*cmpqi_ccno_1}
(jump_insn 10 9 11 2 (set (pc)
        (if_then_else (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (label_ref 16)
            (pc))) "vtest.cc":44:19 891 {*jcc}
All these bulky constructions will be translated to just jz, so no const_int 0 will appear in the output

EH block index

like the one accompanying each function call:

(expr_list:REG_EH_REGION (const_int 0 [0])

Now the output looks much better, but probably you would like more control over the operation types. For this purpose I added the option -fplugin-arg-gptest-ic=config.file to my plugin

Lets assume that you are compiling some crypto library and want to extract integer constants only for some operations - like assignments, shift and xor but not and. You can put RTX code names into config.file and prefix unneeded ones with '-':

set
xor
ashift
lshiftrt
-and

location lists from dwarf5

During the past weekend I added support for var location lists from DWARF5 (located in the separate section .debug_loclists) to my dwarfdump. As usual, lots of bugs were found

First - they are present only for functions but not for methods. Probably this is a real bug and can have serious impact when debugging

Second - the generated expressions are not optimal. Lets see an example:

locx 53e
4FB5DF - 4FB5F4: DW_OP_piece 0x8, DW_OP_reg0 RAX, DW_OP_piece 0x8, DW_OP_breg0 RAX+0, DW_OP_lit3, DW_OP_lit8, DW_OP_mul, DW_OP_plus, DW_OP_stack_value, DW_OP_piece 0x8
4FB5F4 - 4FB60F: DW_OP_piece 0x8, DW_OP_reg2 RCX, DW_OP_piece 0x8, DW_OP_breg0 RAX+0, DW_OP_lit3, DW_OP_lit8, DW_OP_mul, DW_OP_plus, DW_OP_stack_value, DW_OP_piece 0x8

As you can see these expressions are the same but cover adjacent address ranges. Why not use a single expression for the range 4FB5DF - 4FB60F?

DW_OP_mul just pops a couple of values from the stack and pushes back the result of their multiplication (see the evaluation logic in the method execute_stack_op), so the sub-expression DW_OP_lit3, DW_OP_lit8, DW_OP_mul can be rewritten as just DW_OP_lit24

Also it's curious to check which other compilers support the subject:

gcc

Yes, with options -g -gdwarf-5 -fvar-tracking

golang

No - they don't even support DWARF5 at all

openwatcom v2

No - judging by the funny comments

WHEN YOU FIGURE OUT WHAT THIS FILE DOES, PLEASE DESCRIBE IT HERE!

kssp library


I've tried to solve the CSES task "Visiting Cities"

Looks like you can use a kind of brute force - get the first shortest path, then all remaining paths with the same cost, and make a union of the cities in each path. Neither an elegant nor a smart algorithm, just to estimate whether this approach works at all

I remembered Yen's algo for finding the K-th shortest path from my university course on graph theory, so I chose to use some ready (and hopefully well-tested) implementation - the kssp library from INRIA (yep - the famous place where OCaml was invented)

And then real madness happened - different algos gave me different results and moreover - they didn't match the "correct" results from CSES! Lets see what I got (all results for test 7, compilation options -O3 -DTIME -DNDEBUG):

  • Yen - 470s, 30157 cities
  • node classification - 4.37s, 30140 cities
  • postponed node classification - 3.83s, 30140 cities
  • postponed node classification with star - 3.76s, 30140 cities
  • sidetrack based - consumed 13Gb of memory and met with OOM killer
  • parsimonious sidetrack based - OOM again, perhaps bcs not enough parsimonious :-)

Source code

kssp library, part 2


Previous part 

I was struck by an idea of how to reduce the size of the graph before enumerating all K shortest paths: we can use cut-points. By definition, if some shortest path contains a cut-point, that vertex must always be visited - otherwise you can't reach the destination. So the algo is simple:
  1. find the first shortest path for the whole graph with Dijkstra's algo
  2. find all cut-points in the whole graph
  3. iterate over the found shortest path - whenever the current vertex is a cut-point, run brute force from the previous cut-point to the current one

The results are crazy anyway - on the same test 7:

  • Yen: 13.86s, 29787, 3501 cycles
  • Node Classification: 118.4s, 30013, 3501 cycles
  • Postponed Node Classification: 104.95s, 30013, 3501 cycles
  • PNC*: 120.33s, 30013, 3501 cycles
  • Parsimonious Sidetrack Based: 4.66s, 29980, 3501 cycles
  • Parsimonious Sidetrack Based v2: 4.79s, 29980, 3501 cycles
  • Parsimonious Sidetrack Based v3: 4.85s, 29980, 3501 cycles
  • Sidetrack Based: 4.18s, 29980, 3501 cycles
  • Sidetrack Based with update: 4.34s, 29980, 3501 cycles
This time all algos ran to completion and again gave different results...
Source code

my solutions for couple CSES tasks


CSES has two tasks that are very similar by description but have completely different solutions: "Critical Cities" (218 accepted solutions at the time of writing) and "Visiting Cities" (381 accepted solutions)

Critical Cities

We are given a directed unweighted graph and it seems we need to find its dominators, for example using the Lengauer-Tarjan algo (with complexity O((V+E)log(V+E)))
Then we could check each vertex in this dominator tree to see if it leads to the target node, so the overall complexity is O(V * (V+E)log(V+E))
 
This doesn't look very impressive IMHO. Lets try something completely different (c) Monty Python's Flying Circus. For example we could run a wave (also known as Lee algorithm) from source to target and get some path with complexity O(V+E). Note that in the worst case this path can contain all vertices. Lets mark all vertices in this path
Next we continue to run waves, this time ignoring edges from marked nodes, and see which marked vertices are still reachable. For example, on some step k we run a wave from Vs and reach vertices Vi and Vj. We can conclude that all vertices on the earlier found path between Vs and Vj are NOT critical cities. So we can repeat the next step starting from Vj
This process is repeated at worst V times, so the overall complexity is O(V*(V+E))
 
My solution is here

 

Visiting Cities

This time we are given a directed weighted graph and it seems the simplest solution is to find all K shortest paths (for example with Yen's algo) and make a union of their vertices. Because I'm very lazy I decided to reuse some ready and presumably well-tested implementation of this algo. You can read about the fabulous results here
 

After that I plunged into long thoughts until I decided to count how many paths of minimal length go through each vertex. We can run Dijkstra in both directions - from source to target and from target to source - counting the number of minimal-length paths. Then we select from the first found shortest path the vertices where the product of the forward count and the reverse count equals the forward count at the target (or the reverse count at the source) - it's pretty obvious that you can't avoid such vertices in any shortest path. The complexity of this solution is two runs of Dijkstra's algo (depending on implementation O(V^2), or O(V * log(V) + E * log(V)) using some kind of heap) plus at worst V checks for the vertices in the first found shortest path

My solution is here

Filling Trominos


IMHO this is a very hard task - only 104 accepted solutions. My solution is here

Google gives lots of links for trominoes but they are all for a totally different task from Project Euler - in our case we have only L-shapes. So lets think about a possible algorithm

It's pretty obvious that we can make a 2 x 3 or 3 x 2 rectangle with a couple of L-trominoes. So the naive solution is just to check if one side is divisible by 2 and the other by 3

However, with pen and paper you can quickly realize that you can, for example, fill a 5 x 6 rectangle:

aabaab
abbabb
ccddee
dcdced
ddccdd

The algo can look like this (see function check2x3):

  • if one side of the rectangle is divisible by 6, then the other minus 2 should be divisible by 3
  • if one side of the rectangle is divisible by 6, then the other minus 3 should be divisible by 2

Submit our solution and from the failed tests suddenly discover that you can also have a 9 x 5 rectangle. Some details on how this happens

So we can have at most 3 groups of different shapes:

  • a 9 x 5 rectangle (or even several, if the sides are multiples of 5 & 9) - in my solution it is stored in the field has_95
  • 1 or 2 groups of 2 x 3 rectangles below the 9 x 5 shape: 1 if you can fill this area with 2 x 3 shapes of the same orientation, and 2 if you must mix vertical and horizontal rectangles - field trom
  • the same 1 or 2 groups to the right of the 9 x 5 shape - field right

Now the only remaining problem is coloring

A 9 x 5 rectangle needs 5 different colors, but it is possible to arrange the trominoes so that the borders show only 4 colors and the 5th is inside. For the groups of 2 x 3 rectangles you need 4 colors if the group size is 1 and another 4 if the size is 2. In the worst case the number of colors is 4 for the 9 x 5 + 2 * 2 * 4 = 20 - so we can fit in A-Z

Architecture and Design of the Linux Storage Stack


Not a perfect but a suitable book, considering the small number of books about linux internals. IMHO the most useful is chapter 10, so below is a brief summary of the presented tools

And I have a stupid question - has anyone already merged all this zoo into some cmdlet/package for linux powershell to have a common API? At least I was unable to find anything similar on powershellgallery

Distinct Colors


I've solved yet another very funny CSES task - it looks very similar to another task called "Reachable Nodes" (my solution for it). The only difference is that we are asked to count not unique nodes but colors of nodes. What can go wrong?

And this is where the funny part begins - my patched solution crashed. gdb didn't show anything interesting. However, I remembered a scary cryptic command to show stack usage:

print (char *)_environ - (char *)$sp
$1 = 8384904

Very close to the default 8Mb (check ulimit -s). Wait, WHAT? Do we really have stack exhaustion? Lets check: 8 * 1024 * 1024 = 8388608 bytes. The tree can have 200000 nodes, and 8388608 / 200000 = ~42 bytes for each recursive DFS call. Seems to be true - in each call we store the return address + stack frame RBP + 3 registers holding args (this, indexes of node and parent) - so at least 5 * 8 = 40 bytes. It so happened that some tests contain a tree with a very long stem from root to end, so yes - recursive DFS cannot visit all nodes in such a tree. The solution is simple - we can emulate recursion with std::stack. As a bonus, for all nodes in the stack we can use a single bit mask to save space

Another unpleasant observation is that the trees in the tests aren't BINARY trees. One picture is worth a thousand words:

The degree of node 2 is 4. This is the main reason why the function dfs has a separate branch for processing joint nodes with only 2 descendants - bcs initially the method is_fork returned only left and right

Source

failed attempts to draw graphs


CSES has several really hard graph-related tasks, for example

It would be a good idea to visualize those graphs. One of the well-known tools for this is Graphviz, so I wrote a simple perl script to render a graph from CSES plain text into their DSL. On small graphs all goes well and we can enjoy something like

But it seems that on big graphs with 200k nodes dot just can't finish rendering, and after ~2 hours of hard work it met the OOM killer. Lets think how we can reduce the size of the graph

merging nodes with degree 2

Look at the picture above. We can notice that vertex 5 has 2 edges, to v2 and v7, and can be replaced with just 1 edge between 2 & 7. This process can be repeated until no vertices with degree 2 remain. For directed graphs we can merge a vertex when it has 1 in and 1 out edge

The new picture (use option -c of my script) with merged nodes has only 19 edges vs 38 in the old one:

You can notice that nodes 2 and 6 now have a single edge - nodes 1 & 8 were removed. The vertices in blue are so-called cut-points, and this leads us to

Condensation on Strongly Connected Components

For directed graphs this can be done with Kosaraju's algorithm
For undirected graphs we can just merge all nodes between cut-points. My script supports the -k option for such graph condensation:

Here, for example, box v4 contains vertices 4, 5, 7. And now see what happens with both -c & -k options:

Only 10 edges remain

Unfortunately this is still too much for graphviz - now it can draw 3 shapes:

  1. very long straight line from nodes and lots of edges between them (using dot)
  2. black square of Malevich (using neato)
  3. black ball using circo

So it`s time to look for another tools for graph plotting. Like

R igraph package

It has a convenient method read.csv, so I added option -r to my script to produce a couple of CSV files:
  • nodes.csv & dedges.csv for directed graphs
  • nodes.csv & edges.csv for undirected graphs

Use the simple R script for graph loading - it expects to see this pair of .csv files in the current directory. You can install the igraph package with the command
install.packages("igraph")

Again the result looks like a hairball :-(
