It seems that the Chinese are hiding information about yet another homemade processor, sw64: just try to find any technical details with Google, Baidu or Gitee. At the same time they have ported Linux to this processor, and you can even find some details in the openEuler project. I think this conspiracy is very funny, and at the very least it violates the licenses of binutils/clang/gcc etc.
Anyway, let's see if we can reverse the sw64 ISA having only a Linux image and some source code from the Linux kernel (spoiler: we will also write a processor module for IDA Pro).
registers
Try to compare the registers of sw64 with those of Alpha AXP: can you find any difference? At least we now know that the processor has 32 general purpose registers and 32 floating point registers, so the register fields in the encoding must be 5 bits wide.
ELF relocs
Relocs can be extracted from arch/sw_64/include/asm/elf.h. So the next thing I wrote was a small IDA Pro plugin to apply these relocs. Nothing special: it is almost an exact copy of the same plugin for LoongArch.
mnemonics
So where can we get mnemonics? They are usually stored somewhere inside binutils, so I just loaded libopcodes-2.31.1-system.so into IDA Pro and dumped the operands table with a simple IDC script.
I also compared how many mnemonic names match those of the Alpha processor. We have 383 names in total:
- IDA Pro knows 74 in its processor module for Alpha
- binutils knows 84 in opcodes/alpha-opc.c
Not a very big intersection, so we must employ some reversing techniques. The table with opcodes (slightly edited to remove two fields containing only zeros) looks like this:
lldw 20000000 FC00F000 800 A2701
- name of instruction
- matching value after AND with mask
- mask
- unknown: this field takes only 2 values, 800 and 2000, so perhaps it is a processor family or an assembler option
- most interesting field - schema for operands encoding
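To make the match/mask pair concrete, here is a minimal Python sketch of how such a row could be tested against an instruction word. It assumes all numeric fields in the dump are hexadecimal; the tuple layout, the `matches` helper and the sample instruction word are made up for illustration and are not taken from binutils.

```python
# One row of the dumped table: name, match value, mask, unknown field, schema.
# ASSUMPTION: every numeric field in the dump is hexadecimal.
LLDW = ("lldw", 0x20000000, 0xFC00F000, 0x800, 0xA2701)

def matches(word, entry):
    """Return True if the 32-bit instruction word fits the table entry."""
    _name, match, mask, _unknown, _schema = entry
    return (word & mask) == match

# Hypothetical instruction word: the lldw pattern plus two bits (21 and 17)
# that lie outside the mask, so they must not influence the comparison.
word = 0x20000000 | (1 << 21) | (1 << 17)

print(matches(word, LLDW))        # True
print(matches(0x40000740, LLDW))  # False
```

A disassembler would run this test over the whole table and take the first (or the most specific) entry that matches.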
So we can write a small Perl script and quickly realize that we have lots of duplicates. Obviously these are pseudo instructions whose names change depending on the values of the operands. Some examples (name, matching value, schema):
nop 40000740 90807
clr 40000740 30807
mov 40000740 30207
or 40000740 F0201
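The duplicate hunt can be sketched like this. I used Perl for the real thing; this is a Python equivalent written for illustration, fed only with the four rows above, and the `group_by_match` name is hypothetical.

```python
from collections import defaultdict

# (name, match value, operand schema) for the four rows shown above.
ROWS = [
    ("nop", 0x40000740, 0x90807),
    ("clr", 0x40000740, 0x30807),
    ("mov", 0x40000740, 0x30207),
    ("or",  0x40000740, 0xF0201),
]

def group_by_match(rows):
    """Collect mnemonics that share a match value; keep only real duplicates."""
    groups = defaultdict(list)
    for name, match, _schema in rows:
        groups[match].append(name)
    return {m: names for m, names in groups.items() if len(names) > 1}

for match, names in group_by_match(ROWS).items():
    print(f"{match:08X}: {', '.join(names)}")
# prints: 40000740: nop, clr, mov, or
```

With the full dump you would probably group by the (match, mask) pair rather than by the match value alone.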
operands decoding
Let's look at arch/sw_64/netbpf_jit.h. It shows that the opcode field is the 6 highest bits, that opcode SW64_BPF_OPCODE_ALU_REG takes 3 registers (ra, rb and rc), and that SW64_BPF_OPCODE_ALU_IMM takes 2 registers (ra and rc) plus an immediate value. Bit offsets of the register fields:
- ra 21
- rb 16
- rc 0
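These offsets translate directly into shifts and masks. A minimal Python sketch, assuming 5-bit register fields (32 registers) and a 6-bit opcode in the top bits; the `decode_fields` helper and the sample word are mine, not something from the kernel header.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into opcode and register fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # 6 highest bits
        "ra": (word >> 21) & 0x1F,      # 5 bits at offset 21
        "rb": (word >> 16) & 0x1F,      # 5 bits at offset 16
        "rc": word & 0x1F,              # 5 bits at offset 0
    }

# Hypothetical word: opcode 0x10, ra = 1, rb = 2, rc = 3.
word = (0x10 << 26) | (1 << 21) | (2 << 16) | 3
print(decode_fields(word))
# {'opcode': 16, 'ra': 1, 'rb': 2, 'rc': 3}
```

The sketch covers only the three-register form; for the immediate form the header has to be consulted for where the immediate sits.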
Now we only need to understand where these operands live in the 5th field of our table. There are 3! = 6 possible orderings, but it seems that the order is ra, rb, rc:
A B C
nop 40000740 90807
clr 40000740 30807
mov 40000740 30207
or 40000740 F0201
We can conclude that 7 means zero rc, 8 means zero rb and 9 means zero ra (also 6 is floating point register ra and 0xc is floating point register rb), so nop is "or zr, zr, zr", clr is "or ra, zr, zr" and mov is "or ra, rb, zr".
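A rough Python sketch of this reading of the schema field. It assumes that each operand occupies one byte of the schema value, which is how the four examples line up. Codes 7, 8, 9, 6 and 0xc come from the conclusion above; codes 3 (ra) and 2 (rb) are only inferred from the clr and mov readings, and everything else is printed as unknown. The `describe` helper is hypothetical.

```python
# Operand codes read off the examples; anything else stays unknown.
KNOWN = {
    0x09: "zr(ra)",   # zero register in the ra slot
    0x08: "zr(rb)",   # zero register in the rb slot
    0x07: "zr(rc)",   # zero register in the rc slot
    0x06: "fp ra",    # floating point register ra
    0x0C: "fp rb",    # floating point register rb
    0x03: "ra",       # INFERRED from clr = "or ra, zr, zr"
    0x02: "rb",       # INFERRED from mov = "or ra, rb, zr"
}

def describe(schema):
    """Split a schema into three bytes (A, B, C) and name each operand."""
    codes = [(schema >> shift) & 0xFF for shift in (16, 8, 0)]
    return [KNOWN.get(c, f"op_0x{c:02x}") for c in codes]

for name, schema in [("nop", 0x90807), ("clr", 0x30807),
                     ("mov", 0x30207), ("or", 0xF0201)]:
    print(name, describe(schema))
```

Codes that are not in the dictionary show up as op_0xNN, which makes it easy to see what still has to be reversed.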
Now, having all this info, we can quickly (he-he, it actually took 3 days) write a processor module for IDA Pro and then look at actual code in it
memory xrefs
and be disappointed, because judging by the function emit_sw64_ldu64 I expected to see a long sequence that loads a 64-bit address and then calls it, something like
ldi reg, imm1
sll reg, 60
ldi reg, imm2
sll reg, 45
ldi reg, imm3
sll reg, 30
but the actual code looks like this:
ldih GP, PV, 2
ldi GP, GP, 0x2418
ldih PV, GP, 0
ldl PV, PV, -0x7BD8
call RA, PV, 0
I naively assumed that this is position independent code: the ldih/ldi pair just loads the address of the GOT, and the second pair, ldih/ldl, loads an address from it. I don't know if this is right in all cases, but the code also contains lots of weird sequences like this:
call RA, PV, 0 ; call with address in PV, store return address in RA
ldih GP, RA, 2 ; load some offset relative RA. weird
ldi GP, GP, 0x23D4
ldih PV, GP, 0
ldl PV, PV, -0x7EF0
In this case the value of GP is computed from the return address; for this call that is the address of the ldih instruction. I am too lazy to check whether GP gets a different value each time; it seems that simply using the GOT address works fine for small modules where all data lies within a 15-bit offset. But you can patch my function emu_insn to track the values of PV, then GP, and then find the call to get the address in RA.
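The arithmetic such tracking would have to do can be sketched in Python. This is not the code of emu_insn. It assumes that ldih behaves like Alpha ldah (adds the immediate shifted left by 16 bits) and that ldi and the ldl address calculation behave like Alpha lda (add a signed displacement); I have not verified this for sw64. The helper names and the return address 0x1000 are made up.

```python
MASK64 = (1 << 64) - 1

def ldih(base, imm):
    # ASSUMPTION: like Alpha ldah, result = base + imm * 65536
    return (base + (imm << 16)) & MASK64

def add_disp(base, imm):
    # ASSUMPTION: like Alpha lda, result = base + signed displacement
    return (base + imm) & MASK64

def gp_after_call(ra):
    """ldih GP, RA, 2 ; ldi GP, GP, 0x23D4"""
    return add_disp(ldih(ra, 2), 0x23D4)

def got_slot(gp):
    """ldih PV, GP, 0 ; ldl PV, PV, -0x7EF0 -> address that ldl reads"""
    return add_disp(ldih(gp, 0), -0x7EF0)

ra = 0x1000  # hypothetical return address (address of the ldih after call)
gp = gp_after_call(ra)
print(hex(gp))            # 0x233d4
print(hex(got_slot(gp)))  # 0x1b4e4
```

Reading 8 bytes at the resulting address would then give the target that ends up in PV before the next call.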