lets see how PIC looks like for sw64 on the example of a function from libLLVM-7.so.1 (huge shared library - size 45Mb):
1000ED0 ldih GP, PV, 0x1D3
PV almost always contains address of called function so value of GP now 2D30ED0
1000ED4 ldi GP, GP, -0x1290
value of GP now 2D30ED0 - 1290 = 2D2FC40. I expected that this base address always located inside .got but this is not true - it can lie anywhere, sometimes even not inside elf module! All remaining refs use this base address in GP register:
1000ED8 ldih PV, GP, 0
1000EDC ldl PV, PV, -0x4EC0
wait, WHAT? they use return address in RA to fill GP with the same value 2D2FC40. and even worse - they restore value of GP even in epilogue where it is not used
...
1000F14 call RA, PV, 0
1000F18 ldih GP, RA, 0x1D3 ; 2D30F18
1000F20 ldi GP, GP, -0x12D8 ; 2D2FC40
wait, WHAT? they use return address in RA to fill GP with the same value 2D2FC40. and even worse - they restore value of GP even in epilogue where it is not used
Lets estimate size overhead. libLLVM-7.so.1 has 41337 functions, 8432116 instructions and 781997 to set value of GP. rate 781997 / 8432116 = 0.092740
Lets assume that each function anyway need to setup GP, so required number of instructions is 41337 * 2 = 82674. remaining is 781997 - 82674 = 699323
remove unneeded GP setups from epilogues: 699323 - 82674 = 616649
this amount easy can be reduced in half - just store calculated value of GP in stack with stl gp, sp, offset (+41337 instructions) and then pop it when needed with ldl gp, sp, offset
So actual amount of instructions could be 616649 / 2 + 41337 + 82674 = 432336
new rate: 432336 / 8432116 = 0.05127
overhead is 0.092740 - 0.05127 = 4.1%
cool, almost 2Mb of code is just unnecessary