Disclaimer: proposed approach uses dirty hacks & patches and tested on x86_64 only so use it at your own risk. Also no chatGPT or some another Artificial Idiots were used for this research
Lets assume that we have shared library (for example R extension or python module) and we want to know where and why it spending many hours and consuming megawatts of electricity. There is even semi-official way to do this:
- compile shared library with -pg option
- set envvar LD_PROFILE_OUTPUT to directory where you want to store profiling data
- set envvar LD_PROFILE to filename of library to profile
- run your program. Well, sounds that you need lots of things to do before this step and you can`t set up profiling dynamically
- run sprof on profiling log
Unfortunately this method just don`t work - sprof fails with cryptic message Inconsistency detected by ld.so: dl-open.c: 890: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!
gcc -pg
- it extracts return address from stack
- and so cannot be called as ordinary C function - you need to add asm thunk like "jmp [mcount_ptr]" in your wrapper
void monstartup (char *low_pc, char *high_pc)
It called to start profiling data and all mcount calls must be located within range low_pc .. high_pc
void _mcleanup (void)
gcc -finstrument-functions
to indicate that function entry and exit should be instrumented with calls to support routines
void __cyg_profile_func_enter(void *this_fn, void *call_site)
void __cyg_profile_func_exit(void *this_fn, void *call_site)
making API for profiling
- check if desired shared library can be profiled (e.q. has mcount or __cyg_profile_func_enter in GOT) - I used ELFIO but perhaps this can be done with LIEF too
- patch GOT
- find addresses of loaded shared llibrary - this can be done with dl_iterate_phdr
- call monstartup to start profiling
- run your code from profiling library
- call mcleanup to stop profiling and store results
gprof path2your_shared_library gmon.out
and see sad results - gprof refusing to show results. This is bcs it don`t know at which address your shared library was loaded (and worse - this address will differs each run due to ASLR). So lets
patch gprof
It sounds scary but in reality the most difficult thing was to find not used letter in getopts. I chose -U to pass image base and subtracting passed value in function gmon_io_read_vma. Patch
Because now you also need to know base address for gprof I just add it to name of gmon.out - via setting GMON_OUT_PREFIX. Run patched gprof:
gprof -b -U 7F9E01224000 ./libprelf.so gmon.7F9E01224000.3926320
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 12552 0.00 0.00 ELFIO::endianess_convertor::operator()(unsigned long) const
0.00 0.00 0.00 4608 0.00 0.00 ELFIO::section_impl<ELFIO::Elf64_Shdr>::get_size() const
0.00 0.00 0.00 4398 0.00 0.00 ELFIO::section_impl<ELFIO::Elf64_Shdr>::get_entry_size() const
0.00 0.00 0.00 2963 0.00 0.00 ELFIO::endianess_convertor::operator()(unsigned int) const
0.00 0.00 0.00 2935 0.00 0.00 ELFIO::section_impl<ELFIO::Elf64_Shdr>::get_data() const
...