As expected, the results of the auto-derived FSM for usermode DLLs are much worse: for example, in rpcrt4.dll only 76 symbols out of 228 can be found. This is because usermode code contains far fewer unique constants (such as the NTSTATUS values or allocation tags seen in the kernel). So we need some additional data to make edges more distinguishable. Let's consider several candidates:
load_config
Contains the address of SecurityCookie and pointers to GuardCFCheckFunctionPointer and GuardCFDispatchFunctionPointer. Knowing SecurityCookie, we can at least distinguish a load of some address in the .data section from a load of the cookie in a function prologue/epilogue. But the results are almost the same: 78 out of 228
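To illustrate the idea, here is a minimal sketch of how a resolved load address could be labelled once the SecurityCookie address is known. It assumes the cookie address and the .data bounds were already read from the PE headers. The function name and the labels "ld_cookie" / "ld_data" are hypothetical and are not the state names used by the actual tool.

```python
def classify_data_load(addr, security_cookie, data_start, data_size):
    """Label an address loaded by an instruction (all names are hypothetical).

    addr            -- resolved virtual address the instruction loads from
    security_cookie -- address of SecurityCookie taken from load_config
    data_start/size -- virtual address range of the .data section
    """
    # The cookie itself lives in .data, so it must be checked first.
    if addr == security_cookie:
        return "ld_cookie"
    if data_start <= addr < data_start + data_size:
        return "ld_data"
    return None


# Example with made-up addresses:
cookie = 0x1800C0008
print(classify_data_load(cookie, cookie, 0x1800C0000, 0x2000))       # ld_cookie
print(classify_data_load(0x1800C0100, cookie, 0x1800C0000, 0x2000))  # ld_data
```

Separating the cookie from other .data loads only removes noise from prologues and epilogues. It adds no new unique constants, which may be why the gain is so small.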
delayed import
A new source of data that is missing in kernel mode. So I added a new state to the FSM, call_dimp, which is almost the same as call_imp but for the delayed IAT. As expected, the results improved: 109 out of 228
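The difference between the two states can be sketched as follows. This is only an assumption about how such a check might look: the slot-to-name dictionaries and the function name are hypothetical, and the real tool parses the import and delay-import directories itself.

```python
def classify_import_call(slot, iat, delay_iat):
    """Return (state, imported_name) for a call through the pointer at `slot`.

    iat / delay_iat -- hypothetical dicts mapping the address of an IAT slot
                       to the imported function name.
    """
    if slot in iat:
        return ("call_imp", iat[slot])
    if slot in delay_iat:
        return ("call_dimp", delay_iat[slot])
    return None


# Example with made-up slot addresses:
iat = {0x180090000: "GetProcAddress"}
delay_iat = {0x1800A0000: "SomeDelayedFunction"}
print(classify_import_call(0x1800A0000, iat, delay_iat))
```

Keeping call_dimp as a separate state, rather than folding it into call_imp, lets an edge record which table the call went through.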
constants in .rdata section
arm64 code can use not only ldr from a constant pool but also regular constant data in the .rdata section, for example strings passed to GetProcAddress etc. Let's see what such code looks like:
ADRP X8, #aFeclientinitia@PAGE ; "FeClientInitialize"
ADD X1, X8, #aFeclientinitia@PAGEOFF ; "FeClientInitialize"
ADRP X9, #__imp_GetProcAddress@PAGE
ADD X8, X9, #__imp_GetProcAddress@PAGEOFF
LDAR X9, [X8]
BLR X9
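An ADRP/ADD pair like the ones in this listing resolves to an absolute address, which can then be compared against the bounds of .rdata. Below is a rough sketch of that resolution, decoding the raw 32-bit instruction words. The function names are hypothetical, and only the 64-bit ADD (immediate) form is handled. A real disassembler would normally do this step for you.

```python
def decode_adrp(insn, pc):
    """Return (Rd, page_address) for an ADRP instruction, or None."""
    if (insn & 0x9F000000) != 0x90000000:
        return None
    immlo = (insn >> 29) & 0x3
    immhi = (insn >> 5) & 0x7FFFF
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):          # sign-extend the 21-bit page delta
        imm -= 1 << 21
    return insn & 0x1F, (pc & ~0xFFF) + (imm << 12)


def decode_add_imm(insn):
    """Return (Rd, Rn, imm) for a 64-bit ADD (immediate), or None."""
    if (insn & 0xFF800000) != 0x91000000:
        return None
    imm = (insn >> 10) & 0xFFF
    if insn & (1 << 22):         # optional LSL #12 on the immediate
        imm <<= 12
    return insn & 0x1F, (insn >> 5) & 0x1F, imm


def resolve_adrp_add(adrp_insn, add_insn, pc):
    """Return (dest_reg, address) if ADD consumes the register set by ADRP."""
    page = decode_adrp(adrp_insn, pc)
    add = decode_add_imm(add_insn)
    if page is None or add is None or add[1] != page[0]:
        return None
    return add[0], page[1] + add[2]


# ADRP X8, <page +2> ; ADD X1, X8, #0x123, with ADRP located at 0x180001234
reg, addr = resolve_adrp_add(0xD0000008, 0x91048D01, 0x180001234)
print(reg, hex(addr))  # 1 0x180003123
```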