I'm running a Kalman filter on a MicroBlaze.
I can run it either as a standalone application
or as a CFS application that is loaded via the dynamic loader.
When it runs as a CFS application, my Kalman filter task uses about 5x the CPU that it uses in the standalone build.
Put the other way, the same filter runs about 5x faster in standalone mode.
I suspect this is due to the dynamic library.
A coworker was looking at cache hit stats.
There did not appear to be a measurable difference between the two builds when looking at the performance counters.
https://docs.amd.com/r/en-US/ug984-vivado-microblaze-ref/Performance-Monitoring
Are there any other performance counter events I should look at?
Any best next steps to try would be appreciated.
I had an AI look at the RTEMS dynamic linker code for the MicroBlaze.
• Most likely this is not an RTEMS tasking issue, but a MicroBlaze libdl codegen/ABI issue.
The strongest reason is that RTEMS’s MicroBlaze runtime-linker backend is still very minimal.
In cpukit/libdl/rtl-mdreloc-microblaze.c:42 the arch-specific section handling is basically a stub, .sdata/.sbss handling is absent, trampolines are disabled at cpukit/libdl/rtl-mdreloc-microblaze.c:83, and only a subset of relocations is implemented.
By contrast, PowerPC has explicit small-data support in cpukit/libdl/rtl-mdreloc-powerpc.c:64 through cpukit/libdl/rtl-mdreloc-powerpc.c:182.
That points to MicroBlaze loaded modules missing ABI optimizations that static links can use.
The second likely reason is linker relaxation.
RTEMS libdl loads relocatable ELF objects, not shared libraries: cpukit/libdl/rtl-elf.c:1597 rejects ET_DYN and cpukit/libdl/rtl-elf.c:1602 rejects program headers.
So a dynamically loaded module does not go through a final linker relaxation pass the way a statically linked image does.
On MicroBlaze that matters a lot, because the backend itself notes R_MICROBLAZE_32_NONE is “only used for relaxing” in cpukit/libdl/rtl-mdreloc-microblaze.c:227.
My inference is that loaded code keeps conservative multi-instruction address/call sequences that the static final link can shrink, and on MicroBlaze that can easily become a large slowdown if the task is global-access-heavy or call-heavy.
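One way to test that inference is to count how often an imm prefix instruction precedes a load, store, add, or branch in each build. The following is only a rough sketch in Python: it assumes the text comes from microblaze-*-objdump -d and that each instruction line looks like "address: word mnemonic operands". The function name imm_prefixed and the sample listing are made up for illustration, and the sample encodings are not meaningful.

```python
import re
from collections import Counter

# One disassembly line, assumed layout: "  addr:<ws>8-hex-digit word<ws>mnemonic ..."
INSN = re.compile(r"^\s*[0-9a-f]+:\s+[0-9a-f]{8}\s+(\w+)")


def imm_prefixed(disassembly: str) -> Counter:
    """Count which mnemonics directly follow an `imm` prefix instruction."""
    counts = Counter()
    prev = None
    for line in disassembly.splitlines():
        m = INSN.match(line)
        if not m:
            continue  # symbol headers, blank lines, section banners
        mnemonic = m.group(1)
        if prev == "imm":
            counts[mnemonic] += 1
        prev = mnemonic
    return counts


# Hand-written sample (hypothetical function name, placeholder encodings).
SAMPLE = """\
00000000 <kf_update>:
   0:\tb0000000 \timm\t0
   4:\te8600000 \tlwi\tr3, r0, 0
   8:\tb0000000 \timm\t0
   c:\tb9f40000 \tbrlid\tr15, 0
  10:\t80000000 \tor\tr0, r0, r0
  14:\t30630001 \taddik\tr3, r3, 1
"""

if __name__ == "__main__":
    for mnemonic, n in imm_prefixed(SAMPLE).most_common():
        print(f"imm + {mnemonic}: {n}")
```

Running it over the disassembly of the same function from the standalone image and from the loadable object, then comparing the totals, would show whether the loadable build really carries many more two-instruction sequences.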
I do not think instruction-cache coherency is the primary cause.
dlopen() does synchronize cache after load in cpukit/libdl/rtl.c:593 and cpukit/libdl/rtl-obj.c:755.
A related warning sign is that the BSP startup still has the small-data anchor setup commented out in bsps/microblaze/shared/start/start.S:97. I would treat
that as supporting evidence that MicroBlaze small-data/GP handling is not fully wired for the dynamic-load path.
If you want, I can do the next step and make this concrete by comparing the static and loadable builds with microblaze-*-objdump, looking specifically for:
- many imm + addik/lwi/swi sequences in the loaded build,
- R_MICROBLAZE_32_NONE and related relax-only relocations,
- .sdata/.sbss usage or forced -G0 style codegen.
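The relocation check in the list above could be sketched as a small histogram over relocation type names. This is a minimal Python sketch, assuming the input is the text printed by microblaze-*-objdump -r on the loadable object. The function name reloc_histogram, the symbol names, and the sample records are hypothetical and only show the expected shape of the input.

```python
import re
from collections import Counter

# Any MicroBlaze relocation type name as printed by objdump.
RELOC = re.compile(r"\bR_MICROBLAZE_\w+")


def reloc_histogram(reloc_dump: str) -> Counter:
    """Count occurrences of each R_MICROBLAZE_* relocation type in the text."""
    return Counter(RELOC.findall(reloc_dump))


# Hand-written sample in the style of `objdump -r` (symbols are made up).
SAMPLE = """\
RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE                  VALUE
00000000 R_MICROBLAZE_64       kf_state
00000008 R_MICROBLAZE_64_PCREL kf_predict
00000018 R_MICROBLAZE_64       kf_gain
00000020 R_MICROBLAZE_32_NONE  *ABS*
"""

if __name__ == "__main__":
    for name, n in reloc_histogram(SAMPLE).most_common():
        print(f"{name:24} {n}")
```

A large count of 64-bit relocations per function in the loadable object would be consistent with unrelaxed imm-prefixed address and call sequences, though it would not prove that they account for the full 5x.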