Unknown error while running tests using qemu

D3athSkulll · 8 April 2026 04:10

The same issue occurs both with the changes in the MR and on the main branch.

QEMU doesn't seem to execute the tests properly and just hangs in the terminal.
To check whether QEMU runs at all, I created a log file.
The target is rv32imafdc on the virt machine configuration.
No graphics are used, so output is displayed in the terminal itself.
When running the test with rtems-test, it times out.

Resultant qemu.log

Adding -d in_asm,cpu makes the qemu.log file grow constantly until it eventually causes a memory shortage.

Meanwhile, the original command ran fine a few days earlier.

I have tried reinstalling QEMU completely, but I still face the same issue.

D3athSkulll · 9 April 2026 04:13

Inspecting qemu.log while running with the in_asm,cpu options, I found that the following state keeps repeating, i.e. the code is executing in an infinite loop.
This seems to point to a linker issue.

V      =   0
 pc       800033e0
 mhartid  00000000
 mstatus  80007800
 mstatush 00000000
 hstatus  00000000
 vsstatus 00000000
 mip      00000080
 mie      00000000
 mideleg  00001444
 hideleg  00000000
 medeleg  00000000
 hedeleg  00000000
 mtvec    800033e0
 stvec    00000000
 vstvec   00000000
 mepc     800033e2
 sepc     00000000
 vsepc    00000000
 mcause   00000007
 scause   00000000
 vscause  00000000
 mtval    ff46fdf0
 stval    00000000
 htval    00000000
 mtval2   00000000
 mscratch 00000000
 sscratch 00000000
 satp     00000000
 x0/zero  00000000 x1/ra    80000668 x2/sp    ff46fd00 x3/gp    80053800
 x4/tp    00000000 x5/t0    83fffff8 x6/t1    800530d4 x7/t2    8000ca18
 x8/s0    8000fca1 x9/s1    80009862 x10/a0   0000000d x11/a1   00000000
 x12/a2   8000fca0 x13/a3   0000000a x14/a4   8000fc90 x15/a5   8000fca0
 x16/a6   00000000 x17/a7   00000004 x18/s2   7fff0360 x19/s3   00000000
 x20/s4   8000fca0 x21/s5   00000000 x22/s6   00000025 x23/s7   00000000
 x24/s8   00000000 x25/s9   8000feec x26/s10  8000fca1 x27/s11  00000000
 x28/t3   00000010 x29/t4   84000000 x30/t5   80057e37 x31/t6   00000000
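For reading dumps like this one, a small decoder for the trap-related CSRs can help. The sketch below assumes RV32 and the bit layouts from the RISC-V privileged specification; the function names are illustrative. Applied to the values above, it reads mcause = 0x00000007 as an exception rather than an interrupt (bit 31, the interrupt flag, is clear), with code 7 meaning a store/AMO access fault, and it shows the machine timer interrupt (MTIP) pending in mip but masked, since mie is 0. For an access fault, mtval (0xff46fdf0) would then hold the faulting store address, which lies close to sp (0xff46fd00).

```python
# Minimal decoder for a few RV32 machine-mode CSRs, following the bit
# layouts in the RISC-V privileged specification. Names are illustrative.

XLEN = 32

# mcause exception codes (interrupt bit clear)
EXCEPTION_CODES = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    11: "environment call from M-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}

# mcause interrupt codes (interrupt bit set)
INTERRUPT_CODES = {
    1: "supervisor software interrupt",
    3: "machine software interrupt",
    5: "supervisor timer interrupt",
    7: "machine timer interrupt",
    9: "supervisor external interrupt",
    11: "machine external interrupt",
}

# The same interrupt numbers are the bit positions in mip and mie.
MIP_BITS = {1: "SSIP", 3: "MSIP", 5: "STIP", 7: "MTIP", 9: "SEIP", 11: "MEIP"}


def decode_mcause(mcause, xlen=XLEN):
    """Split mcause into (kind, code, description)."""
    is_interrupt = bool((mcause >> (xlen - 1)) & 1)
    code = mcause & ((1 << (xlen - 1)) - 1)
    if is_interrupt:
        return "interrupt", code, INTERRUPT_CODES.get(code, "unknown")
    return "exception", code, EXCEPTION_CODES.get(code, "unknown")


def pending_interrupts(mask):
    """Names of the standard interrupt bits set in an mip/mie-style mask."""
    return [name for bit, name in sorted(MIP_BITS.items()) if mask & (1 << bit)]


def decode_mstatus(mstatus):
    """Global M-mode interrupt enable, its saved copy, and previous privilege."""
    return {
        "MIE": (mstatus >> 3) & 1,
        "MPIE": (mstatus >> 7) & 1,
        "MPP": (mstatus >> 11) & 0b11,  # 3 = machine mode
    }


if __name__ == "__main__":
    print(decode_mcause(0x00000007))           # ('exception', 7, 'store/AMO access fault')
    print(pending_interrupts(0x00000080))      # ['MTIP']
    print(pending_interrupts(0x00000080 & 0))  # [] : pending in mip, not enabled in mie
    print(decode_mstatus(0x80007800))          # {'MIE': 0, 'MPIE': 0, 'MPP': 3}
```

A timer interrupt taken in machine mode would instead show mcause = 0x80000007.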

gedare · 9 April 2026 15:47

What is your config.ini entry for the build?

gedare · 9 April 2026 15:54

Where did you get this qemu command line?

D3athSkulll · 9 April 2026 15:59

[DEFAULT]
RTEMS_POSIX_API = True
BUILD_TESTS = True
[riscv/rv32imafdc]

This is the config.ini used.

I got the QEMU command line with the in_asm,cpu logging options from ChatGPT.

gedare · 9 April 2026 16:47

OK, the QEMU command line to use is in our documentation, see

The one you used probably does not load the OpenSBI firmware, which may cause some issues. However, it should still work.

The next step would be to either use git-bisect to try to identify which commit the test stopped working on, or use gdb to find out where in the program it is hanging.
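git bisect works by binary search between a known-good and a known-bad commit, so locating a regression among N commits takes about log2(N) test runs. A minimal sketch of that search, assuming a linear history; is_bad() is a hypothetical stand-in for "build the BSP and run the test under QEMU", and real git bisect additionally handles merge commits and skipped revisions:

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    Assumes a linear history ordered oldest -> newest, that commits[0] is
    known good, commits[-1] is known bad, and that once a commit is bad all
    later ones are bad too (the same assumption git bisect relies on).
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        # In practice: check out commits[mid], rebuild, run the test under QEMU.
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(10)]
    # Hypothetical predicate: pretend the regression landed in c6.
    print(first_bad(history, lambda c: int(c[1:]) >= 6))  # c6
```

With git itself, the predicate can be a script passed to `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad.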

gedare · 9 April 2026 16:58

The register dump suggests this is inside the timer interrupt handler, since mcause = 7. You can use riscv-rtems7-objdump -d spclock_err02.exe > a.txt and inspect the PC and the MEPC addresses to see what is being executed. The fact that the PC and MEPC are nearly identical is suspicious. I would guess that there’s an interrupt happening while execution is handling the previous interrupt.
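Looking up the PC and MEPC in the disassembly can also be scripted. The sketch below assumes GNU objdump's usual `address:<TAB>encoding<TAB>instruction` line layout and uses a made-up two-line sample, not real output from spclock_err02.exe; the helper names are hypothetical. In the register dump above, pc equals mtvec (0x800033e0) and mepc is 0x800033e2, so the instruction that trapped sits only 2 bytes past the trap vector address, which suggests the trap entry code itself is faulting again.

```python
import re

# Matches instruction lines in GNU objdump -d output, e.g.
# "    1000:\t1141                \taddi\tsp,sp,-16"
_INSN_RE = re.compile(r"^\s*([0-9a-fA-F]+):\t(.*)$")


def index_disassembly(text):
    """Map instruction address -> 'encoding mnemonic operands'."""
    table = {}
    for line in text.splitlines():
        m = _INSN_RE.match(line)
        if m:
            # Collapse objdump's tabs/padding into single spaces.
            table[int(m.group(1), 16)] = " ".join(m.group(2).split())
    return table


def trap_offset(mtvec, mepc):
    """Offset of the trapping instruction from the trap vector base.

    The low two bits of mtvec select the vectoring mode, so mask them off.
    """
    return mepc - (mtvec & ~0b11)


# Made-up sample in objdump's layout (not taken from spclock_err02.exe).
SAMPLE = (
    "00001000 <handler>:\n"
    "    1000:\t1141                \taddi\tsp,sp,-16\n"
    "    1002:\tc02a                \tsw\ta0,0(sp)\n"
)

if __name__ == "__main__":
    insns = index_disassembly(SAMPLE)
    print(insns[0x1002])                        # c02a sw a0,0(sp)
    print(trap_offset(0x800033E0, 0x800033E2))  # 2
```

Feeding it the real a.txt and looking up the pc and mepc values from the dump would show which instructions are involved.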

gedare · 9 April 2026 17:27

I was able to reproduce this behavior, and I have bisected this to:

2c16558469b6356d11b88a05030397efb37e72af is the first bad commit
commit 2c16558469b6356d11b88a05030397efb37e72af
Author: Gedare Bloom <gedare@rtems.org>
Date:   Tue Mar 3 14:40:03 2026 -0700

    riscv/riscv: s-mode booting with SMP
    
    Closes #3337

 bsps/riscv/riscv/start/bspsmp.c |  40 ++++++++--
 bsps/riscv/shared/start/start.S | 162 +++++++++++++++++++++++++++-------------
 2 files changed, 144 insertions(+), 58 deletions(-)

Feel free to open an issue for this bug.

gedare · 9 April 2026 17:48

I also traced in gdb, and execution winds up in a tight loop on:

(gdb) stepi
63		j .LRISCV_Exception_handler
(gdb) stepi
67		addi	sp, sp, -CPU_INTERRUPT_FRAME_SIZE
(gdb) 
_RISCV_Vector_table ()
    at ../../../cpukit/score/cpu/riscv/riscv-exception-handler.S:70
70		SREG	a0, RISCV_INTERRUPT_FRAME_A0(sp)
(gdb) 
63		j .LRISCV_Exception_handler

So this looks like a spurious interrupt problem; the timer configuration is probably wrong on 32-bit RISC-V after the bad commit.

D3athSkulll · 9 April 2026 17:57

I faced this issue on rv64imafdc, sparc/erc32, and sparc/leon3 as well.

gedare · 9 April 2026 18:44

That’s curious, please double-check. I don’t see this problem on rv64imafdc, and based on the bad commit, I would not expect sparc to show exactly this problem.

D3athSkulll · 9 April 2026 19:30

My bad, I tested again on rv64imafdc and got proper output.
I have updated the spclock MR accordingly.
I’ll create a new issue for this problem in parallel.

D3athSkulll · 13 April 2026 20:04

gedare:

So this looks like a spurious interrupt problem; the timer configuration is probably wrong on 32-bit RISC-V after the bad commit.

This issue also seems to occur when running on leon3.

gedare · 13 April 2026 20:06

I think it’s a different problem.

D3athSkulll · 13 April 2026 20:22

Is this leon3 problem listed as an issue on GitLab?

Regarding rv32 and the faulty commit:
I tried analysing the problem by making changes in start.S, bspsmp.c, and clockdrv.c, the files involved in this issue. Here are my findings:

Commented out early interrupt enabling in bspsmp.c to check if premature interrupts caused the issue. The hang persisted, ruling out simple interrupt timing as the root cause.

Reverted exception routing from direct handler to _RISCV_Vector_table. This ensured proper dispatching but did not resolve the looping issue.

Disabled timer interrupt initialization to test if a timer storm was causing repeated traps. The system still hung, proving the issue was not timer-related.

Adjusted stack pointer initialization from _begin to _end to avoid invalid memory access. No improvement was observed, ruling out basic stack misplacement.

Hypothesized that uninitialized memory could cause faults and tested reordering BSS clearing. The issue persisted, so memory initialization order was not the cause.

Can you suggest what our best course of action is here?

gedare · 27 April 2026 20:02

I opened an issue and submitted a fix: riscv: use unsigned comparison for address check (!1221)

I narrowed down the problem by breaking at the first dispatch into _RISCV_Vector_table and taking a backtrace. It showed a fatal error due to a missing FDT, which pointed me to bsp_fdt_copy, and I found that execution was not reaching it during the boot process.
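The MR title says the address check needed an unsigned comparison. The actual check in !1221 is not quoted here, but the general pitfall is easy to show: on RV32 the QEMU virt machine places RAM at 0x80000000, so every RAM address has bit 31 set and becomes negative when compared as a signed 32-bit value. A minimal sketch with hypothetical helpers that mimic the RISC-V blt (signed) and bltu (unsigned) branch comparisons:

```python
# Illustration of signed vs unsigned comparison of 32-bit addresses.
# Helper names are hypothetical; they mimic the RISC-V blt/bltu branches.

MASK32 = 0xFFFFFFFF


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def below_signed(a, b):
    """a < b as `blt` would evaluate it on RV32."""
    return as_signed32(a) < as_signed32(b)


def below_unsigned(a, b):
    """a < b as `bltu` would evaluate it on RV32."""
    return (a & MASK32) < (b & MASK32)


if __name__ == "__main__":
    ram_addr = 0x80000000  # QEMU virt RAM base; bit 31 is set
    small = 0x00001000
    print(below_signed(ram_addr, small))    # True: 0x80000000 is negative as int32
    print(below_unsigned(ram_addr, small))  # False: correct for addresses
```

On RV64 the same addresses are positive as 64-bit signed values, which would be consistent with the rv64imafdc runs above not being affected.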

D3athSkulll · 27 April 2026 22:41

Should I remove the issue I opened about this?