Should printf on a multi-processor system just work?

I’ve compiled the example application from the documentation (for rv32imafc) and run it under QEMU with 4 cores (-smp 4). This works fine and dandy for the example app, where the application task simply prints hello world once a second.
However, when I create the same application task ten times, provide ample stack, give each task a different priority, the app reliably crashes after a few seconds of printing with garbled output (“hello,����!P����”) and QEMU taking up host CPU as if in three busy loops.
The same happens when I replace printf with write(1, “hello world\n”, 13).
I’m not sure if I try something that is not expected to work at all (maybe the stdio layer needs mutexes?) Can anyone shed some light on this? I’m happy to provide the whole program source.

It shouldn’t crash, but I would absolutely expect garbled output since printf() has no locking or serialization built in. Can you generate a backtrace by connecting GDB to QEMU?

Ok, I have started qemu with “-s -S” and connected with gdb. This is the result of the backtraces for each thread:

(gdb) break Init
Breakpoint 1 at 0x800001a2
(gdb) run
The "remote" target does not support "run".  Try "help target" or "continue".
(gdb) continue
Continuing.
[Switching to Thread 1.4]

Thread 4 hit Breakpoint 1, 0x800001a2 in Init ()
(gdb)
Continuing.
^C
Thread 4 received signal SIGINT, Interrupt.
_CPU_Thread_Idle_body (ignored=0) at ../../../cpukit/score/cpu/riscv/cpu.c:195
195
(gdb) info threads
  Id   Target Id                    Frame
  1    Thread 1.1 (CPU#0 [running]) 0x8000067c in riscv_console_set_reg_8 (addr=<optimized out>, i=0 '\000', val=0 '\000')
    at ../../../bsps/riscv/riscv/console/console-config.c:123
  2    Thread 1.2 (CPU#1 [running]) _CPU_Context_restore () at ../../../cpukit/score/cpu/riscv/riscv-context-switch.S:158
  3    Thread 1.3 (CPU#2 [halted ]) _CPU_Thread_Idle_body (ignored=0) at ../../../cpukit/score/cpu/riscv/cpu.c:195
* 4    Thread 1.4 (CPU#3 [halted ]) _CPU_Thread_Idle_body (ignored=0) at ../../../cpukit/score/cpu/riscv/cpu.c:195
(gdb) bt
#0  _CPU_Thread_Idle_body (ignored=0) at ../../../cpukit/score/cpu/riscv/cpu.c:195
#1  0x8000c9e6 in _Thread_Handler () at ../../../cpukit/score/src/threadhandler.c:164
#2  0x8000c93a in _Thread_Get (id=<optimized out>, lock_context=0x80035408) at ../../../cpukit/score/src/threadget.c:63
Backtrace stopped: frame did not save the PC
(gdb) thread 1
[Switching to thread 1 (Thread 1.1)]
#0  0x8000067c in riscv_console_set_reg_8 (addr=<optimized out>, i=0 '\000', val=0 '\000')
    at ../../../bsps/riscv/riscv/console/console-config.c:123
123     }
(gdb) bt
#0  0x8000067c in riscv_console_set_reg_8 (addr=<optimized out>, i=0 '\000', val=0 '\000')
    at ../../../bsps/riscv/riscv/console/console-config.c:123
#1  0x80001f80 in ns16550_write_to_fifo (ctx=0x80029c84 <ns16550_instances>, len=<optimized out>, buf=<optimized out>)
    at ../../../bsps/shared/dev/serial/ns16550-context.c:250
#2  ns16550_isr (arg=0x80145e90) at ../../../bsps/shared/dev/serial/ns16550-context.c:292
#3  0x80000e66 in bsp_interrupt_dispatch_entries (entry=0x80146290) at ../../../bsps/include/bsp/irq-generic.h:571
#4  bsp_interrupt_handler_dispatch_unchecked (vector=12) at ../../../bsps/include/bsp/irq-generic.h:627
#5  bsp_interrupt_handler_dispatch (vector=12) at ../../../bsps/include/bsp/irq-generic.h:654
#6  _RISCV_Interrupt_dispatch (mcause=<optimized out>, cpu_self=<optimized out>) at ../../../bsps/riscv/riscv/irq/irq.c:90
#7  0x8001071e in _RISCV_Exception_handler () at ../../../cpukit/score/cpu/riscv/riscv-exception-handler.S:143
Backtrace stopped: frame did not save the PC
(gdb) thread 2
[Switching to thread 2 (Thread 1.2)]
#0  _CPU_Context_restore () at ../../../cpukit/score/cpu/riscv/riscv-context-switch.S:158
158             lw      a3, RISCV_CONTEXT_IS_EXECUTING(a1)
(gdb) bt
#0  _CPU_Context_restore () at ../../../cpukit/score/cpu/riscv/riscv-context-switch.S:158
#1  0x8000c61a in _Thread_Do_dispatch (cpu_self=0x80029480 <_Per_CPU_Information+512>, level=3)
    at ../../../cpukit/score/src/threaddispatch.c:329
#2  0x00000000 in ?? ()
Backtrace stopped: frame did not save the PC
(gdb) thread 3
[Switching to thread 3 (Thread 1.3)]
#0  _CPU_Thread_Idle_body (ignored=0) at ../../../cpukit/score/cpu/riscv/cpu.c:195
195
(gdb) bt
#0  _CPU_Thread_Idle_body (ignored=0) at ../../../cpukit/score/cpu/riscv/cpu.c:195
#1  0x8000c9e6 in _Thread_Handler () at ../../../cpukit/score/src/threadhandler.c:164
#2  0x8000c93a in _Thread_Get (id=<optimized out>, lock_context=0x800343a8) at ../../../cpukit/score/src/threadget.c:63
Backtrace stopped: frame did not save the PC
(gdb) thread 4
[Switching to thread 4 (Thread 1.4)]
#0  _CPU_Thread_Idle_body (ignored=0) at ../../../cpukit/score/cpu/riscv/cpu.c:195
195
(gdb) bt
#0  _CPU_Thread_Idle_body (ignored=0) at ../../../cpukit/score/cpu/riscv/cpu.c:195
#1  0x8000c9e6 in _Thread_Handler () at ../../../cpukit/score/src/threadhandler.c:164
#2  0x8000c93a in _Thread_Get (id=<optimized out>, lock_context=0x80035408) at ../../../cpukit/score/src/threadget.c:63
Backtrace stopped: frame did not save the PC
(gdb)

And here is the complete application along with info how to compile and run:

/*
 * HOW TO COMPILE:
 * $ riscv-rtems6-gcc -march=rv32imafc -mabi=ilp32f \
 *      -B $PREFIX/riscv-rtems6/rv32imafc/lib -qrtems \
 *      -o application.elf application.c
 * HOW TO RUN:
 * $ qemu-system-riscv32 -smp 4 -m 196M -nographic -machine virt \
 *      -bios none -kernel application.elf
 *
 * PS: This was my toolchain's config.ini for waf:
 * [riscv/rv32imafc]
 * RTEMS_POSIX_API = True
 * BSP_CONSOLE_BAUD = 115200
 * RISCV_RAM_REGION_BEGIN = 0x80000000
 * RISCV_RAM_REGION_SIZE  = 0x08000000  # 128MB
 * BSP_FDT_BLOB_SIZE_MAX = 16384
 * RISCV_ENABLE_HTIF_SUPPORT = True
 * RISCV_BOOT_HARTID = 0
 * RTEMS_SMP = True
 */

#include <rtems.h>
#include <stdio.h>
#include <stdlib.h>
#ifndef RTEMS_SMP
#error "RTEMS not compiled with RTEMS_SMP"
#endif

#define TASK_COUNT 10
rtems_id gTid[TASK_COUNT] = { 0 };

rtems_task user_application(rtems_task_argument aArgument);
rtems_task Init(rtems_task_argument ignored);

rtems_task Init(rtems_task_argument ignored) {
    rtems_status_code status;
    for (int t = 0; t < TASK_COUNT; ++t) {
        /* *INDENT-OFF* */
        status = rtems_task_create(
            rtems_build_name('T', 's', 'k', 'A' + t),
            1 + t,
            RTEMS_MINIMUM_STACK_SIZE * 16,
            RTEMS_DEFAULT_MODES,
            RTEMS_FLOATING_POINT | RTEMS_GLOBAL,
            &gTid[t]);
        /* *INDENT-ON* */ 
        if (status != RTEMS_SUCCESSFUL) {
            printf("rtems_task_create failed with status of %d.\n", status);
            exit(1);
        }
        status = rtems_task_start(gTid[t], user_application, t);
        if (status != RTEMS_SUCCESSFUL) {
            printf("rtems_task_start failed with status of %d.\n", status);
            exit(2);
        }
    }
    status = rtems_task_delete(rtems_task_self());  /* should not return */
    printf("rtems_task_delete returned with status of %d.\n", status);
    exit(3);
}

rtems_task user_application(rtems_task_argument aArgument) {
    for (;;) {
        printf("hello, world from task %c\n", 'A' + aArgument);
        rtems_task_wake_after(100);
    }
}

#define CONFIGURE_APPLICATION_NEEDS_CONSOLE_DRIVER
#define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER
#define CONFIGURE_INIT_TASK_NAME rtems_build_name( 'I', 'n', 'i', 't' )
#define CONFIGURE_RTEMS_INIT_TASKS_TABLE
#define CONFIGURE_INIT
#define CONFIGURE_INIT_TASK_STACK_SIZE 65536
#define CONFIGURE_INIT_TASK_ATTRIBUTES RTEMS_FLOATING_POINT
#define CONFIGURE_MICROSECONDS_PER_TICK 1000
#define CONFIGURE_MAXIMUM_PROCESSORS 4
#define CONFIGURE_MAXIMUM_TASKS (1 + CONFIGURE_MAXIMUM_PROCESSORS + TASK_COUNT)
#define CONFIGURE_EXTRA_TASK_STACKS (CONFIGURE_MAXIMUM_TASKS * RTEMS_MINIMUM_STACK_SIZE * 16)
#define CONFIGURE_MP_MAXIMUM_GLOBAL_OBJECTS (5 * CONFIGURE_MAXIMUM_TASKS)
#include <rtems/confdefs.h>

Can you reproduce the crash? I’m using QEMU emulator version 10.1.91 on FreeBSD. It only takes a few seconds to see this and stop:

[...]
hello, world from task C
hello, world from task F
hello, world from task D
hello, world from task G
hello, world from task H
hello, world from task E
hello, world from task I
hello, world from task J
hello, world from task C
hello, world from task A
hello, world from task B
hello, world from task G
hello, world from task F
hllo, ask H
rom task G
hello, world����!��^�T�hello, world from task H
A��
   ����

Yes it should. You can abuse stdout so the output can become a mix of multiple threads. These standard files are a sequential interface so you need to add a lock (mutex) to make the access concurrent.

But it doesn’t. I can wrap the printfs in a mutex, or even in flockfile/funlockfile, it crashes as described. What works is to define my own uprintf() with vsnprintf() and then outputting the result string character by character to the UART base 0x10000000, e.g.

int uprintf(const char *aFormat, ...) {
    char    buffer[256];
    va_list args;
    int     n, i;
    rtems_status_code status;

    va_start(args, aFormat);
    n = vsnprintf(buffer, sizeof(buffer) - 1, aFormat, args);
    va_end(args);

    status = rtems_semaphore_obtain(gUartMutex, RTEMS_WAIT, RTEMS_NO_TIMEOUT);
    if (status != RTEMS_SUCCESSFUL) {
        exit(41);
    }
    for (i = 0; i < n; i++) {
        *(char *) 0x10000000 = buffer[i];
    }
    status = rtems_semaphore_release(gUartMutex);
    if (status != RTEMS_SUCCESSFUL) {
        exit(42);
    }
    return n;
}

It looks to me like newlib stdio is simply not re-entrant (and I think it would be a big ask for it to be).

Newlib is re-enterant.

You cannot hit the UART that way.

Interestingly, replacing printf with printk does not have the problem–I get the expected interspersed output from multiple task printing. My uprintf with locks even avoids the interspersed output, and under QEMU does not even test if the UART TX buffer is ready (QEMU’s virtual UART is as fast as I write, I’m told). Newlib printf from many tasks crashes though, even when serialized with locking and I want to understand the exact problem. If it is not reentrancy, then what is the failure mechanism?

Understanding the reason would require a detail examination of the crash and the system at the time. This forum may not be that efficient.

If you think there could be a bug please make a test case that is as small as possible, create an issue and add the test code including what RTEMS version and tools you used, how to build the test and then how to run it.