Hi!
I’m writing a libbsd driver that lives in our RTEMS application code and handles TCP/IP communication, and I need help debugging it. As I’m new to RTEMS and libbsd, any help would be greatly appreciated!
The application streams sensor data from the PolarFire Icicle kit; nothing is received besides TCP ACKs. After a few seconds or minutes of communication it suddenly stops.
The first sign of trouble is that the send() function returns an error indicating that the send buffers are full. But nothing gets passed to the if_transmit() function in the driver. No other error messages are seen anywhere.
The if_tick() function keeps firing. I can see that the HW keeps receiving packets and tries to pass them to the libbsd stack using netisr_queue(), but nothing seems to happen with them. After a while netisr_queue() starts to return ENOBUFS.
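A delayed ENOBUFS like this is what you would expect from a bounded queue whose consumer has stopped draining it. The sketch below is only a simplified model of that behaviour, not libbsd code: the names model_queue, model_enqueue, model_dequeue and the limit MODEL_QLIMIT are hypothetical, and the real netisr queue limit is a separate, configurable value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical limit, chosen small so the effect is easy to see. */
#define MODEL_QLIMIT 4

struct model_queue {
    void    *slots[MODEL_QLIMIT];
    size_t   head;   /* index of the next item to dequeue */
    size_t   len;    /* number of items currently queued */
    unsigned drops;  /* packets rejected because the queue was full */
};

/* Producer side (the driver in this model): 0 on success, ENOBUFS when full. */
static int model_enqueue(struct model_queue *q, void *pkt)
{
    assert(q != NULL);
    if (q->len == MODEL_QLIMIT) {
        q->drops++;
        return ENOBUFS;
    }
    q->slots[(q->head + q->len) % MODEL_QLIMIT] = pkt;
    q->len++;
    return 0;
}

/* Consumer side (the netisr thread in this model): NULL when empty. */
static void *model_dequeue(struct model_queue *q)
{
    assert(q != NULL);
    if (q->len == 0) {
        return NULL;
    }
    void *pkt = q->slots[q->head];
    q->head = (q->head + 1) % MODEL_QLIMIT;
    q->len--;
    return pkt;
}
```

As long as model_dequeue() is never called, the first MODEL_QLIMIT enqueues succeed and every later one fails. That matches the order of events described above: packets are accepted for a while, then rejected.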
When the stack is stuck I can see that the “swi1: netisr 0” task has received an event, but it never seems to leave the EV state and go into READY. It looks fishy to me. But I don’t really know what to do with this information.
ID NAME SHED PRI STATE MODES EVENTS WAITINFO
------------------------------------------------------------------------------
0a010001 UI1 MEDF 1 EV P:T:nA NONE
0a010002 LOGT MEDF 110 MSG P:T:nA NONE 22010001
0a010003 TIME MEDF 98 SYSEV P:T:nA NONE
0a010004 IRQS MEDF 96 SYSEV P:T:nA NONE
0a010005 IRQS MEDF 96 SYSEV P:T:nA NONE
0a010006 IRQS MEDF 96 SYSEV P:T:nA NONE
0a010007 IRQS MEDF 96 SYSEV P:T:nA NONE
0a010008 _BSD inm_free taskq MEDF 100 WK P:T:nA NONE -
0a010009 _BSD in6m_free taskq MEDF 100 WK P:T:nA NONE -
0a01000a _BSD kqueue_ctx task MEDF 100 WK P:T:nA NONE -
0a01000b _BSD bus taskq MEDF 100 WK P:T:nA NONE -
0a01000c _BSD swi5: fast task MEDF 100 EV P:T:nA NONE
0a01000d _BSD thread taskq MEDF 100 WK P:T:nA NONE -
0a01000e _BSD swi6: Giant tas MEDF 100 EV P:T:nA NONE
0a01000f _BSD swi6: task queu MEDF 100 EV P:T:nA NONE
0a010010 _BSD deferred_unmoun MEDF 100 WK P:T:nA NONE -
0a010011 _BSD swi1: netisr 0 MEDF 100 EV P:T:nA 80000000
0a010012 _BSD bufdaemon MEDF 100 WK P:T:nA NONE psleep
0a010013 _BSD syncer MEDF 100 WK P:T:nA NONE syncer
0a010014 _BSD vnlru MEDF 100 WK P:T:nA NONE vlruwt
0a010015 _BSD bufspacedaemon- MEDF 100 WK P:T:nA NONE -
0a010016 PFRW MEDF 200 MSG P:T:nA NONE 22010006
0a010017 PFTW MEDF 100 EV P:T:nA NONE
0a010019 DHCP MEDF 2147483646 WK P:T:nA NONE select
0a01001a EST0 MEDF 100 TIME P:T:nA NONE
0a01001b SHPR MEDF 100 MSG P:T:nA NONE 2201000b
0a01001c TST0 MEDF 100 TIME P:T:nA NONE
0a01001d MST0 MEDF 100 MSG P:T:nA NONE 2201000c
0a01001e SHLL MEDF 100 READY P:T:nA NONE
The driver itself is pretty simple: it bridges the HW code provided by Microchip with the libbsd networking stack.
The driver uses rtems_interrupt_handler_install() to provide the interrupt that the MSS Platform requires.
It has two RTEMS tasks, called PFRW and PFTW, for handling RX and TX respectively.
The TX side uses three separate RTEMS queues for storing pointers to available, pending and done “packets”. These queues are called PFTA, PFTP and PFTD. The TX worker wakes up on events, sends packets from the PFTP queue, cleans up entries from the PFTD queue and places them back into the PFTA queue. The PFTD queue is filled from the ISR TX callback function in the MSS Platform code.
The RX side has a similar scheme but uses two queues, called PFRA and PFRP.
Using the “queue” command from the shell I can see that the PFTA and PFRA queues look healthy (just not doing anything) when the network stack is stuck.
ID NAME ATTRIBUTES PEND MAXPEND MAXSIZE
------------------------------------------------------------------------------
22010001 LOGQ DEFAULT 0 512 260
22010002 BMQS DEFAULT 390 400 8
22010003 BMQM DEFAULT 50 50 8
22010004 BMQL DEFAULT 9 20 8
22010005 PFRA DEFAULT 127 128 8
22010006 PFRP DEFAULT 0 128 8
22010007 PFTA DEFAULT 256 256 8
22010008 PFTP DEFAULT 0 256 8
22010009 PFTD DEFAULT 0 256 8
2201000a SHR0 DEFAULT 10 10 360
2201000b SHTX DEFAULT 0 10 360
2201000c MSR0 DEFAULT 0 4 64
I’m using libbsd release version 6.2 and RTEMS as shown below.
rtems all
RTEMS: 6.0.0 (f6933b9c6ff6780c1b3a56d80aa9577f181e5763) SMP:4 cores
CPU: RISCV (RISCV)
BSP: mpfs64imafdc
Tools: 13.3.0 20240521 (RTEMS 6, RSB 3814cb0e7f86cca2be403eac831f9bf571984659-modified, Newlib 1b3dcfd)
Options: SMP
SHLL [/] #
The tasks all look normal as well:
Uptime: 2m52.753999 Period: 0.713986
Tasks: 34 Load Average: 18.233% Load: 17.112% Idle: 382.889%
Mem: 88M free 136M used 944K stack
ID | NAME | RPRI | CPRI | TIME | TOTAL | CURRENT
------------+---------------------+---------------+---------------------+---------+--^^----
0x09010001 | IDLE | 2147483647 | 2147483647 | 2m52.719866 | 24.995 | 99.997
0x09010002 | IDLE | 2147483647 | 2147483647 | 2m52.548196 | 24.970 | 99.986
0x09010003 | IDLE | 2147483647 | 2147483647 | 2m51.622244 | 24.836 | 99.800
0x09010004 | IDLE | 2147483647 | 2147483647 | 2m22.614902 | 20.638 | 83.105
0x0a010001 | UI1 | 1 | 1 | 3.275615 | 0.474 | 0.000
0x0a010003 | TIME | 98 | 98 | 0.150780 | 0.021 | 0.085
0x0a01001d | MST0 | 100 | 100 | 19.678955 | 2.847 | 12.709
0x0a01001a | EST0 | 100 | 100 | 7.371226 | 1.066 | 3.423
0x0a01001b | SHPR | 100 | 100 | 0.068704 | 0.009 | 0.046
0x0a01001e | SHLL | 100 | 100 | 0.036017 | 0.005 | 0.252
0x0a010002 | LOGT | 110 | 110 | 0.029288 | 0.004 | 0.020
0x0a010004 | IRQS | 96 | 96 | 0.000039 | 0.000 | 0.000
0x0a010005 | IRQS | 96 | 96 | 0.000028 | 0.000 | 0.000
0x0a010006 | IRQS | 96 | 96 | 0.000027 | 0.000 | 0.000
0x0a010007 | IRQS | 96 | 96 | 0.000007 | 0.000 | 0.000
0x0a01001c | TST0 | 100 | 100 | 0.010689 | 0.001 | 0.006
0x0a010012 | _BSD | 100 | 100 | 0.005662 | 0.000 | 0.006
0x0a010014 | _BSD | 100 | 100 | 0.005576 | 0.000 | 0.004
0x0a010013 | _BSD | 100 | 100 | 0.004944 | 0.000 | 0.003
0x0a010015 | _BSD | 100 | 100 | 0.004848 | 0.000 | 0.004
0x0a010018 | CPlt | 100 | 100 | 0.004082 | 0.000 | 0.547
0x0a010008 | _BSD | 100 | 100 | 0.000017 | 0.000 | 0.000
0x0a010009 | _BSD | 100 | 100 | 0.000010 | 0.000 | 0.000
0x0a01000a | _BSD | 100 | 100 | 0.000009 | 0.000 | 0.000
0x0a01000b | _BSD | 100 | 100 | 0.000011 | 0.000 | 0.000
0x0a01000c | _BSD | 100 | 100 | 0.000012 | 0.000 | 0.000
0x0a01000d | _BSD | 100 | 100 | 0.001330 | 0.000 | 0.000
0x0a01000e | _BSD | 100 | 100 | 0.000009 | 0.000 | 0.000
0x0a01000f | _BSD | 100 | 100 | 0.000120 | 0.000 | 0.000
0x0a010010 | _BSD | 100 | 100 | 0.000010 | 0.000 | 0.000
0x0a010011 | _BSD | 100 | 100 | 0.265832 | 0.038 | 0.000
0x0a010016 | PFRW | 200 | 200 | 0.074205 | 0.010 | 0.002
0x0a010017 | PFTW | 100 | 100 | 0.503029 | 0.072 | 0.000
0x0a010019 | DHCP | 2147483646 | 2147483646 | 0.007700 | 0.001 | 0.000
The more data I try to send over the TCP/IP connection (larger and/or more packets), the faster it breaks down.
One final piece of information: I’ve tried the same code on RTEMS 7 (from 6 months ago, since the latest main doesn’t work at all with riscv/mpfs64imafdc for some reason) and I got the exact same error.
Thanks for reading and as I said above, any help debugging this would be greatly appreciated!