Microchip PolarFire SoC FPGA libbsd driver fails due to not handling swi event?

Hi!

I’m writing a libbsd driver for TCP/IP communication that resides in our RTEMS application code, and I need help debugging it. As I’m new to RTEMS and libbsd, any help would be greatly appreciated!

The application streams sensor data from the PolarFire Icicle Kit; nothing is received besides TCP ACKs. After a few seconds or minutes of communication, it suddenly stops.

The first sign of trouble is that send() returns an error indicating that the send buffers are full, yet nothing reaches the if_transmit() function in the driver. No other error messages are seen anywhere.

The if_tick() function keeps firing. I can see that the HW keeps receiving packets and that the driver tries to pass them to the libbsd stack using netisr_queue(), but nothing seems to happen with them. After a while, netisr_queue() starts to return ENOBUFS.
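
The ENOBUFS symptom is what one would expect from any bounded queue whose consumer has stopped draining it: the producer keeps succeeding until the limit is reached and fails from then on. The sketch below is a host-side toy model in plain C, not libbsd code. The names `toy_queue`, `toy_enqueue`, `toy_drain` and the limit of 4 are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: names and the limit are hypothetical, not libbsd's. */
#define TOY_QUEUE_LIMIT 4

struct toy_queue {
    size_t depth;  /* packets waiting for the consumer */
    size_t drops;  /* packets refused because the queue was full */
};

/* Producer side: succeeds until the queue is full, then reports ENOBUFS. */
static int toy_enqueue(struct toy_queue *q)
{
    assert(q != NULL);
    if (q->depth >= TOY_QUEUE_LIMIT) {
        q->drops++;
        return ENOBUFS;
    }
    q->depth++;
    return 0;
}

/* Consumer side: takes everything queued and returns how many were taken. */
static size_t toy_drain(struct toy_queue *q)
{
    size_t taken;

    assert(q != NULL);
    taken = q->depth;
    q->depth = 0;
    return taken;
}
```

If `toy_drain()` is never called, the fifth `toy_enqueue()` fails. In this model, a delay before the first ENOBUFS followed by a steady stream of them points at the consumer rather than the producer.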

When the stack is stuck I can see that the “swi1: netisr 0” task has received an event, but it never seems to leave the EV state and go into READY. It looks fishy to me. But I don’t really know what to do with this information.

ID       NAME                 SHED PRI STATE  MODES    EVENTS WAITINFO
------------------------------------------------------------------------------
0a010001 UI1                  MEDF   1 EV     P:T:nA   NONE
0a010002 LOGT                 MEDF 110 MSG    P:T:nA   NONE   22010001
0a010003 TIME                 MEDF  98 SYSEV  P:T:nA   NONE
0a010004 IRQS                 MEDF  96 SYSEV  P:T:nA   NONE
0a010005 IRQS                 MEDF  96 SYSEV  P:T:nA   NONE
0a010006 IRQS                 MEDF  96 SYSEV  P:T:nA   NONE
0a010007 IRQS                 MEDF  96 SYSEV  P:T:nA   NONE
0a010008 _BSD inm_free taskq  MEDF 100 WK     P:T:nA   NONE   -
0a010009 _BSD in6m_free taskq MEDF 100 WK     P:T:nA   NONE   -
0a01000a _BSD kqueue_ctx task MEDF 100 WK     P:T:nA   NONE   -
0a01000b _BSD bus taskq       MEDF 100 WK     P:T:nA   NONE   -
0a01000c _BSD swi5: fast task MEDF 100 EV     P:T:nA   NONE
0a01000d _BSD thread taskq    MEDF 100 WK     P:T:nA   NONE   -
0a01000e _BSD swi6: Giant tas MEDF 100 EV     P:T:nA   NONE
0a01000f _BSD swi6: task queu MEDF 100 EV     P:T:nA   NONE
0a010010 _BSD deferred_unmoun MEDF 100 WK     P:T:nA   NONE   -
0a010011 _BSD swi1: netisr 0  MEDF 100 EV     P:T:nA 80000000
0a010012 _BSD bufdaemon       MEDF 100 WK     P:T:nA   NONE   psleep
0a010013 _BSD syncer          MEDF 100 WK     P:T:nA   NONE   syncer
0a010014 _BSD vnlru           MEDF 100 WK     P:T:nA   NONE   vlruwt
0a010015 _BSD bufspacedaemon- MEDF 100 WK     P:T:nA   NONE   -
0a010016 PFRW                 MEDF 200 MSG    P:T:nA   NONE   22010006
0a010017 PFTW                 MEDF 100 EV     P:T:nA   NONE
0a010019 DHCP                 MEDF 2147483646 WK P:T:nA   NONE   select
0a01001a EST0                 MEDF 100 TIME   P:T:nA   NONE
0a01001b SHPR                 MEDF 100 MSG    P:T:nA   NONE   2201000b
0a01001c TST0                 MEDF 100 TIME   P:T:nA   NONE
0a01001d MST0                 MEDF 100 MSG    P:T:nA   NONE   2201000c
0a01001e SHLL                 MEDF 100 READY  P:T:nA   NONE

The driver itself is pretty simple: it bridges the HW code provided by Microchip with the libbsd networking stack.

The driver uses rtems_interrupt_handler_install() to install the interrupt handler that the MSS Platform code requires.

It has two RTEMS tasks, called PFRW and PFTW, for handling RX and TX respectively.

The TX side uses three separate RTEMS queues for storing pointers to available, pending and done “packets”. These queues are called PFTA, PFTP and PFTD. The TX worker wakes up on events, sends packets from the PFTP queue, cleans up completed packets from the PFTD queue and places them back into the PFTA queue. The PFTD queue is filled by the ISR TX callback function in the MSS Platform code.
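
The TX scheme described above can be modelled on a host as four counters, one per owner of a packet slot. This is only a sketch under assumptions: the function names are invented, the step where if_transmit() moves a slot from PFTA to PFTP is inferred rather than stated, and `TX_SLOTS` is taken from the MAXPEND value of PFTA in the queue listing.

```c
#include <assert.h>
#include <stddef.h>

#define TX_SLOTS 256  /* MAXPEND of PFTA in the listing; otherwise arbitrary */

struct tx_model {
    size_t available;  /* PFTA: free slots */
    size_t pending;    /* PFTP: queued for the TX worker */
    size_t in_hw;      /* handed to the MAC, completion not yet seen */
    size_t done;       /* PFTD: completed, waiting to be recycled */
};

static void tx_init(struct tx_model *m)
{
    m->available = TX_SLOTS;
    m->pending = m->in_hw = m->done = 0;
}

/* Assumed if_transmit() step: take a free slot and queue it. */
static int tx_submit(struct tx_model *m)
{
    if (m->available == 0) return -1;
    m->available--; m->pending++;
    return 0;
}

/* TX worker: hand one pending slot to the hardware. */
static int tx_worker_send(struct tx_model *m)
{
    if (m->pending == 0) return -1;
    m->pending--; m->in_hw++;
    return 0;
}

/* ISR TX callback: the hardware has finished with one slot. */
static int tx_isr_complete(struct tx_model *m)
{
    if (m->in_hw == 0) return -1;
    m->in_hw--; m->done++;
    return 0;
}

/* TX worker: recycle one completed slot back to the free queue. */
static int tx_worker_recycle(struct tx_model *m)
{
    if (m->done == 0) return -1;
    m->done--; m->available++;
    return 0;
}

/* Conservation check: every slot has exactly one owner at all times. */
static void tx_check(const struct tx_model *m)
{
    assert(m->available + m->pending + m->in_hw + m->done == TX_SLOTS);
}
```

Calling `tx_check()` after every transition in a model like this (or an equivalent counter check in the real driver) makes a lost or double-queued slot visible at the point where it happens.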

The RX side has a similar scheme but uses two queues, called PFRA and PFRP.

Using the “queue” command from the shell, I can see that the PFTA and PFRA queues look healthy (just not doing anything) when the network stack is stuck.

  ID       NAME   ATTRIBUTES   PEND   MAXPEND  MAXSIZE
------------------------------------------------------------------------------
22010001   LOGQ    DEFAULT        0     512      260
22010002   BMQS    DEFAULT      390     400        8
22010003   BMQM    DEFAULT       50      50        8
22010004   BMQL    DEFAULT        9      20        8
22010005   PFRA    DEFAULT      127     128        8
22010006   PFRP    DEFAULT        0     128        8
22010007   PFTA    DEFAULT      256     256        8
22010008   PFTP    DEFAULT        0     256        8
22010009   PFTD    DEFAULT        0     256        8
2201000a   SHR0    DEFAULT       10      10      360
2201000b   SHTX    DEFAULT        0      10      360
2201000c   MSR0    DEFAULT        0       4       64

I’m using libbsd release version 6.2 and RTEMS as shown below.

rtems all
RTEMS: 6.0.0 (f6933b9c6ff6780c1b3a56d80aa9577f181e5763) SMP:4 cores
CPU: RISCV (RISCV)
BSP: mpfs64imafdc
Tools: 13.3.0 20240521 (RTEMS 6, RSB 3814cb0e7f86cca2be403eac831f9bf571984659-modified, Newlib 1b3dcfd)
Options: SMP
SHLL [/] #

The tasks all look normal as well.

Uptime: 2m52.753999          Period: 0.713986
Tasks:   34  Load Average:   18.233%  Load:   17.112%  Idle:  382.889%
Mem:   88M free 136M used 944K stack

 ID         | NAME                | RPRI | CPRI   | TIME                | TOTAL   | CURRENT
------------+---------------------+---------------+---------------------+---------+--^^----
 0x09010001 | IDLE                |  2147483647 |  2147483647   | 2m52.719866         |  24.995 |  99.997
 0x09010002 | IDLE                |  2147483647 |  2147483647   | 2m52.548196         |  24.970 |  99.986
 0x09010003 | IDLE                |  2147483647 |  2147483647   | 2m51.622244         |  24.836 |  99.800
 0x09010004 | IDLE                |  2147483647 |  2147483647   | 2m22.614902         |  20.638 |  83.105
 0x0a010001 | UI1                 |    1 |    1   | 3.275615            |   0.474 |   0.000
 0x0a010003 | TIME                |   98 |   98   | 0.150780            |   0.021 |   0.085
 0x0a01001d | MST0                |  100 |  100   | 19.678955           |   2.847 |  12.709
 0x0a01001a | EST0                |  100 |  100   | 7.371226            |   1.066 |   3.423
 0x0a01001b | SHPR                |  100 |  100   | 0.068704            |   0.009 |   0.046
 0x0a01001e | SHLL                |  100 |  100   | 0.036017            |   0.005 |   0.252
 0x0a010002 | LOGT                |  110 |  110   | 0.029288            |   0.004 |   0.020
 0x0a010004 | IRQS                |   96 |   96   | 0.000039            |   0.000 |   0.000
 0x0a010005 | IRQS                |   96 |   96   | 0.000028            |   0.000 |   0.000
 0x0a010006 | IRQS                |   96 |   96   | 0.000027            |   0.000 |   0.000
 0x0a010007 | IRQS                |   96 |   96   | 0.000007            |   0.000 |   0.000
 0x0a01001c | TST0                |  100 |  100   | 0.010689            |   0.001 |   0.006
 0x0a010012 | _BSD                |  100 |  100   | 0.005662            |   0.000 |   0.006
 0x0a010014 | _BSD                |  100 |  100   | 0.005576            |   0.000 |   0.004
 0x0a010013 | _BSD                |  100 |  100   | 0.004944            |   0.000 |   0.003
 0x0a010015 | _BSD                |  100 |  100   | 0.004848            |   0.000 |   0.004
 0x0a010018 | CPlt                |  100 |  100   | 0.004082            |   0.000 |   0.547
 0x0a010008 | _BSD                |  100 |  100   | 0.000017            |   0.000 |   0.000
 0x0a010009 | _BSD                |  100 |  100   | 0.000010            |   0.000 |   0.000
 0x0a01000a | _BSD                |  100 |  100   | 0.000009            |   0.000 |   0.000
 0x0a01000b | _BSD                |  100 |  100   | 0.000011            |   0.000 |   0.000
 0x0a01000c | _BSD                |  100 |  100   | 0.000012            |   0.000 |   0.000
 0x0a01000d | _BSD                |  100 |  100   | 0.001330            |   0.000 |   0.000
 0x0a01000e | _BSD                |  100 |  100   | 0.000009            |   0.000 |   0.000
 0x0a01000f | _BSD                |  100 |  100   | 0.000120            |   0.000 |   0.000
 0x0a010010 | _BSD                |  100 |  100   | 0.000010            |   0.000 |   0.000
 0x0a010011 | _BSD                |  100 |  100   | 0.265832            |   0.038 |   0.000
 0x0a010016 | PFRW                |  200 |  200   | 0.074205            |   0.010 |   0.002
 0x0a010017 | PFTW                |  100 |  100   | 0.503029            |   0.072 |   0.000
 0x0a010019 | DHCP                |  2147483646 |  2147483646   | 0.007700            |   0.001 |   0.000

The more data I try to send over the TCP/IP connection (larger and/or more packets), the faster it breaks down.

One final piece of information: I’ve tried the same code on RTEMS 7 (from 6 months ago, since the latest main doesn’t work at all with riscv/mpfs64imafdc for some reason) and got the exact same error.

Thanks for reading and as I said above, any help debugging this would be greatly appreciated!

Thanks for the post and welcome to RTEMS. It is great to see you working on this.

Have you seen FreeBSD’s Microchip PolarFire SoC support page? I suggest you reach out to them and mention you are using RTEMS. They may have a working Ethernet driver or something you can use and help improve.

I also suggest you consider working with RTEMS 7 and LibBSD 7-freebsd-14 for the development phase of the work. A working 7-freebsd-14 driver can be backported to 6-freebsd-14.

When debugging at a low level, the printk call is useful. A simple printk in the interrupt handler will tell you whether you are receiving interrupts.
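
A minimal sketch of that idea follows. It counts interrupts and prints only every 1024th one, so that console output does not itself slow the system down. The names `mac_irq_count` and `mac_irq_trace` are hypothetical and not part of the Microchip MSS code. The `#else` branch is only a host stand-in so the sketch compiles off-target. The counter is not atomic, so on SMP it assumes the handler does not run concurrently on several cores.

```c
#include <stdint.h>

#ifdef __rtems__
#include <rtems/bspIo.h>  /* printk() */
#else
#include <stdio.h>
#define printk printf      /* host stand-in, not used on the target */
#endif

/* Hypothetical counter; written only from the interrupt handler. */
static volatile uint32_t mac_irq_count;

/* Call at the top of the interrupt handler. */
static void mac_irq_trace(void)
{
    uint32_t n = mac_irq_count + 1u;

    mac_irq_count = n;
    if ((n & 0x3ffu) == 0u) {
        /* Every 1024th interrupt: show that interrupts still arrive. */
        printk("mac irq: %u\n", (unsigned)n);
    }
}
```

If the printed count stops advancing at the moment the stack gets stuck, the problem is on the interrupt or DMA side. If it keeps advancing, the problem is more likely further up.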

A lack of buffers can mean:

  1. A blocked pipeline, which means queues are holding the buffers. If you are not getting any interrupts or a DMA descriptor chain is wrong, the pipeline stalls.
  2. A buffer leak, where a path is not returning the buffers and they are lost.
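
The two cases can be told apart by simple accounting: add up the buffers seen in every place that can own one and compare the sum with the number allocated at start-up. The sketch below is a hypothetical helper, not an RTEMS or libbsd API. The names are invented, and `in_hw` assumes the driver can count the buffers currently owned by DMA descriptors.

```c
#include <stddef.h>

enum pool_verdict {
    POOL_OK,         /* all buffers accounted for and some are free */
    POOL_BLOCKED,    /* all accounted for, none free: queues hold them */
    POOL_LEAK,       /* fewer buffers found than were allocated */
    POOL_DUPLICATE   /* more found than allocated: one is queued twice */
};

struct pool_counts {
    size_t total;       /* buffers allocated at start-up */
    size_t free_count;  /* on the free/available queue */
    size_t queued;      /* on pending/done queues */
    size_t in_hw;       /* owned by DMA descriptors */
};

/* Classify the state of a buffer pool from a snapshot of its counters. */
static enum pool_verdict pool_check(const struct pool_counts *c)
{
    size_t seen = c->free_count + c->queued + c->in_hw;

    if (seen < c->total) return POOL_LEAK;
    if (seen > c->total) return POOL_DUPLICATE;
    if (c->free_count == 0) return POOL_BLOCKED;
    return POOL_OK;
}
```

For the TX queues in the listing from the question (PFTA 256 of 256, PFTP 0, PFTD 0), and assuming nothing is held by the hardware, `pool_check()` returns `POOL_OK`. That would suggest the driver's own TX pool is not where the buffers are stuck.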