I’m interested in the Message Queue Improvements and Fixes GSoC project and have started looking into one of the linked issues:
CORE_message_queue_Send timeout failure (#3986)
I studied the relevant code files and spent some time going through the Classic, POSIX, and SCORE message queue send paths to understand the current behavior, and I wanted to share my understanding so far and get feedback from the community.
Observations
POSIX APIs
mq_send() goes through _POSIX_Message_queue_Send_support(), which ultimately calls _CORE_message_queue_Submit().
Blocking depends on O_NONBLOCK:
Not set => wait = true
Set => wait = false
mq_timedsend() also uses the same support path but passes timeout info via thread queue context + watchdog handling.
So from what I see, POSIX queues do use both the blocking and timeout paths in the CORE layer.
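To make the O_NONBLOCK mapping above concrete, here is a minimal host-side sketch. The helper name mq_send_should_wait() is made up for illustration and is not an RTEMS function; it only shows how I read the flag-to-wait translation, not the actual code in _POSIX_Message_queue_Send_support().

```c
#include <assert.h>
#include <fcntl.h>    /* O_NONBLOCK, O_WRONLY */
#include <stdbool.h>

/*
 * Hypothetical helper (not part of RTEMS): derive the "wait" argument
 * passed down to the CORE layer from the open flags of a POSIX message
 * queue descriptor.
 */
static bool mq_send_should_wait(int oflag)
{
    /* O_NONBLOCK set => never block; clear => block while the queue is full. */
    return (oflag & O_NONBLOCK) == 0;
}

/* Small self-check of the mapping described in the post. */
static void check_wait_mapping(void)
{
    assert(mq_send_should_wait(O_WRONLY));
    assert(!mq_send_should_wait(O_WRONLY | O_NONBLOCK));
}
```

In this reading, mq_timedsend() would use the same mapping and differ only in also carrying a timeout.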
Classic APIs
rtems_message_queue_send() always calls _CORE_message_queue_Send() with wait = false.
That calls _CORE_message_queue_Submit(), but only the non-blocking path is used.
So Classic MQ send never seems to use the wait == true path.
CORE message queue observations
Looking at _CORE_message_queue_Submit(), it already supports:
Blocking send
Sender wait queues
Timeout handling via thread queue + watchdog
But this capability appears to be used only by POSIX MQs, not Classic MQs, since Classic MQ sends always pass wait = false.
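As a rough model of the decision I believe the submit path makes when the queue is full, here is a simplified host-side sketch. All names (model_queue, model_submit, the MODEL_* codes) are invented for this example; the real code uses its own status type and thread queue machinery, and also handles cases not modelled here, such as handing a message directly to a waiting receiver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative status codes, not the real CORE status values. */
typedef enum {
    MODEL_SUCCESSFUL,    /* message stored in a free buffer */
    MODEL_TOO_MANY,      /* queue full and wait == false */
    MODEL_SENDER_BLOCKS  /* queue full and wait == true: sender is queued */
} model_status;

typedef struct {
    size_t pending;          /* messages currently queued */
    size_t capacity;         /* maximum number of pending messages */
    size_t blocked_senders;  /* senders parked on the (modelled) wait queue */
} model_queue;

/* Simplified submit: only the "free buffer / full queue" decision is modelled. */
static model_status model_submit(model_queue *q, bool wait)
{
    assert(q != NULL && q->pending <= q->capacity);

    if (q->pending < q->capacity) {
        q->pending++;
        return MODEL_SUCCESSFUL;
    }
    if (!wait) {
        return MODEL_TOO_MANY;  /* the only full-queue outcome Classic send reaches */
    }
    q->blocked_senders++;       /* the outcome reached only via POSIX sends */
    return MODEL_SENDER_BLOCKS;
}
```

With a capacity of 1, a second model_submit(&q, false) yields MODEL_TOO_MANY, while model_submit(&q, true) takes the blocking branch instead.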
Questions
I had a few questions:
Was the always non-blocking design of rtems_message_queue_send() an intentional Classic API choice for determinism and simplicity?
Or is there any interest in extending Classic MQs to support blocking or timed send when the queue is full?
Classic APIs go through _CORE_message_queue_Send() instead of calling _CORE_message_queue_Submit() directly like POSIX.
Is this layering mainly for API abstraction?
I just want to make sure that I’m interpreting the issue correctly. Based on this, my next step would be to test the existing Classic API send path with wait = true by writing a relevant test case. That test will likely produce an error, since the blocking path is not used. The possible solution would then be to extend the Classic API to support the timed blocking path, i.e. rtems_message_timed_send().
The Classic API message queue supports non-blocking, blocking forever, or blocking with timeout. The API pattern used is that the options argument must include RTEMS_NO_WAIT for polling. RTEMS_WAIT is the default and, when in place, the timeout field is the number of ticks. Other Classic API objects like semaphores follow the same API pattern.
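The option/timeout pattern described here can be sketched as a small decision function. This is only a host-side model: the MODEL_* constants are stand-ins whose values are assumed for the example (a timeout of 0 ticks is taken to mean "no timeout"), and classify_wait() is a hypothetical name, not an RTEMS directive.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the RTEMS option constants; values assumed for this sketch. */
#define MODEL_WAIT       0x0u  /* default: block until satisfied */
#define MODEL_NO_WAIT    0x1u  /* poll and return immediately */
#define MODEL_NO_TIMEOUT 0u    /* tick count meaning "wait forever" */

typedef enum { MODE_POLL, MODE_WAIT_FOREVER, MODE_WAIT_TICKS } wait_mode;

/* Hypothetical helper: map (option set, timeout) to the resulting behaviour. */
static wait_mode classify_wait(uint32_t option_set, uint32_t timeout_ticks)
{
    if ((option_set & MODEL_NO_WAIT) != 0) {
        return MODE_POLL;          /* the timeout is ignored when polling */
    }
    if (timeout_ticks == MODEL_NO_TIMEOUT) {
        return MODE_WAIT_FOREVER;
    }
    return MODE_WAIT_TICKS;        /* block for at most timeout_ticks ticks */
}

/* Small self-check of the three modes named in the post. */
static void check_classify_wait(void)
{
    assert(classify_wait(MODEL_NO_WAIT, 10) == MODE_POLL);
    assert(classify_wait(MODEL_WAIT, MODEL_NO_TIMEOUT) == MODE_WAIT_FOREVER);
    assert(classify_wait(MODEL_WAIT, 10) == MODE_WAIT_TICKS);
}
```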
POSIX message queues support the sender being able to block when the queue is full and a message arrival notification. These are not in the Classic API message queues and are disabled when POSIX is not enabled.
One area that has always bothered me with our POSIX message queue implementation is that, per POSIX, the messages can be sorted on message priority. The RTEMS implementation of this is a sorted linked list with O(n) insertion time. Luckily we do have the bound of MQ_PRIO_MAX set to 32 in limits.h. There may not be any error checking on the range, and it could be managed as a Red/Black Tree to give an algorithmic bound. See mq_send and limits.h in the POSIX standard.
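For readers following along, the O(n) cost can be seen in a minimal sorted-list insertion like the one below. This is a generic sketch with invented names (msg_node, sorted_insert), not the RTEMS chain code; it assumes larger priority values are dequeued first and that equal priorities stay in FIFO order.

```c
#include <assert.h>
#include <stddef.h>

typedef struct msg_node {
    unsigned prio;            /* larger value = dequeued earlier */
    struct msg_node *next;
} msg_node;

/*
 * Insert node into a list kept in descending priority order, FIFO among
 * equal priorities. Returns the number of nodes walked, which grows
 * linearly with the number of pending messages in the worst case.
 */
static size_t sorted_insert(msg_node **head, msg_node *node)
{
    size_t steps = 0;
    msg_node **link = head;

    assert(head != NULL && node != NULL);

    /* Skip every node that must stay in front of the new one. */
    while (*link != NULL && (*link)->prio >= node->prio) {
        link = &(*link)->next;
        steps++;
    }
    node->next = *link;
    *link = node;
    return steps;
}
```

A low-priority message sent to a nearly full queue walks the whole list, which is the worst case a tree or a bounded bucket scheme would avoid.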
Another message queue issue is that the behaviour of sending/receiving a message of zero length needs to be double-checked against the POSIX and RTEID/ORKID standards, along with implementations on other operating systems. My quick search of those standards turned up nothing and I think it is implied to work.
There is also the issue that mq_open lacks support for the “mode” argument. That requires some thinking to have a solution that makes sense with the RTEMS implementation. There is a similar issue with sem_open(). The same logical solution applies to both. Both of those issues are old. No one has had a good idea on how to solve them, but no one wants to close them either. Fixing them or providing a convincing argument to close them is welcome.
I think there may be something found by Coverity Scan but it is down right now. Even if there is, it will be a specific coding concern.
@JoelSherrill @gedare Thank you for your clear and detailed answer. It helped me understand the problem and think about a solution properly.
Issue 1: Classic send API blocking and timeout coverage gap
The problem is a coverage gap in the Classic send API. My approach is to add rtems_message_queue_timed_send() (wait = true, with a timeout) and keep rtems_message_queue_send() (wait = false) as it is, for code modularity and a clear distinction between the two.
I will run it against the relevant test files and make sure there are no errors.
So when POSIX is not used, the Classic send will support all three modes: non-blocking, blocking forever, and blocking with timeout.
Shall I proceed with the next step and code the solution?
Other issues:
Optimization of message priority handling in the POSIX message queue
Zero-length message tests
Mode support for mq_open(), which follows the same logic as the sem_open() problem
Infinite loop due to broadcasting to a higher priority message
POSIX message queue thread release order is not by priority
Next, I’ll research these linked issues and work on them.
Please correct me if I am wrong in my approach or understanding.
@gedare
So if I understand correctly, in the Classic API, waiting semantics are intentionally supported only on the receive side, and send operations are designed to be non-blocking when the queue is full.
In that case, the wait+timeout path in _CORE_message_queue_Submit() is exercised primarily via the POSIX MQ APIs, and not through Classic MQ sends.
Should the scope of this issue therefore focus on validating/fixing the CORE blocking send + timeout behavior as used by POSIX, rather than extending the Classic API to support blocking/timed send?
By validating, I mean running the relevant test cases to check that the non-blocking and timed blocking cases of the CORE send APIs work properly.
And if I find any edge case coverage gaps while testing (for example around timeout handling, queue full conditions, or sender wait paths), I would add targeted test cases and use them to check for bugs in the CORE blocking-send + timeout path.
I don’t know if there’s anything wrong with the CORE blocking send + timeout behavior as used by POSIX. For this particular issue, I think you could proceed with the proposal to provide rtems_message_queue_timed_send() as part of the work to be done. This will not be very much code, so you will want to continue scoping out the remainder of the message queue related tasks.
I will include the implementation and testing of rtems_message_queue_timed_send() as one of the issues in the proposal.
Since this issue is small, I have also spent the last few days studying the “Lower Message Priority Range” optimization issue. I have traced the current O(n) chain priority insertion bottleneck and also ran the relevant test cases to check the edge cases.
I have posted my detailed code-tracing findings, screenshots, and reference links in that specific issue’s thread.
Along with these two, I’ll also include a few of the other issues linked under the “Message Queue Improvement and Fixes” issue.
Hello @gedare @JoelSherrill,
As mentioned above, I am considering one more message queue issue along with this one, i.e.
#5247.
I had a doubt regarding the scope of the proposal for this project:
Issue “Lower Message Priority Range Supported by POSIX Message Queues”: as discussed in thread 1, the proposed solution adds priority buckets with a bitmap lookup, falling back to the existing ordered chain insertion when the priority range is exceeded. The work consists of the algorithm implementation, test cases that check the new algorithm and verify the fallback when the range is exceeded, and documentation.
Issue “CORE_message_queue_Send timeout failure”: as discussed above, this requires implementing the new API rtems_message_queue_timed_send(), a separate test file with test cases similar to those for the POSIX timed send API, and updates to the user documentation.
Would this be suitable for a medium (~175 hrs) or a large (~300 hrs) project?
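To give a feel for the size of the priority bucket + bitmap part, here is a minimal host-side sketch of the idea. Everything in it is illustrative: the names (bucket_queue, bucket_insert, bucket_pop_highest) are invented, 32 buckets are assumed to match the MQ_PRIO_MAX bound mentioned earlier, and a return value of -1 stands for "fall back to the existing ordered chain insertion".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRIO_BUCKETS 32u  /* assumed to match the MQ_PRIO_MAX bound of 32 */

typedef struct bmsg {
    unsigned prio;
    struct bmsg *next;
} bmsg;

typedef struct {
    uint32_t nonempty;          /* bit p set <=> bucket p holds messages */
    bmsg *head[PRIO_BUCKETS];   /* oldest message per priority */
    bmsg *tail[PRIO_BUCKETS];   /* newest message per priority */
} bucket_queue;

/* O(1) insert; returns -1 when prio is out of range (caller falls back). */
static int bucket_insert(bucket_queue *q, bmsg *m)
{
    assert(q != NULL && m != NULL);
    if (m->prio >= PRIO_BUCKETS) {
        return -1;
    }
    m->next = NULL;
    if (q->tail[m->prio] != NULL) {
        q->tail[m->prio]->next = m;   /* append: FIFO within one priority */
    } else {
        q->head[m->prio] = m;
    }
    q->tail[m->prio] = m;
    q->nonempty |= UINT32_C(1) << m->prio;
    return 0;
}

/* Remove the oldest message of the highest non-empty priority, or NULL. */
static bmsg *bucket_pop_highest(bucket_queue *q)
{
    assert(q != NULL);
    if (q->nonempty == 0) {
        return NULL;
    }
    /* At most 32 iterations; a count-leading-zeros step could replace this. */
    unsigned p = PRIO_BUCKETS - 1;
    while ((q->nonempty & (UINT32_C(1) << p)) == 0) {
        p--;
    }
    bmsg *m = q->head[p];
    q->head[p] = m->next;
    if (q->head[p] == NULL) {
        q->tail[p] = NULL;
        q->nonempty &= ~(UINT32_C(1) << p);
    }
    return m;
}
```

Both operations are bounded by a constant here, at the cost of two pointer arrays per queue, so the memory overhead is something the proposal would need to weigh.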
Hello, I have created a draft proposal and added it to the tracking page. I will update the proposal based on the suggestions and guidance given by the mentors.