DAOS-17712 cart: race in test_multisend_server #16599
base: master
When multisend servers start, rank=0 saves the group config file, which tells clients that they can start sending RPCs.
Because the multisend test servers do not communicate with each other, there is a small race in which rank=0 might save the group config file before all other test servers have finished registering their RPCs.
Furthermore, rank=0 itself previously saved the group config file before it was ready to accept RPCs.
The workaround moves RPC registration before group_config_save and adds a 5-second delay on rank=0 before the group info is saved.

Signed-off-by: Alexander A Oganezov <alexander.oganezov@hpe.com>
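The timing argument above can be made concrete with a small model. This is only an illustrative sketch, not code from the patch: `startup_model`, `publish_time_ms`, and `ranks_not_ready` are hypothetical names, and the per-rank registration times are made-up inputs. It counts how many ranks are still unable to accept RPCs at the moment rank=0 saves the group config file, for the old ordering and for the new ordering with a delay.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; all times are milliseconds since server launch. */
struct startup_model {
	const unsigned *ready_at_ms; /* when each rank finishes RPC registration */
	size_t          nranks;      /* number of test servers, rank 0 first */
};

/*
 * Time at which rank 0 saves the group config file.
 * old_order != 0: saved before rank 0 registers (modelled as time 0).
 * old_order == 0: saved after rank 0 registers, plus a fixed delay.
 */
static unsigned
publish_time_ms(const struct startup_model *m, int old_order, unsigned delay_ms)
{
	assert(m != NULL && m->nranks > 0);
	if (old_order)
		return 0;
	return m->ready_at_ms[0] + delay_ms;
}

/* Count ranks that cannot yet accept RPCs when clients are told to start. */
static size_t
ranks_not_ready(const struct startup_model *m, unsigned publish_ms)
{
	size_t n = 0;

	for (size_t i = 0; i < m->nranks; i++)
		if (m->ready_at_ms[i] > publish_ms)
			n++;
	return n;
}
```

With assumed registration times of 100, 250, and 4000 ms, the old ordering leaves all three ranks unready at publish time, reordering alone still leaves two, and a 5000 ms delay leaves none. A rank that takes longer than the delay would still lose the race in this model, which is consistent with the description calling the change a workaround.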
Ticket title is 'cart/multisend_one_node.py:CartMultisendOneNodeTest.test_cart_multisend - stack trace for test_multisend_client'
- crtu_start_basic_server helper function now takes an optional protocol, which is registered before the progress threads are started
- multisend server modified to pass its RPC protocol

Signed-off-by: Alexander A Oganezov <alexander.oganezov@hpe.com>
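The ordering described in this commit (register an optional protocol first, then start the progress thread) can be sketched with plain pthreads. This is not the CaRT API: `struct proto`, `struct server`, `start_server`, and `stop_server` are hypothetical stand-ins, and the real crtu_start_basic_server signature is not reproduced here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the CaRT types. */
struct proto {
	const char *name;
};

struct server {
	const struct proto *registered;      /* set by the registration step */
	bool                seen_registered; /* what the progress thread saw at start */
	pthread_t           progress;
};

static void *
progress_fn(void *arg)
{
	struct server *s = arg;

	/*
	 * No lock needed here: registration happens before pthread_create(),
	 * which orders that write before this thread begins running.
	 */
	s->seen_registered = (s->registered != NULL);
	return NULL;
}

/* Register the optional protocol, then start the progress thread.
 * Returns 0 on success or the pthread_create() error code. */
static int
start_server(struct server *s, const struct proto *opt_proto)
{
	assert(s != NULL);
	s->registered      = NULL;
	s->seen_registered = false;
	if (opt_proto != NULL)
		s->registered = opt_proto;
	return pthread_create(&s->progress, NULL, progress_fn, s);
}

/* Wait for the progress thread to finish. */
static int
stop_server(struct server *s)
{
	return pthread_join(s->progress, NULL);
}
```

Because the protocol is set before the thread exists, the progress thread can never observe a server without its handlers, which is the property the helper change is after. Passing `NULL` keeps the behaviour for callers that do not supply a protocol.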
ftest LGTM (just C code actually)
Test stage Build RPM on EL 8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16599/6/execution/node/340/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16599/6/execution/node/337/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-16599/6/execution/node/353/log