Kusama Validators Litep2p - Monitoring and Feedback #7076
Comments
This is a concern for me: #7077; we should pay attention to it.
And this is the impact. I assume it was because validators restarted to use litep2p, but we should keep an eye on this if it keeps repeating.
Confirmed with paranodes: he had some validators that were constantly restarting, and that was the reason for these finality delays.
And what was the reason for the constant restarting?
Paranodes had a script that restarted nodes on low connectivity, which is exactly what #7077 will produce. Nevertheless, even after the script was stopped we are still seeing occasional smaller spikes in finality because of
Did a bit more investigation on this path: the candidates in this list, which are slow to be approved, induce a finality lag of around 16 blocks.
For these particular candidates, around 20/30 random validators (on different polkadot versions) are no-shows. Those validators aren't no-shows on any other candidate before or after, so it is a one-off for these particular candidates. What these candidates have in common is that all of them (9 of 9) have been backed in a group that contains STKD.IO/01 (https://apps.turboflakes.io/?chain=kusama#/validator/5FKStTNJCk5J3EuuYcvJpNn8CxbkzW1J7mst3aayWCT8XrXh), which seems to be one of the nodes that enabled litep2p. So my theory is that the presence of this node in the backing group might make others slow on availability-recovery, which results in no-shows and finality lag; however, I don't have definitive proof of where this happens. Next
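To make that backing-group check concrete, here is a minimal sketch (with made-up validator indices; not the actual tooling used for the investigation) of intersecting the backing groups of the affected candidates to find the validators common to all of them:

```rust
use std::collections::HashSet;

/// Hypothetical input: for each slow-to-approve candidate, the validator
/// indices of its backing group. Returns the validators present in every group.
fn common_backers(backing_groups: &[Vec<u32>]) -> HashSet<u32> {
    let mut iter = backing_groups.iter();
    // Start from the first group and keep only validators present in every other group.
    let mut common: HashSet<u32> = match iter.next() {
        Some(group) => group.iter().copied().collect(),
        None => return HashSet::new(),
    };
    for group in iter {
        let group: HashSet<u32> = group.iter().copied().collect();
        common.retain(|v| group.contains(v));
    }
    common
}

fn main() {
    // Illustrative groups for 3 of the 9 affected candidates; in the real
    // investigation all 9 groups contained the STKD.IO/01 validator.
    let groups = vec![
        vec![12, 47, 88, 301],
        vec![5, 47, 200, 999],
        vec![47, 61, 77, 150],
    ];
    println!("validators in every backing group: {:?}", common_backers(&groups));
}
```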
Confirmed STKD.IO/01 runs litep2p; a reboot back to libp2p will happen soon.
STKD.IO/01 was restarted with the litep2p flag around 2025-01-08 04:02:20 (at the start of the log file). It ran and output errors for about 25-30 minutes, which cleared up around 2025-01-08 04:30:00. I restarted the service a couple of times at the beginning. The flag was removed at 2025-01-14 14:43:47. Logs: https://public-logs-stkd.s3.us-west-2.amazonaws.com/extracted-messages.txt If you need any more info or have any questions, let me know.
This PR rejects inbound requests from banned peers (peers whose reputation is below the banned threshold). This mirrors the request-response implementation on the libp2p side. I don't expect this to get triggered too often, but we'll monitor this metric. While at it, I have registered a new inbound failure metric to have visibility into this. Discovered during the investigation of #7076 (comment). cc @paritytech/networking --------- Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
(cherry picked from commit ef064a3)
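As a rough illustration of the behaviour described in that PR (the types, names, and threshold value below are illustrative stand-ins, not the actual litep2p request-response code): inbound requests are dropped when the peer's reputation is below the banned threshold, and a rejection counter is bumped so the event is visible in monitoring.

```rust
use std::collections::HashMap;

/// Illustrative reputation threshold below which a peer is considered banned.
const BANNED_THRESHOLD: i32 = -100;

/// Hypothetical view over peer reputations plus an inbound-failure counter,
/// standing in for the peer store and the Prometheus metric in the real code.
struct RequestHandler {
    reputations: HashMap<u64, i32>,
    rejected_inbound: u64,
}

impl RequestHandler {
    /// Returns true if the inbound request from `peer` should be processed.
    fn accept_inbound(&mut self, peer: u64) -> bool {
        let reputation = self.reputations.get(&peer).copied().unwrap_or(0);
        if reputation < BANNED_THRESHOLD {
            // Mirror the libp2p behaviour: reject requests from banned peers
            // and record the rejection so it shows up in monitoring.
            self.rejected_inbound += 1;
            return false;
        }
        true
    }
}

fn main() {
    let mut handler = RequestHandler {
        reputations: HashMap::from([(1, 10), (2, -500)]),
        rejected_inbound: 0,
    };
    assert!(handler.accept_inbound(1));   // reputable peer: accepted
    assert!(!handler.accept_inbound(2));  // banned peer: rejected
    println!("rejected inbound requests: {}", handler.rejected_inbound);
}
```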
This is a placeholder issue for the community (Kusama validators) to share their feedback, monitoring, and logs.
We’re excited to announce the next step in improving the Kusama network with the introduction of litep2p—a more resource-efficient network backend. We need your help to make this transition successful!
Enable Litep2p Backend
We’re gradually rolling out litep2p across all validators. Here’s how you can help:
Rollout Plan
Monitoring & Feedback
Please keep an eye on your node after restarting and report any warnings or errors you encounter. In the first 15–30 minutes after the restart, you may see some temporary warnings, such as:
We'd like to pay special attention to at least the following metrics:
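As a hedged illustration of that kind of monitoring (the metric-name substrings and the dump workflow below are assumptions for the sketch, not a confirmed list), a small helper could scan a Prometheus text-format dump of the node's metrics and print only the lines of interest:

```rust
use std::fs;

fn main() -> std::io::Result<()> {
    // Assumed workflow: the node's Prometheus endpoint has been dumped to a
    // file beforehand, e.g. `curl http://localhost:9615/metrics > metrics.txt`.
    // The substrings below are illustrative, not an exact metric list.
    let watched = ["sub_libp2p_peers_count", "approval_checking_finality_lag"];

    let text = fs::read_to_string("metrics.txt")?;
    for line in text.lines() {
        if line.starts_with('#') {
            continue; // skip HELP/TYPE comment lines
        }
        if watched.iter().any(|&name| line.contains(name)) {
            println!("{line}");
        }
    }
    Ok(())
}
```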
Tasks