Kusama Validators Litep2p - Monitoring and Feedback #7076

Open · 1 task

lexnv opened this issue Jan 7, 2025 · 8 comments

lexnv commented Jan 7, 2025

This is a placeholder issue for the community (Kusama validators) to share their feedback, monitoring, and logs.

We’re excited to announce the next step in improving the Kusama network with the introduction of litep2p—a more resource-efficient network backend. We need your help to make this transition successful!

Enable Litep2p Backend

We’re gradually rolling out litep2p across all validators. Here’s how you can help:

  1. Ensure you're running the latest Polkadot release (version 2412 or newer).
  2. Restart your node with the following flag (a sketch of the full restart is shown after this list):
--network-backend litep2p
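
For reference, here is a minimal sketch of what the restart might look like on a systemd-managed validator. The unit name, binary path, chain, and other flags below are assumptions about a typical setup; only the --network-backend litep2p flag itself comes from this issue.

    # Assumed: a systemd unit named "polkadot" and the polkadot binary on PATH.
    # 1. Append the flag to the ExecStart= line of the unit file, e.g.:
    #    ExecStart=/usr/bin/polkadot --validator --chain kusama --network-backend litep2p
    # 2. Reload systemd and restart the service:
    sudo systemctl daemon-reload
    sudo systemctl restart polkadot

    # Or, when launching by hand, add the flag to your usual command line:
    polkadot --validator --chain kusama --network-backend litep2p

To go back to the default backend, remove the flag again and restart the node.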

Rollout Plan

  • Phase 1: We need around 100 validators to start the transition.
  • Phase 2 (in a few days): Increase to 500 validators running litep2p.
  • Phase 3: Full rollout—inviting all validators to switch.

Monitoring & Feedback

Please keep an eye on your node after restarting and report any warnings or errors you encounter. In the first 15–30 minutes after the restart, you may see some temporary warnings, such as:

    Some network error occurred when fetching erasure chunk
    Low connectivity

We'd like to pay special attention to at least the following metrics (a quick spot-check command follows the list):

  • Sync peers (substrate_sync_peers)
  • Block height (substrate_block_height)
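
As a quick spot-check, both metrics can be read straight from the node's Prometheus endpoint. The sketch below assumes the default exporter on localhost:9615; adjust the host and port if you changed them, or query through your existing Prometheus/Grafana setup instead.

    # Fetch the two metrics from the node's Prometheus exporter (default port 9615 assumed):
    curl -s http://127.0.0.1:9615/metrics | grep -E '^substrate_(sync_peers|block_height)'

A sudden drop in substrate_sync_peers, or a finalized substrate_block_height that stops advancing after the restart, would be worth reporting here.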

lexnv added this to the Networking project on Jan 7, 2025
alexggh commented Jan 7, 2025

This is a concern for me: #7077; we should pay attention to it.

alexggh commented Jan 8, 2025

> This is a concern for me: #7077; we should pay attention to it.

And this is the impact. I assume it was because validators restarted to use litep2p, but we should keep an eye on it if it keeps repeating.

(Screenshot 2025-01-08 at 11:50:49)

https://grafana.teleport.parity.io/goto/-FyJtnvNg?orgId=1

alexggh commented Jan 9, 2025

> This is a concern for me: #7077; we should pay attention to it.
>
> And this is the impact. I assume it was because validators restarted to use litep2p, but we should keep an eye on it if it keeps repeating.

Confirmed with paranodes: he had some validators that were constantly restarting, so that was the reason for these finality delays.

eskimor commented Jan 10, 2025

And what was the reason for constantly restarting?

alexggh commented Jan 10, 2025

> And what was the reason for constantly restarting?

Paranodes had a script that restarted the node on low connectivity, which is exactly what #7077 will produce.

Nevertheless, even after the script was stopped we are still seeing occasional dips in finality because of no-shows on around ~20 validators. I'm working with @lexnv to understand what might cause that, because it correlates perfectly with the enablement of litep2p on validators.

alexggh commented Jan 14, 2025

> And what was the reason for constantly restarting?
>
> Paranodes had a script that restarted the node on low connectivity, which is exactly what #7077 will produce.
>
> Nevertheless, even after the script was stopped we are still seeing occasional dips in finality because of no-shows on around ~20 validators. I'm working with @lexnv to understand what might cause that, because it correlates perfectly with the enablement of litep2p on validators.

I did a bit more investigation along this path. The following candidates are slow to be approved, and they induce a finality lag of around ~16 blocks:

0x8fe297cb881a48611829b911b9dfc4c176d5a540b5fd0ab4a2114b6b65e04d71
0x16a13885e900a4afc18d1689a0197602ac64176d70ccd8608f9a260de3b3a22e
0x869c13acf8857fc2df09bd3991cfda24f81b639373bdc68ba7a10436940c4a4d
0x83a323ec47de77df87e9e2dfc49650169622fd641304e62f8f83db085fc1822c
0x7881ad19ad2cd502c7af16313bf569c9c5ac5d42cd47b9b2bede8df71f63cd56
0xb062a7c221fcd1ecfe33f1aabc1b48abf3949a9ab09713e8ca6a86800388103c
0xd233f3836a3a4bc7703c7b211d8da49bd7455e7b2d264b6cfd053f42e73948f7
0x8f811f8b2d246badb748a46156fc0a992316616e36f8779f32a0606853f68df8
0x477c4030533603182240063e80b1169fac6674bdd1b207c719d5363b231a28ba
0x756973880ea6a9a08d3ab881ef7e908a4779c5c2f1d4e4aa3f01f2c2510171ec

For these particular candidates, around 20-30 random validators (running different Polkadot versions) are no-shows. Those validators aren't no-shows on any other candidate before or after, so it is a one-off for these particular candidates.

What these candidates have in common is that all of them (9 of 9) have been backed in a group that contains STKD.IO/01 (https://apps.turboflakes.io/?chain=kusama#/validator/5FKStTNJCk5J3EuuYcvJpNn8CxbkzW1J7mst3aayWCT8XrXh), which seems to be one of the nodes that enabled litep2p.

So my theory is that the presence of this node in the backing group might make others slow on availability-recovery, which results in no-shows and finality lag; however, I don't have definitive proof of where this happens.

Next

  • Confirm STKD.IO/01 is rolled back from litep2p to the default networking backend and that the occasional lag goes away.
  • Root-cause why the above happens

lexnv commented Jan 14, 2025

Confirmed STKD.IO/01 runs litep2p; a restart back to libp2p will happen soon.

@Sudo-Whodo

STKD.IO/01 was restarted with the litep2p flag around 2025-01-08 04:02:20 (at the start of the log file). It ran and emitted errors for about 25-30 minutes, which cleared up around 2025-01-08 04:30:00. I restarted the service a couple of times at the beginning. The flag was removed at 2025-01-14 14:43:47.

https://public-logs-stkd.s3.us-west-2.amazonaws.com/extracted-messages.txt

If you need any more info or have any questions, let me know.

github-merge-queue bot pushed a commit that referenced this issue Jan 15, 2025
This PR rejects inbound requests from banned peers (peers whose reputation is below the banned threshold).

This mirrors the request-response implementation from the libp2p side. I don't expect this to get triggered too often, but we'll monitor this metric.

While at it, I have registered a new inbound failure metric to gain visibility into this.

Discovered during the investigation of:
#7076 (comment)

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
github-actions bot pushed commits that referenced this issue Jan 15, 2025 (the same change, cherry picked from commit ef064a3).