EPSR Feature Overview Guide
Introduction
This guide describes EPSR and how to configure it.

Putting a ring of Ethernet switches at the core of a network is a simple way to increase the
network’s resilience, because such a network is no longer susceptible to a single point of failure.
However, the ring must be protected from Layer 2 loops. Traditionally, STP-based technologies are
used to protect rings, but they are relatively slow to recover from link failure. This can create
problems for applications that have strict loss requirements, such as voice and video traffic, where
the speed of recovery is highly significant.

This guide describes a fast alternative to STP: Ethernet Protection Switched Ring (EPSR). EPSR
enables rings to recover rapidly from link or node failures, within as little as 50ms, depending on
port type and configuration. This is much faster than STP at 30 seconds or even RSTP at 1 to 3
seconds.

In a separate section, this guide also describes the EPSR SuperLoop Prevention (EPSR-SLP) feature,
which is an enhancement to the existing EPSR feature in AlliedWare Plus. EPSR-SLP prevents
“SuperLoops” forming in certain EPSR multi-ring topologies. This functionality makes it possible for
EPSR-SLP protected rings to have data VLANs in common on their respective ring domains.

List of Terms:
EPSR
Ethernet Protection Switched Ring.
ER
Enhanced Recovery.
EPSR Domain
An EPSR Domain is created from individual switch nodes connected as a ring, where all nodes are
configured with an EPSR instance with the same set of EPSR Data VLANs.
Failover timer expiry
The Failover timer expires when several healthcheck messages fail to circumnavigate the ring, due
to a break in the ring. This causes the master node to undertake subsequent fault recovery actions.
Health messages
Messages sent to check the condition of the ring. Also known as Healthcheck messages.
Contents
Introduction
    Products and software version that apply to this guide
    What information will you find in this document?
    Allied Telesis products that support EPSR and their ring limits
Topology Changes and Optimizing EPSR rings with Large Numbers of Data VLANs
Counters
    Master node in a complete ring
    Transit node in a ring that had failures
Debugging
    Link down between master node and transit node
        Master node (Node A) debug output
        Transit node (Node B) debug output
    Link Down between two transit nodes
        Master node (Node A) debug output
        Transit node (Node B) debug output
However, support and implementation of EPSR varies between products. To see whether a product
supports a particular feature or command, see the following documents:
These documents are available from the above links on our website at alliedtelesis.com.
Most features described in this document are supported from AlliedWare Plus 5.4.4 or later. These
features are available in later releases:
Version 5.4.7-1.1 and later support combining EPSR with G.8032 sub-rings.
Feature support may change in later software versions. For the latest information, see the above
documents.
Allied Telesis products that support EPSR and their ring limits
The following table shows whether each product can be configured as a master node, and the
maximum number of rings (domains) supported by each product. Note that if your ring has a mixture
of products that support 16 rings and products that support 64 rings, the maximum number of rings
is 16.
Product              Can be master node   Maximum rings
x950 Series          Y                    64
x930 Series          Y                    64
x550 Series          Y                    64
x320 Series          Y                    64
x220 Series          -                    64
IE510 Series         Y                    16
IE300 Series         Y                    16
IE210L Series        Y                    16
XS900MX Series       Y                    16
GS980MX Series       -                    64
GS980EM Series       -                    64
GS980M Series        -                    64
GS970M Series        -                    16
FS980M Series        -                    16
SwitchBlade x908     Y                    64
DC2552XS/L3          Y                    16
x900 Series          Y                    64
x610 Series          Y                    64
x510 Series          Y                    16
IX5                  Y                    16
x310 Series          Y                    16
x210 Series          -                    16
IE200 Series         Y                    16
MiniMAP 9100         Y                    64
iMAP 9700            Y                    64
MicroMap 9001        Y                    64
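The mixed-ring rule stated before the table (a ring that mixes 16-ring and 64-ring products is limited to 16 rings) amounts to taking the smallest per-product limit. The following Python sketch illustrates this for a few rows copied from the table. The function name and the dictionary are hypothetical helpers for illustration only, not part of AlliedWare Plus.

```python
# Maximum ring (domain) counts for a few example products, copied from the
# table in this guide. This dictionary is an illustrative subset only.
MAX_RINGS = {
    "x950 Series": 64,
    "x930 Series": 64,
    "IE510 Series": 16,
    "x510 Series": 16,
}


def effective_ring_limit(products):
    """Return the largest ring count that every listed product supports."""
    return min(MAX_RINGS[name] for name in products)


# A ring that mixes 64-ring and 16-ring products is limited to 16 rings.
mixed_limit = effective_ring_limit(["x950 Series", "x510 Series"])
uniform_limit = effective_ring_limit(["x950 Series", "x930 Series"])
```

In this sketch, `mixed_limit` takes the lower of the two limits, while `uniform_limit` keeps the shared value.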
The following diagram shows a basic ring with all the switches in the ring up:
[Figure: a basic EPSR ring with one master node and four transit nodes (Node 1 to Node 4), each with end user ports. On the master node's primary port (P), the control VLAN and the data VLAN are forwarding. On the master node's secondary port (S), the control VLAN is forwarding and the data VLAN is blocked. Legend: Control VLAN, Data VLAN 1, Data VLAN 2, P = Primary Port, S = Secondary Port.]
Establishing a ring
Once you have configured EPSR on the switches, the following steps complete the EPSR ring:
1. The master node creates an EPSR Health message and sends it out the primary port. This
increments the master node’s Transmit: Health counter in the show epsr count command.
2. The first transit node receives the Health message on one of its two ring ports and, using a
hardware filter, sends the message out its other ring port.
Note: Transit nodes never generate Health messages, only receive them and forward them with their
switching hardware. This does not increment the transit node’s Transmit: Health counter.
However, it does increment the Transmit counter in the show switch port command.
The hardware filter also copies the Health message to the CPU. This increments the transit node’s
Receive: Health counter. The CPU processes this message as required by the state machines,
but does not send the message anywhere because the switching hardware has already done this.
3. The Health message continues around the rest of the transit nodes, being copied to the CPU and
forwarded in the switching hardware.
4. The master node eventually receives the Health message on its secondary port. The master
node's hardware filter copies the packet to the CPU (which increments the master node’s
Receive: Health counter). Because the Master received the Health message on its secondary
port, it knows that all links and nodes in the ring are up.
When the master node receives the Health message back on its secondary port, it resets the
Failover timer. If the Failover timer expires before the master node receives the Health message
back, it concludes that the ring must be broken.
The master node does not send that particular Health message out again. If it did, the packet
would be continuously flooded around the ring. Instead, the master node generates a new Health
message when the Hello timer expires.
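The interaction between the Hello timer and the Failover timer described above can be modelled roughly as follows. This is a simplified Python sketch, not the switch implementation: it advances time in whole Hello intervals, and the timer values (1 and 3 time units) and the function name `master_poll` are assumptions chosen for illustration.

```python
def master_poll(rounds, hello_time=1, failover_time=3):
    """Simulate master node polling, one Hello interval per round.

    rounds: list of lists. Each inner list holds True/False for every link
            that the Health message must cross on its way from the master's
            primary port back to its secondary port in that interval.
    Returns the master's view of the ring after each round.
    """
    since_reset = 0  # time elapsed since the Failover timer was last reset
    view = []
    for links in rounds:
        # A new Health message is generated each time the Hello timer expires.
        if all(links):
            # The message arrived on the secondary port: reset the Failover timer.
            since_reset = 0
            view.append("complete")
        else:
            # The message was lost somewhere in the ring.
            since_reset += hello_time
            view.append("broken" if since_reset >= failover_time else "waiting")
    return view


history = master_poll([
    [True, True, True],    # all links up
    [True, False, True],   # second link fails
    [True, False, True],
    [True, False, True],
])
```

The sketch shows why polling alone is comparatively slow: the master only concludes that the ring is broken once the Failover timer has run out.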
Detecting a fault
EPSR uses a fault detection scheme that alerts the ring when a break occurs, instead of using a
spanning tree-like calculation to determine the best path. The ring then automatically heals itself
by sending traffic over a protected reverse path.

EPSR uses the following two methods to detect when a transit node or a link goes down:

Master node polling fault detection
To check the condition of the ring, the master node regularly sends Health messages out its
primary port, as described in "Establishing a ring" on page 7. If all links and nodes in the ring are
up, the messages arrive back at the master node on its secondary port.

This can be a relatively slow detection method, because it depends on how often the node sends
Health messages. The master node only ever sends Health messages out its primary port. If its
primary port goes down, it does not send Health messages.

Transit node unsolicited fault detection
To speed up fault detection, EPSR transit nodes directly communicate when one of their interfaces
goes down. When a transit node detects a fault at one of its interfaces, it immediately sends a
Link-Down message over the link that remains up. This notifies the master node that the ring is
broken and causes it to respond immediately.

Master Node States:
Complete
The state when there are no link or node failures on the ring.
Failed
The state when there is a link or node failure on the ring. This state indicates that the master
node received a Link-Down message or that the failover timer expired before the master node’s
secondary port received a Health message.

Transit node states:
Idle
The state when EPSR is first configured, before the master node determines that all links in the
ring are up. In this state, both ports on the node are blocked for the data VLAN. From this state,
the node can move to LinksUp or LinksDown.
LinksUp
The state when both the node’s ring ports are up and forwarding. From this state, the node can
move to LinksDown.
LinksDown
The state when one or both of the node’s ring ports are down. From this state, the node can move
to Pre-forwarding.
Pre-forwarding
The state when both ring ports are up, but one has only just

Recovering from a fault

Fault in a link or a transit node
When the master node detects an outage somewhere in the ring, using either detection method, it
restores traffic flow by:
1. Declaring the ring to be in a Failed state
2. Unblocking its secondary port, which enables data VLAN traffic to pass between its primary and
secondary ports
3. Flushing its own forwarding database (FDB) for the two ring ports
4. Sending an EPSR Ring-Down-Flush-FDB control message to all the transit nodes, via both its
primary and secondary ports.
The transit nodes respond to the Ring-Down-Flush-FDB message by flushing their forwarding
databases for each of their ring ports. As the data starts to flow in the ring’s new configuration,
the nodes (Master and Transit) re-learn their Layer 2 addresses. During this period, the master
node continues to send Health messages over the control VLAN. This situation continues until
the faulty link or node is repaired.
For a multi-domain ring, this process occurs separately for each domain within the ring.
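The master node's reaction to a detected fault can be summarised as a small state machine. The Python sketch below is a minimal model of the four recovery steps listed in this section, under the assumption that a second fault report in the Failed state triggers no further action. The class and method names are hypothetical.

```python
class MasterNode:
    """Minimal model of an EPSR master node's fault recovery for one domain."""

    def __init__(self):
        self.state = "Complete"
        self.secondary_blocked = True  # data VLAN blocked on the secondary port

    def on_fault(self, cause):
        """Handle a fault detected by either method.

        cause: "link-down-message" or "failover-timer-expired".
        Returns the list of actions taken, in order.
        """
        if self.state == "Failed":
            return []  # assumption: already recovering, nothing more to do
        self.state = "Failed"            # step 1: declare the ring Failed
        self.secondary_blocked = False   # step 2: unblock the secondary port
        return [
            "flush own FDB for both ring ports",                    # step 3
            "send Ring-Down-Flush-FDB out primary and secondary",   # step 4
        ]


master = MasterNode()
actions = master.on_fault("link-down-message")
repeat = master.on_fault("failover-timer-expired")
```

For a multi-domain ring, one such object would exist per domain, matching the statement that the process occurs separately for each domain.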
If the master node fails, the transit nodes stop receiving Health messages and other messages from
the master node.
The transit nodes connected to the master node experience a broken link, so they send Link-Down
messages. If the master node is down, these messages are simply dropped.
Neither of these symptoms affects how the transit nodes forward traffic.
Once the master node recovers, it continues its function as the master node.
Enhanced Recovery
A transit node port enters the Pre-forwarding state when the ring port becomes electrically available.
Enhanced Recovery can speed a node’s recovery from the Pre-forwarding state.
With Enhanced Recovery, the transit node port can exit the Pre-forwarding state without the entire
ring becoming complete. It does this in one of two ways:
When entering the Pre-forwarding state, the transit node sends a Link-Forward-Request
message and waits for a response from the master node. When the Master receives this
message, it sends a special healthcheck message. If the Master does not receive the healthcheck
back within x seconds, the Master sends a Permission-Link-Forward message to the transit node.
The transit node can then start forwarding on both ports.
Without Enhanced Recovery, the transit node port waits in the Pre-forwarding state until it receives
the Ring Up Flush message from the Master. This occurs when the Master receives back its
healthcheck messages, and the ring is declared complete.
Note: Version 5.4.6-1.x extends EPSR SuperLoop Protection (SLP) to allow multiple ring EPSR
scenarios where there are multiple ring masters on a common segment, as long as none of
the master secondary ports are on the common segment. However, in such scenarios, it is
not advisable to use EPSR Enhanced Recovery on transit nodes.
2. Blocking its secondary port for data VLAN traffic (but not for the control VLAN)
4. Sending a Ring-Up-Flush-FDB message from its primary port, to all transit nodes.
Change the state of their ports from blocking to forwarding for the data VLAN, which allows data
to flow through their previously-blocked ring ports
The transit nodes do not start forwarding traffic on the previously-down ports until after they receive
the Ring-Up-Flush-FDB message. This makes sure the previously-down transit node ports stay
blocked until after the master node blocks its secondary port. Otherwise, the ring could form a loop
because it had no blocked ports.
1. Puts the port immediately into the forwarding state and starts forwarding data out that port. It
does not need to wait, because the node knows there is no loop in the ring—because the other
ring port on the node is down.
4. Waits for the timer to expire. At that time, if one port is still up and one is still down, the transit
node sends a Ring-Up-Flush-FDB message out the port that is up. This message is usually called
a “Fake Ring Up message”. Sending this message allows any ports on other transit nodes that
are blocking or in the Pre-forwarding state to move to forwarding traffic in the LinksUp state. The
timer delay lets the device at the other end of the link that came up configure its port
appropriately, so that it is ready to receive the transmitted message.
Note: The master node would not send a Ring-Up-Flush-FDB message in these circumstances,
because the ring is not in a state of Complete. The master node’s secondary port remains
unblocked.
From software version 5.4.7-1.1 onwards, G.8032 can also interact with EPSR. A G.8032 sub-ring
may be connected to and interact with an EPSR ring.
Note: Only a G.8032 sub-ring connected to an EPSR ring is supported. A G.8032 major ring
connected to an EPSR ring is not supported.
In the following diagram, a G.8032 sub-ring that is made up of nodes 3, 6, 7, 8, and 5 is connected
to an EPSR ring made up of nodes 1, 2, 3, 4, and 5. The G.8032 sub-ring is protecting the same
Data VLANs as the EPSR ring. In this scenario, any topology changes seen in the G.8032 sub-ring
may need to be propagated to the EPSR ring. This requires EPSR on the interconnecting nodes
(nodes 3 and 5) to tell all the other nodes in the EPSR instance to flush their FDB, using the
FLUSH-FDB message.
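[Figure: two views of an EPSR ring (including nodes 2, 3, 4 and 5) with a G.8032 sub-ring (nodes 3, 6, 7, 8 and 5) attached at nodes 3 and 5. The sub-ring's RPL is marked in both views. Endpoint A attaches at node 4 and endpoint B at node 7. The second view shows a failure (marked X) in the sub-ring.]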
The path from A to B for the data VLANs that are being protected in the EPSR ring and the G.8032
sub-ring is shown in the red dotted line, going through nodes 4-3-6-7. When a failure occurs in the
G.8032 sub-ring as shown, a block is performed around the failure, while the G.8032 sub-ring's RPL
is opened up for traffic to flow. The new active topology of the same data VLANs has changed. The
path from A to B is different as shown by the green dotted line going through nodes 4-5-8-7.
However, node 4 (like nodes 1 and 2) in the EPSR ring does not know about the topology
change in the G.8032 sub-ring, so node 4 continues to forward traffic from A to B along the red
dotted line. To overcome this, the nodes in the EPSR ring need to flush their FDB. Within the
interconnecting nodes (nodes 3 and 5), this requires the sub-ring ERP instance to notify the EPSR
ring domain instance, so that EPSR in nodes 3 and 5 can perform a flush and in turn notify all the
other nodes in the EPSR ring to also perform an FDB flush.
Although the G.8032 ERP instance and EPSR domain instance are independent of one another, the
interconnected node has knowledge of which instances are protecting the same data VLANs. As
such, the sub-ring ERP instance can determine which EPSR ring domain instance to notify of
topology changes seen by the G.8032 sub-ring.
Once the EPSR ring domain instance is notified of the topology change (by a topology change
notification, or TCN) by the G.8032 sub-ring ERP instance, the EPSR ring domain instance can send
out a FLUSH-FDB message. This message goes around the EPSR ring, causing all the other EPSR nodes
in the ring to perform an FDB flush. MAC re-learning then occurs, and traffic along the protected
data VLANs can flow properly along the green path. Because the G.8032 ring adds delay when it
informs the EPSR ring of the topology change, switchover times of the data VLANs may not meet
50ms objectives.
Note: Not all scenarios require the non-interconnected EPSR ring nodes to perform an FDB flush,
and as such the sending of the FLUSH-FDB message is optional.
A G.8032 ERP instance that detects a topology change sends TCNs to each EPSR instance that
protects the same data VLAN(s) as the ERP instance, provided that the EPSR instance has two ring
ports and is enabled. Such an instance is called the “target EPSR instance”. Once the target EPSR
instance receives a TCN, it performs an FDB flush of its two ring ports. The target EPSR instance
also performs an ARP cache flush, and sends a Query Solicit and a gratuitous ARP (gratARP) if the
VLAN is enabled for them.
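The selection of target EPSR instances can be pictured as a filter over the EPSR instances on the interconnecting node. The Python sketch below is illustrative only. The data class, its field names, and the choice to treat "protecting the same data VLANs" as "sharing at least one data VLAN" are assumptions, not details confirmed by this guide.

```python
from dataclasses import dataclass


@dataclass
class EpsrInstance:
    """Hypothetical record of one EPSR instance on an interconnecting node."""
    name: str
    data_vlans: frozenset
    ring_ports: tuple
    enabled: bool


def target_epsr_instances(erp_data_vlans, epsr_instances):
    """Return the EPSR instances that an ERP instance would send TCNs to.

    Assumption: an instance qualifies if it shares at least one data VLAN
    with the ERP instance, has exactly two ring ports, and is enabled.
    """
    erp_vlans = set(erp_data_vlans)
    return [
        inst for inst in epsr_instances
        if inst.enabled
        and len(inst.ring_ports) == 2
        and inst.data_vlans & erp_vlans
    ]


instances = [
    EpsrInstance("ring-a", frozenset({50, 51}), ("port1.0.1", "port1.0.2"), True),
    EpsrInstance("ring-b", frozenset({50}), ("port1.0.3", "port1.0.4"), False),
    EpsrInstance("ring-c", frozenset({70}), ("port1.0.5", "port1.0.6"), True),
]
targets = [inst.name for inst in target_epsr_instances({50}, instances)]
```

Here only `ring-a` qualifies: `ring-b` is disabled and `ring-c` protects a different data VLAN.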
To enable an EPSR instance to send out a FLUSH-FDB message after being notified by an ERP
instance, use the following command:
where:
g8032 is the protocol that EPSR will allow as the trigger for sending a FLUSH-FDB message.
By default, topology-change is disabled for the EPSR instance, so the target EPSR instance does
not send a FLUSH-FDB message.
To see which EPSR target instances were found, use the following command:
An example of the output can be seen in the ERPS instance section. The relevant section:
Once a target EPSR instance is informed of a TCN, and if enabled by configuration, it in turn sends
FLUSH-FDB messages out both of its ring ports. Upon receipt of a FLUSH-FDB message (not from
itself), an EPSR instance performs an FDB flush on both of its ring ports. The EPSR instance also
performs an ARP cache flush, and sends a Query Solicit and a gratARP if the VLAN is enabled for
them.
To see the EPSR configuration for topology change, use the following command:
To see the EPSR counts for sending and receiving FLUSH-FDB messages, use the following
command:
Configuring EPSR
EPSR does not in itself limit the number of nodes that can exist on any given ring. For information on
ring limits, see the section titled: "Allied Telesis products that support EPSR and their ring limits" on
page 4.
If you already have a ring in a live network, disconnect the cable between any two of the nodes
before you start configuring EPSR, to prevent a loop.
On each switch, perform the following configuration steps. Configuration of the master node and
each transit node is very similar.
This step creates the control and data VLANs for EPSR. Enter global configuration mode and
enter the following commands:
awplus(config)#vlan database
awplus(config-vlan)#vlan <control-vid> name <control-vlan-name>
awplus(config-vlan)#vlan <data-vid> name <data-vlan-name>
This step sets the ring ports to VLAN trunk mode and adds the control and data VLANs.
The final command removes the native VLAN (vlan1) from the ring ports. If you leave all the ring
ports in the native VLAN, they will create a loop, unless vlan1 is part of the EPSR domain. To avoid
loops, you need to do one of the following:
remove at least one of the ring ports from vlan1 on at least one of the switches. We do not
recommend this option, because the action you have taken is less obvious when maintaining
the network later.
In this document, we remove the ring ports from the native VLAN (vlan1).
This step creates the domain, specifying whether the switch is the master node or a transit node.
It also specifies which VLAN is the control VLAN, and on the master node which port is the
primary port.
You can change how often the master sends health messages (hellotime), how long the master
waits for a returning health check message before entering the failed state (failovertime), and the
minimum time a master must stay in the failed state (ringflaptime). The failovertime should be
more than double the hellotime.
For most networks, the default timer settings are suitable. But if you are using VCStack with
EPSR, you must increase the EPSR failovertime to at least 5 seconds, to avoid broadcast storms
during stack failover. Broadcast storms may occur if the stack failover isn’t finished before the
EPSR failover timer expires.
5. Enable EPSR.
On each switch, configure the other ports and protocols that are required for your network.
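The timer guidance in this procedure (failovertime more than double the hellotime, and at least 5 seconds when VCStack is used) can be expressed as a simple planning check. The Python sketch below is a hypothetical helper for reviewing intended values before configuring a switch. It does not talk to a device, and its names are not AlliedWare Plus commands.

```python
def check_epsr_timers(hellotime, failovertime, uses_vcstack=False):
    """Return a list of problems with the planned EPSR timer values (seconds)."""
    problems = []
    if failovertime <= 2 * hellotime:
        problems.append("failovertime should be more than double the hellotime")
    if uses_vcstack and failovertime < 5:
        problems.append("with VCStack, failovertime must be at least 5 seconds")
    return problems


ok_plain = check_epsr_timers(hellotime=1, failovertime=3)
too_short = check_epsr_timers(hellotime=1, failovertime=2)
stack_short = check_epsr_timers(hellotime=1, failovertime=3, uses_vcstack=True)
stack_ok = check_epsr_timers(hellotime=2, failovertime=5, uses_vcstack=True)
```

An empty list means the planned values satisfy both guidelines.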
[Figure: a single EPSR domain in a single ring. Master node A uses port1.0.1 as the primary port (P) and port1.0.2 as the secondary port (S). Transit nodes B and C each use port1.0.1 and port1.0.2 as ring ports. Each node also has end user ports. Legend: Control VLAN, Data VLANs, P = Primary Port, S = Secondary Port.]
awplus(config)#vlan database
awplus(config-vlan)#vlan 1000 name epsr-control
awplus(config-vlan)#vlan 2 name data
awplus(config-vlan)#interface port1.0.1-port1.0.2
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 1000,2
awplus(config-if)#switchport trunk native vlan none
Create the domain, specifying that this switch is the master node. Also specify which VLAN is the
control VLAN and which port is the primary port. In this example the EPSR domain is called awplus.
awplus(config-if)#epsr configuration
awplus(config-epsr)#epsr awplus mode master controlvlan 1000 primaryport port1.0.1
awplus(config-epsr)#epsr awplus datavlan 2
awplus(config)#vlan database
awplus(config-vlan)#vlan 1000 name epsr-control
awplus(config-vlan)#vlan 2 name data
awplus(config-vlan)#interface port1.0.1-port1.0.2
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 1000,2
awplus(config-if)#switchport trunk native vlan none
awplus(config-if)#epsr configuration
awplus(config-epsr)#epsr awplus mode transit controlvlan 1000
awplus(config-epsr)#epsr awplus datavlan 2
The two ring ports must belong to both the control VLAN and all data VLANs.
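The VLAN membership rule for ring ports lends itself to a quick consistency check on a planned configuration. The Python sketch below is a hypothetical offline check. The function name and the input format (a mapping from port name to the set of VLAN IDs allowed on that port) are assumptions made for illustration.

```python
def missing_ring_vlans(port_vlans, control_vlan, data_vlans):
    """Report which required VLANs each ring port is missing.

    port_vlans: dict mapping a ring port name to the set of VLAN IDs it carries.
    Returns a dict containing only the ports that miss at least one VLAN.
    """
    required = {control_vlan, *data_vlans}
    report = {}
    for port, vlans in port_vlans.items():
        missing = required - set(vlans)
        if missing:
            report[port] = sorted(missing)
    return report


# Values taken from the single-ring example: control VLAN 1000, data VLAN 2.
good = missing_ring_vlans(
    {"port1.0.1": {1000, 2}, "port1.0.2": {1000, 2}}, 1000, [2]
)
bad = missing_ring_vlans(
    {"port1.0.1": {1000, 2}, "port1.0.2": {1000}}, 1000, [2]
)
```

An empty result means both ring ports carry the control VLAN and every data VLAN.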
[Figure: two EPSR domains that share transit node E. Domain 1 (control VLAN 1000, data VLAN 2) contains master node A (port1.0.1 primary, port1.0.2 secondary), transit node B (port1.0.1, port1.0.2) and transit node E (port1.0.1, port1.0.2). Domain 2 (control VLAN 40, data VLAN 50) contains master node C (port1.0.4 primary, port1.0.5 secondary), transit node D (port1.0.4, port1.0.5) and transit node E (port1.0.4, port1.0.5). Legend: Control VLAN, Data VLANs, P = Primary Port, S = Secondary Port.]
The master node for domain 1 is the same as in the previous example (except that the domain has
been renamed).
awplus(config)#vlan database
awplus(config-vlan)#vlan 1000 name epsr-control
awplus(config-vlan)#vlan 2 name data
awplus(config-vlan)#interface port1.0.1-port1.0.2
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 1000,2
awplus(config-if)#switchport trunk native vlan none
awplus(config-if)#epsr configuration
awplus(config-epsr)#epsr domain1 mode master controlvlan 1000 primaryport port1.0.1
awplus(config-epsr)#epsr domain1 datavlan 2
Enable EPSR:
Step 2. Configure the transit node (switch B) that belongs just to domain 1.
This transit node is the same as in the previous example (except that the domain has been
renamed).
awplus(config)#vlan database
awplus(config-vlan)#vlan 1000 name epsr-control
awplus(config-vlan)#vlan 2 name data
awplus(config-vlan)#interface port1.0.1-port1.0.2
awplus(config-if)#epsr configuration
awplus(config-epsr)#epsr domain1 mode transit controlvlan 1000
awplus(config-epsr)#epsr domain1 datavlan 2
Enable EPSR:
awplus(config-epsr)#vlan database
awplus(config-vlan)#vlan 40 name epsr-control
awplus(config-vlan)#vlan 50 name data
awplus(config-vlan)#interface port1.0.4-port1.0.5
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 50,40
awplus(config-if)#switchport trunk native vlan none
awplus(config-if)#epsr configuration
awplus(config-epsr)#epsr domain2 mode master controlvlan 40 primaryport port1.0.4
awplus(config-epsr)#epsr domain2 datavlan 50
awplus(config-epsr)#epsr domain2 state enabled
awplus(config-epsr)#end
Step 4. Configure the transit node (switch D) that belongs just to domain 2.
awplus(config)#vlan database
awplus(config-vlan)#vlan 40 name epsr-control
awplus(config-vlan)#vlan 50 name data
awplus(config-vlan)#interface port1.0.4-port1.0.5
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 40,50
Step 5. Configure the transit node (switch E) that belongs to both domains.
Two separate EPSR domains are configured on this device. Enter global configuration mode and
enter the following commands:
awplus(config)#vlan database
awplus(config-vlan)#vlan 1000 name epsr-control
awplus(config-vlan)#vlan 2 name data
awplus(config-vlan)#interface port1.0.1-port1.0.2
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 1000,2
awplus(config-if)#switchport trunk native vlan none
awplus(config-if)#interface port1.0.4-port1.0.5
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 40,50
awplus(config-if)#switchport trunk native vlan none
awplus(config-if)#epsr configuration
awplus(config-epsr)#epsr domain1 mode transit controlvlan 1000
awplus(config-epsr)#epsr domain1 datavlan 2
[Figure: an EPSR domain combined with an RSTP ring. Domain 1 (control VLAN 1000, data VLAN 2) contains master node A (port1.0.1 primary, port1.0.2 secondary), transit node B (port1.0.1, port1.0.2) and switch E (port1.0.1, port1.0.2). The RSTP ring (STP VLAN 10) contains switch E and RSTP switches C and D, each using port1.0.10 and port1.0.11. Legend: Control VLAN, Data VLANs, P = Primary Port, S = Secondary Port.]
Step 1. Configure the master node (switch A) for the EPSR domain.
awplus(config)#vlan database
awplus(config-vlan)#vlan 1000 name epsr-control
awplus(config-vlan)#vlan 2 name data
awplus(config-vlan)#interface port1.0.1-port1.0.2
Step 2. Configure the transit node (switch B) that belongs just to the EPSR
domain.
awplus(config)#vlan database
awplus(config-vlan)#vlan 1000 name epsr-control
awplus(config-vlan)#vlan 2 name data
awplus(config-vlan)#interface port1.0.1-port1.0.2
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 1000,2
awplus(config-if)#switchport trunk native vlan none
awplus(config-if)#epsr configuration
awplus(config-epsr)#epsr domain1 mode transit controlvlan 1000
awplus(config-epsr)#epsr domain1 datavlan 2
awplus(config-epsr)#epsr domain1 state enabled
awplus(config-epsr)#end
Step 3. Configure switches belonging to the RSTP instance (switches C and D).
awplus(config)#vlan database
awplus(config-vlan)#vlan 10 name rstp-domain
awplus(config-vlan)#interface port1.0.10-1.0.11
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 10
awplus(config-if)#end
Configure the data and control VLANs for the EPSR domain:
awplus(config)#vlan database
awplus(config-vlan)#vlan 1000 name epsr-control
awplus(config-vlan)#vlan 2 name data
awplus(config-vlan)#interface port1.0.1-port1.0.2
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 1000,2
awplus(config-if)#switchport trunk native vlan none
awplus(config-if)#interface port1.0.10-port1.0.11
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 10
awplus(config-if)#epsr configuration
awplus(config-epsr)#epsr domain1 mode transit controlvlan 1000
awplus(config-epsr)#epsr domain1 datavlan 2
Enable EPSR:
Note: Although RSTP is enabled by default on AlliedWare Plus switches, it is automatically disabled
on EPSR ports.
Traffic for vlan20 is nested inside vlan 50 for transmission around the core
Traffic for vlan200 is nested inside vlan51 for transmission around the core
The data VLANs for the EPSR domain are vlan50 and vlan51
[Figure: an EPSR domain with nested VLANs. The EPSR domain (control VLAN 100, data VLANs 50 and 51) contains master node A (port1.0.1 primary, port1.0.2 secondary) and transit nodes B, C and D (ring ports port1.0.1 and port1.0.2). Each ring node connects through port1.0.22 to a client switch (E, F, G or H), which uses port1.0.10 or port1.0.20. Legend: Control VLAN, Data VLANs, P = Primary Port, S = Secondary Port.]
Step 1. Configure the master node (switch A) for the EPSR domain.
awplus(config)#vlan database
awplus(config-vlan)#vlan 100 name epsr-control
awplus(config-vlan)#vlan 50 name data-c1
awplus(config-vlan)#vlan 51 name data-c2
awplus(config-vlan)#interface port1.0.22
awplus(config-if)#switchport access vlan 50
awplus(config-if)#switchport vlan-stacking customer-edge-port
awplus(config-if)#interface port1.0.1-port1.0.2
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 100,50,51
awplus(config-if)#switchport trunk native vlan none
awplus(config-if)#switchport vlan-stacking provider-port
awplus(config-if)#epsr configuration
awplus(config-epsr)#epsr awplus mode master controlvlan 100 primaryport port1.0.1
awplus(config-epsr)#epsr awplus datavlan 50-51
Enable EPSR:
Each of the transit nodes has the same EPSR configuration in this example.
awplus(config)#vlan database
awplus(config-vlan)#vlan 100 name epsr-control
awplus(config-vlan)#vlan 50 name data-c1
awplus(config-vlan)#vlan 51 name data-c2
Note: In this example the control VLAN is also the nested VLAN.
awplus(config-vlan)#interface port1.0.22
awplus(config-if)#switchport access vlan 50
awplus(config-if)#switchport vlan-stacking customer-edge-port
awplus(config-if)#interface port1.0.1-port1.0.2
awplus(config-if)#switchport mode trunk
awplus(config-if)#epsr configuration
awplus(config-epsr)#epsr awplus mode transit controlvlan 100
awplus(config-epsr)#epsr awplus datavlan 50-51
Enable EPSR:
Each of the transit nodes has the same EPSR configuration in this example.
awplus(config)#vlan database
awplus(config-vlan)#vlan 100 name epsr-control
awplus(config-vlan)#vlan 50 name data-c1
awplus(config-vlan)#vlan 51 name data-c2
Note: In this example the control VLAN is also the nested VLAN.
awplus(config-vlan)#interface port1.0.22
awplus(config-if)#switchport access vlan 51
awplus(config-if)#switchport vlan-stacking customer-edge-port
awplus(config-if)#interface port1.0.1-port1.0.2
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 100,50,51
awplus(config-if)#switchport trunk native vlan none
awplus(config-if)#switchport vlan-stacking provider-port
awplus(config-if)#epsr configuration
awplus(config-epsr)#epsr awplus mode transit controlvlan 100
awplus(config-epsr)#epsr awplus datavlan 50-51
Enable EPSR:
awplus(config)#vlan database
awplus(config-vlan)#vlan 20 name customer20
awplus(config-vlan)#interface vlan20
awplus(config-if)#ip address 192.168.20.10/24
awplus(config-if)#interface port1.0.20
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 20
awplus(config-if)#end
awplus(config)#vlan database
awplus(config-vlan)#vlan 200 name customer200
awplus(config-vlan)#interface vlan200
awplus(config-if)#ip address 192.168.200.1/24
awplus(config-if)#interface port1.0.10
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 200
awplus(config-if)#end
awplus(config)#vlan database
awplus(config-vlan)#vlan 20 name customer20
awplus(config-vlan)#interface vlan20
awplus(config-if)#ip address 192.168.20.1/24
awplus(config-if)#interface port1.0.20
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 20
awplus(config-if)#end
awplus(config)#vlan database
awplus(config-vlan)#vlan 200 name customer200
awplus(config-vlan)#interface vlan200
awplus(config-if)#ip address 192.168.200.10/24
awplus(config-if)#interface port1.0.10
awplus(config-if)#switchport mode trunk
awplus(config-if)#switchport trunk allowed vlan add 200
awplus(config-if)#end
show epsr
show epsr=awplus
---------------------------------------------------------------------
The following ports report that they are down immediately or within a few milliseconds, which leads
to an EPSR recovery time of 50 to 100ms:
10G ports
However, for tri-speed copper RJ-45 ports operating at 1000M, there is a short delay before the port
reports that it is down. For almost all networks, this slight delay in recovery has no practical effect.
However, for networks with extremely stringent failover requirements, we recommend using fiber
1000M ports instead of copper.
When IGMP snooping is enabled on a VLAN, and EPSR changes the underlying link layer topology
of that VLAN, this can interrupt multicast data flow for a significant length of time. Query solicitation
prevents this by monitoring the VLAN for any topology changes. When it detects a change, it
generates a special IGMP Leave message known as a Query Solicit, and floods the Query Solicit
message to all ports. When the IGMP Querier receives the message, it responds by sending a
General Query. This refreshes snooped group membership information in the network.
Query solicitation functions by default (without you enabling it) on the EPSR master node. By
default, the master node always sends a Query Solicit message when the topology changes.
On other switches in the network, query solicitation is disabled by default, but you can enable it
by using the command:
If you enable query solicitation on an EPSR transit node, both that node and the master node send a
Query Solicit message.
Once the Querier receives the Query Solicit message, it sends out a General Query and waits for
responses, which update the snooping information throughout the network. If necessary, you can
reduce the time this takes by tuning the IGMP timers, especially the interval between General Query
messages, which you set with the ip igmp query-interval command.
Query solicitation also works with networks that use Spanning Tree (STP, RSTP, or MSTP).
To prevent this kind of flooding, the AlliedWare Plus OS has an IGMP Query-Hold Interval. This is the
time, starting from the last Query sent, that an IGMP Querier refrains from sending any more IGMP
Queries. You can configure this time period on each VLAN interface, using the command:
where <100-5000> is the time in milliseconds for the hold interval. The default is 500 milliseconds.
This hold time is always enabled, and does not require Query Solicitation to be enabled.
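The hold interval behaves like a simple rate limiter. The following Python sketch illustrates the idea only. The class and method names are hypothetical and are not part of AlliedWare Plus, and the handling of a Query that arrives exactly at the end of the interval is an assumption.

```python
class QueryHold:
    """Sketch of a per-VLAN query-hold gate (hypothetical, not an AlliedWare Plus API)."""

    def __init__(self, hold_ms=500):
        # The guide gives 100-5000 ms as the configurable range, 500 ms as the default.
        if not 100 <= hold_ms <= 5000:
            raise ValueError("hold interval must be 100-5000 ms")
        self.hold_ms = hold_ms
        self.last_sent_ms = None  # time of the last Query actually sent

    def try_send(self, now_ms):
        """Return True if a Query may be sent at now_ms, and record it as sent."""
        if self.last_sent_ms is not None and now_ms - self.last_sent_ms < self.hold_ms:
            return False  # still inside the hold interval: suppress this Query
        self.last_sent_ms = now_ms
        return True


gate = QueryHold()
# Query Solicits arrive at 0, 120, 499 and 500 ms; only the first and last trigger a Query.
results = [gate.try_send(t) for t in (0, 120, 499, 500)]
```

With the default of 500 milliseconds, a burst of Query Solicit messages produces a single General Query rather than one per solicit.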
We recommend that you leave queue 7 as the highest priority queue, leave it using strict priority
scheduling, and only send essential control traffic to it.
In the unlikely event that this is impossible, you can increase the failover time so that the master
node only changes the ring topology if several Health messages in a row fail to arrive. By default, the
failover time is set to two seconds, which means that the master node decides that the ring is down
if two Health messages in a row fail to arrive.
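The relationship between the two timers can be expressed as a small calculation. This Python sketch assumes that the number of consecutive missed Health messages needed to declare the ring down is the failover time divided by the hello time, which matches the defaults quoted above. The function names are illustrative only.

```python
def missed_health_threshold(hello_time_s=1, failover_time_s=2):
    """Number of consecutive missed Health messages that spans the failover time."""
    # Assumption: threshold = failover time / hello time (2 s / 1 s = 2 by default).
    return failover_time_s // hello_time_s


def ring_declared_down(consecutive_missed, hello_time_s=1, failover_time_s=2):
    """True once enough Health messages in a row have failed to arrive."""
    return consecutive_missed >= missed_health_threshold(hello_time_s, failover_time_s)


default_threshold = missed_health_threshold()          # default timers
relaxed_threshold = missed_health_threshold(1, 5)      # failover time raised to 5 s
```

Raising the failover time makes the master node tolerate more lost Health messages, at the cost of slower detection when no Link-Down message arrives.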
Once a fault in the ring or node has been rectified, the Master sends a Ring-Up-Flush message to
the Transit nodes telling them to change their port states from blocking to forwarding (if necessary)
and to delete entries in their forwarding databases. This restores normal conditions and allows data
to flow again.
This flushing of Layer 2 entries could take some time for EPSR rings with large numbers of data
VLANs. To help reduce latency caused during EPSR topology changes you can use the command:
It allows you to set how EPSR flushes Layer 2 entries when a topology change occurs. It can be
configured to flush all Layer 2 entries on its EPSR interfaces or only flush the Layer 2 entries on its
EPSR data VLANs.
Select the interface parameter as the flush-type to help reduce latency, as this type of flushing is
quicker and less granular than flushing per data VLAN.
Note that it will require relearning on any VLANs that are on an EPSR interface but not part of the
EPSR configuration.
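The difference between the two flush types can be illustrated with a toy forwarding database. This Python sketch is a model only: the entry layout, the function name, and the "vlan" label for the per-data-VLAN flush are assumptions, not switch internals or command keywords.

```python
from collections import namedtuple

FdbEntry = namedtuple("FdbEntry", "mac vlan port")


def flush(fdb, ring_ports, data_vlans, flush_type):
    """Return the FDB entries that survive a flush of the given type."""
    if flush_type == "interface":
        # Everything learned on an EPSR ring port is removed, whatever its VLAN.
        return [e for e in fdb if e.port not in ring_ports]
    if flush_type == "vlan":
        # Only entries in the EPSR data VLANs are removed.
        return [e for e in fdb if e.vlan not in data_vlans]
    raise ValueError(flush_type)


fdb = [
    FdbEntry("00-00-00-00-00-01", 2, "port1.0.1"),   # data VLAN on a ring port
    FdbEntry("00-00-00-00-00-02", 30, "port1.0.1"),  # non-EPSR VLAN on a ring port
    FdbEntry("00-00-00-00-00-03", 30, "port1.0.5"),  # non-EPSR VLAN on another port
]
ring_ports = {"port1.0.1", "port1.0.2"}
data_vlans = {2}

after_interface = flush(fdb, ring_ports, data_vlans, "interface")
after_vlan = flush(fdb, ring_ports, data_vlans, "vlan")
```

In this model the interface flush also removes the VLAN 30 entry learned on port1.0.1, which is the relearning cost mentioned in the note above.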
show epsr
EPSR Information
---------------------------------------------------------------------
Name ........................ test
Mode .......................... Master
Status ........................ Enabled
State ......................... Complete
Control Vlan .................. 1000
Data Vlan(s) .................. 2
Primary Port .................. port1.0.1
Primary Port Status ........... Forwarding
Secondary Port ................ port1.0.2
Secondary Port Status ......... Blocked
Hello Time .................... 1 s
Failover Time ................. 2 s
Ring Flap Time ................ 0 s
Trap .......................... Enabled
---------------------------------------------------------------------
EPSR Information
---------------------------------------------------------------------
Name ........................ domain1
Mode .......................... Master
Status ........................ Enabled
State ......................... Failed
Control Vlan .................. 1000
Data VLAN(s) .................. 2
Primary Port .................. port1.0.1
Primary Port Status ........... Forwarding
Secondary Port ................ port1.0.2
Secondary Port Status ......... Forwarding
Hello Time .................... 1 s
Failover Time ................. 2 s
Ring Flap Time ................ 0 s
Trap .......................... Enabled
---------------------------------------------------------------------
EPSR Information
---------------------------------------------------------------------
Name ........................ test
Mode .......................... Transit
Status ........................ Enabled
State ......................... Links-Up
Control Vlan .................. 1000
Data VLAN(s) .................. 2
First Port .................... port1.0.1
First Port Status ............. Forwarding
First Port Direction .......... Upstream
Second Port ................... port1.0.2
Second Port Status ............ Forwarding
Second Port Direction ......... Downstream
Trap .......................... Enabled
Master Node ................... 00-00-cd-28-06-19
---------------------------------------------------------------------
SNMP Traps
From Software Version 5.3.1 onwards, you can use SNMP traps to notify you when events occur in
the EPSR ring.
The EPSR Group has the object identifier prefix epsrv2 (module 536), and contains a collection of
objects and traps for monitoring EPSR states.
Counters
The EPSR counters record the number of EPSR messages that the CPU received and transmitted.
To display the counters, use the command:
EPSR Counters
---------------------------------------------------------------------
Name: domain1
Receive:                          Transmit:
Total EPSR Packets      1093      Total EPSR Packets      1093
Health                  1092      Health                  1092
Ring Up                    1      Ring Up                    1
Ring Down                  0      Ring Down                  0
Link Down                  0      Link Down                  0
Invalid EPSR Packets       0
---------------------------------------------------------------------
The node has generated 1093 EPSR packets (and sent them out its primary port) and has received
the same number of EPSR packets (on its secondary port). However, it is very common to see a few
Link Down, Ring Down, and Ring Up entries in the output of a ring that has never been in a Failed
state. These messages are produced when you first enable EPSR, if some ring nodes establish
before others.
EPSR Counters
---------------------------------------------------------------------
Name: domain1
Receive:                          Transmit:
Total EPSR Packets      1425      Total EPSR Packets         2
Health                  1421      Health                     0
Ring Up                    2      Ring Up                    0
Ring Down                  0      Ring Down                  0
Link Down                  0      Link Down                  2
Invalid EPSR Packets       0
---------------------------------------------------------------------
Here, the transit node has received 1421 Health messages, which it will have forwarded on if its
ports were up. These messages do not show in the transmit counters because they are transmitted
by the switching hardware, not the CPU. The node has also generated two Link-Down messages,
indicating that on two separate occasions one of its links has gone down.
Debugging
This section walks you through the EPSR debugging output as links go down and come back up
again. The debugging output comes from the ring in "Example 1: A Basic Ring" on page 18. The
output shows what happened when we took down two separate links in turn:
First, the link between the master node’s primary port and transit node B
awplus#terminal monitor
awplus#debug epsr all
The terminal monitor command causes the switch to display terminal logging messages on
the console. By default, debug messages are terminal logging messages. You can change this by
using the log terminal command in global configuration mode. You can see which messages are
saved into each type of log by using the show log config command.
Note: The master node transmits Health messages every second by default. The debugging
displays every message, including all Health messages. Therefore, we recommend that you
capture the debugging output and analyze it separately.
Each time the Hello timer expires, the master node sends a Health message out its primary port
(port1.0.1). As long as the ring is in a state of Complete, it receives each Health message again on its
secondary port (port1.0.2). In the System field, this output shows the MAC address of the source of
the message—the master node in this case.
The master node continues sending Health messages, and increments the Hello Sequence number
with each message. If all nodes and links in the ring are intact, these Health messages are the only
debugging output you see.
.
.
.
13:52:10 EPSR[1296]: EPSR: epsrHelloTimeout: EPSR awplus Hello Timer expired
13:52:10 EPSR[1296]: EPSR: port1.0.1 Tx:
13:52:10 EPSR[1296]: EPSR: 00e02b00 00040000 cd240331 8100e3e8 005caaaa 0300e02b
13:52:10 EPSR[1296]: EPSR: 00bb0100 00540545 00000000 0000cd24 0331990b 00400105
13:52:10 EPSR[1296]: EPSR: 03e80000 00000000 cd240331 00010002 0100207b 00000000
13:52:10 EPSR[1296]: EPSR: port1.0.1 Tx:
13:52:10 EPSR[1296]: EPSR: ----------------------------------------------------------
13:52:10 EPSR[1296]: EPSR: TYPE = HEALTH STATE = COMPLETE
13:52:10 EPSR[1296]: EPSR: CTRL VLAN = 1000 SYSTEM = 00-00-cd-24-03-31
13:52:10 EPSR[1296]: EPSR: HELLO TIME = 1 FAIL TIME = 2
13:52:10 EPSR[1296]: EPSR: HELLO SEQ = 8420
13:52:10 EPSR[1296]: EPSR: ----------------------------------------------------------
13:52:10 EPSR[1296]: EPSR: port1.0.2 Rx:
13:52:10 EPSR[1296]: EPSR: 00e02b00 00040000 cd240331 810003e8 005caaaa 0300e02b
13:52:10 EPSR[1296]: EPSR: 00bb0100 00540545 00000000 0000cd24 0331990b 00400105
13:52:10 EPSR[1296]: EPSR: 03e80000 00000000 cd240331 00010002 0100207b 00000000
13:52:10 EPSR[1296]: EPSR: port1.0.2 Rx:
13:52:10 EPSR[1296]: EPSR: ----------------------------------------------------------
13:52:10 EPSR[1296]: EPSR: TYPE = HEALTH STATE = COMPLETE
13:52:10 EPSR[1296]: EPSR: CTRL VLAN = 1000 SYSTEM = 00-00-cd-24-03-31
13:52:10 EPSR[1296]: EPSR: HELLO TIME = 1 FAIL TIME = 2
13:52:10 EPSR[1296]: EPSR: HELLO SEQ = 8420
13:52:10 EPSR[1296]: EPSR: ----------------------------------------------------------
The link between the master node’s primary port and the neighboring transit node goes down.
Therefore, the master node detects that its primary port (port1.0.1) has gone down.
Step 4. The master node receives a Link-Down message on its secondary port.
The master node receives a Link-Down message on its secondary port (port1.0.2) from transit node
B, which is at the other end of the broken link.
In the System field, this output shows the MAC address of the source of the message—the transit
node in this case.
The master switch responds to the break in the ring by sending a Ring-Down-Flush-FDB message,
which tells each transit node to learn the new topology. The master node also unblocks its
secondary port for the data VLAN (vlan2), flushes its FDB, sends an SNMP trap, and changes the
EPSR state to Failed. The master node sends the Ring-Down-Flush-FDB message only out its
secondary port, because the link between the primary port and the neighboring transit node is
down.
The Hello timer expires, which would normally trigger the master node to send a Health message out
the primary port. However, the link between the primary port and the neighboring transit node is
down, so the master node does not send the Health message.
The primary port comes back up. The master node immediately blocks that port for vlan2 to prevent
a loop.
The Hello timer expires again. Port1.0.1 is now up, so this time the master node sends a Health
message. The Health message shows that the EPSR state is Failed.
The Hello Sequence number increments from the number it was before the primary port went down,
because the master node could not transmit Health messages while the port was down.
Step 9. The master node receives the Health message on its secondary port.
The master node receives the Health message on its secondary port (port1.0.2). This tells it that all
links on the ring are up again.
Step 10. The master node returns the ring to a state of Complete.
The master node blocks its secondary port for the data VLAN, unblocks its primary port, transmits a
Ring-Up-Flush-FDB message, flushes its FDB, sends a trap, and changes the EPSR state to
Complete.
Step 11. The master node receives the Ring-Up-Flush-FDB message on port1.0.2.
The master node receives the Ring-Up-Flush-FDB message back on its secondary port, because
the packet traversed the whole ring. The master node ignores the message.
Step 12. The master node transmits and receives Health messages.
The master node continues transmitting and receiving Health messages for as long as the ring stays
in a state of Complete.
Note: The following debug was captured at a different time (during a different ring-down event) from
the master node debug in the previous section. This is why the times and hello sequence
numbers do not match.
The transit node receives Health messages on port1.0.1, because that port is connected to the
master node’s primary port. In the System field, this output shows the MAC address of the source of
the message—the master node in this case.
The transit node detects that port1.0.1 (between the transit node and the master node) has gone
down. The transit node flushes its forwarding database, blocks port1.0.1 for the data VLAN (to
prevent a loop from forming when the master node comes back up), sends a Link-Down message
towards the master node, sends a trap, and changes the EPSR state to Link-Down.
In response to the Link-Down message, the master node sends a Ring-Down-Flush-FDB message.
However, this transit node does not need to flush its database—it already did.
The transit node detects that port1.0.1 has come back up. It sends a trap and changes the EPSR
state to Pre-forwarding. It leaves port1.0.1 blocked for vlan2, to make sure there are no loops.
Now that the master node’s primary port is up again, it sends a Health message. Now that the transit
node’s port1.0.1 is up again for the control VLAN, the transit node receives the message. This
demonstrates that the transit node has only blocked port1.0.1 for the data VLAN, not the control
VLAN. EPSR control messages never loop because the master node never forwards them between
its ring ports.
The Hello Sequence number increments from the number it was before the primary port went down,
because the master node could not transmit Health messages while the port was down.
The Health message from the previous step reaches the master node and shows it that all links in
the ring are now up. The master node sends a Ring-Up-Flush-FDB message. When it receives the
message, the transit node unblocks port1.0.1 for vlan2, flushes its FDB, sends a trap, and changes
the state to Link-Up.
This is equivalent to the packet shown in step 10 of the master node debug output for "Link Down
between two transit nodes" on page 53.
The transit node continues receiving Health messages for as long as the ring stays in a state of
Complete.
On transit node B.
Each time the Hello timer expires, the master node sends a Health message out its primary port
(port1.0.1). As long as the ring is in a state of Complete, it receives each Health message again on its
secondary port (port1.0.2).
Step 2. The link between the two transit nodes goes down.
When the link goes down, the master node transmits a Health message but does not receive it on its
secondary port.
Step 3. The master node receives a Link-Down message on its secondary port.
The master node receives a Link-Down message, which tells it that a link in the ring is broken. This
message came from the transit node on one side of the broken link.
The master node receives a Link-Down message from the transit node on the other side of the
broken link. This message arrived after a delay because the ring ports are 1000M ports (see "Ports
and Recovery Times" on page 35). The master node does not take any action in response to this
message, because it already responded to the broken link.
The master node continues sending Health messages out its primary port. It does not receive any of
these at the secondary port, which tells it that the link is still down.
The master node transmits a Health message and receives it at the secondary port. This indicates
that the link is back up.
Now that the ring is back up, the master node blocks its secondary port for the data VLAN, transmits
a Ring-Up-Flush-FDB message, flushes its FDB, sends a trap, and changes the EPSR state to
Complete.
The master node receives the Ring-Up-Flush-FDB message back on its secondary port, because
the packet traversed the whole ring. The master node ignores the message.
Step 10. The master node transmits and receives Health messages.
The master node continues transmitting and receiving Health messages for as long as the ring stays
in a state of Complete.
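The master node behavior in the walkthroughs above can be summarized as a two-state machine. The following Python sketch is a simplified model built from the steps described in this guide. The class, method, and log names are illustrative, and SNMP traps and per-port detail are omitted.

```python
class MasterNode:
    """Toy model of the master node states described above (names are illustrative)."""

    def __init__(self):
        self.state = "Complete"
        self.secondary_blocked = True  # secondary port blocks the data VLAN
        self.log = []

    def on_link_down(self):
        """A Link-Down message arrives, or a local ring port goes down."""
        if self.state == "Complete":
            self.secondary_blocked = False
            self.log.append("send Ring-Down-Flush-FDB")
            self.log.append("flush FDB")
            self.state = "Failed"
        # A later Link-Down for the same break needs no further action.

    def on_health_received(self):
        """The master's own Health message arrives back on the secondary port."""
        if self.state == "Failed":
            self.secondary_blocked = True
            self.log.append("send Ring-Up-Flush-FDB")
            self.log.append("flush FDB")
            self.state = "Complete"


master = MasterNode()
master.on_link_down()        # first Link-Down: ring goes to Failed
master.on_link_down()        # delayed Link-Down from the other side: ignored
state_after_break = master.state
master.on_health_received()  # Health message circles the ring again
state_after_repair = master.state
```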
Note: The following debug was captured at a different time (during a different ring-down event) from
the master node debug in the previous section. This is why the times and hello sequence
numbers do not match.
The transit node receives Health messages on port1.0.1, because that port is connected to the
master node’s primary port. The message shows that the ring state is Complete.
Also note the Hello sequence number, which is very close to the maximum of 65535. Once the
number reaches 65535, it restarts at zero.
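The wrap-around can be written as a one-line rule. This Python sketch only restates the behavior described above, and the function name is illustrative.

```python
HELLO_SEQ_MAX = 65535  # maximum Hello sequence number given in this guide


def next_hello_seq(seq):
    """Return the sequence number that follows seq, restarting at zero after the maximum."""
    return 0 if seq == HELLO_SEQ_MAX else seq + 1


# Walk the counter across the wrap point.
walk = [65534]
for _ in range(3):
    walk.append(next_hello_seq(walk[-1]))
```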
Step 2. The link between the two transit nodes goes down.
The transit node receives Health message 29. At this stage, the message does not indicate that
anything is wrong. However, between messages 28 and 29, the link went down. This means that
message 29 will not make it back to the master node.
The hello sequence counter has wrapped and is now counting up from zero.
In the meantime, the master node has received a Link-Down message from the switch at the other
end of the broken link. Therefore, the master node realizes that the ring is broken and acts
accordingly. As part of the recovery process, the master node sends a Ring-Down-Flush-FDB
message. The transit node receives this message and flushes its forwarding database.
The transit node realizes that its port is down, sends a Link-Down message, sends a trap, and
changes its state to Link-Down. The transit node sends this message some time after the link
actually went down, because the ring ports are 1000M ports (see "Ports and Recovery Times" on
page 35). By this stage the ring has already changed topology to restore traffic flow. The master
node detected the link failure by receiving a Link-Down message from the other side of the link.
The transit node receives Health messages from the master node. These have a state of Failed,
which shows that the ring is still broken.
The transit node detects that the broken link has come back up. It blocks the port to prevent a loop
from occurring, sends a trap, and changes the EPSR state to Pre-forwarding.
The transit node receives another Health message. This message will make it back to the master
node’s secondary port, because the link between the two transit nodes is now up.
The transit node receives a Ring-Up-Flush-FDB message, which indicates that the master node
knows that all links in the ring are up again. The transit node unblocks port1.0.2 for vlan2, flushes its
FDB, sends a trap, and changes state to Link-Up.
The transit node continues receiving Health messages for as long as the ring stays in a state of
Complete.
This is equivalent to the packet shown in step 10 of the master node debug output for "Link Down
between two transit nodes" on page 53.
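The transit node walkthroughs above follow the same three transitions each time. The following Python sketch models one ring port on a transit node. The state names are taken from the descriptions in this section, the class and method names are illustrative, and traps and FDB flushing are left out.

```python
class TransitPort:
    """Toy model of one ring port on a transit node (names are illustrative)."""

    def __init__(self):
        self.state = "Link-Up"
        self.data_vlan_blocked = False

    def on_port_down(self):
        """The port goes down: block the data VLAN and report the break."""
        self.data_vlan_blocked = True  # blocked now so no loop forms on recovery
        self.state = "Link-Down"
        return "send Link-Down"

    def on_port_up(self):
        """The port comes back up: the data VLAN stays blocked for now."""
        self.state = "Pre-forwarding"

    def on_ring_up_flush(self):
        """The master confirms the ring is whole again: unblock the data VLAN."""
        self.data_vlan_blocked = False
        self.state = "Link-Up"


port = TransitPort()
message = port.on_port_down()
port.on_port_up()
blocked_while_preforwarding = port.data_vlan_blocked
port.on_ring_up_flush()
```

The key point of the model is that the port only returns to forwarding the data VLAN after the Ring-Up-Flush-FDB message arrives, not when the link itself recovers.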
Overview
EPSR SuperLoop Prevention (EPSR-SLP) is an enhancement to the existing EPSR feature in
AlliedWare Plus. EPSR-SLP prevents “SuperLoops” forming in certain EPSR multi-ring topologies.
EPSR-SLP was introduced in AlliedWare Plus Version 5.4.2.
List of terms:
SuperLoop
Multiple rings, each with their own EPSR Domains, may be connected together in a topology. If
these domains share the same set of Data VLANs, and also share a common segment, then the
failure of that common segment leads to a SuperLoop.
Common Segment
A common segment is a link (or links) in the network that are shared by two or more rings, and
which has a common set of Data VLANs.
SLP
SuperLoop Prevention.
What is a SuperLoop?
To achieve redundancy, you may wish to deploy multiple EPSR rings that have the same set of
protected VLANs. If these rings share a common segment, and that common segment fails, a loop
forms. This loop is known as a SuperLoop.
Why do SuperLoops occur?
In normal EPSR operation (that is, without EPSR-SLP), the Masters on both rings separately put
their secondary ports into the Forwarding state when they detect a link going down. As illustrated
in the diagram on the following page, this creates a Forwarding loop.
Example diagram
The following diagram shows how EPSR without the EPSR-SLP enhancement can lead to a
SuperLoop. It also shows the topology of the resultant SuperLoop.
2. The transit nodes at each end of the common link send Link Down messages to both master
nodes.
4. As shown in the lower half of the diagram, this results in a loop. Data circulates continuously
around this loop, congesting the network.
2. It ensures that common segment transit nodes send Link Down messages only to the Master of
the highest priority ring.
3. When a link in a common segment goes down, only the Master of the highest priority ring opens
its secondary port.
2. The transit nodes at each end of the common link send Link Down messages only to their higher-
priority master nodes.
5. The end result is a new topology in which all nodes retain connectivity, but one link is blocked to
prevent packet storming.
For information about how EPSR detects outages in a node, or a link in the ring, see "Detecting a
fault" on page 9.
For information about the fault recovery actions EPSR takes, see "Recovering from a fault" on
page 9.
For information about Enhanced Recovery, see "Enhanced Recovery" on page 10.
Note: Enhanced Recovery behaviour is the same with EPSR-SLP enabled; however, some
differences exist for a master node. For more information, see "EPSR Enhanced Recovery
when SLP is enabled" on page 71.
A value of 1 represents the lowest priority level, and 127 the highest priority. Assigning a priority of 1
or greater enables EPSR-SLP.
Note: A value of 0 effectively disables EPSR-SLP, returning the switch to standard EPSR behaviour.
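The priority rules above can be captured in a short check. This Python sketch assumes the valid range is 0 to 127, as implied by the values quoted above. The function name is illustrative.

```python
def slp_enabled(priority):
    """Return True if the given EPSR priority enables EPSR-SLP."""
    # Assumption: 0-127 is the full valid range (0 disables, 1 lowest, 127 highest).
    if not 0 <= priority <= 127:
        raise ValueError("EPSR priority must be in the range 0-127")
    return priority >= 1


checks = {p: slp_enabled(p) for p in (0, 1, 127)}
```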
Common segments
A common segment is a link in the network that is shared by two or more rings, and which has a
common set of data VLANs. In other words, the data VLANs passing through the common segment
also extend into both the rings that share the segment.
How the switch applies SuperLoop Prevention depends on the role of the node within the ring:
The only situation in which the master node does unblock its secondary port is if:
The Link Down message arrives before the failover timer expires.
Example
In this example:
Master A is the higher-priority master node with a priority level of 10. Therefore, transit nodes send
Link Down messages to Master A.
Master A receives the Link Down messages before its failover timer expires. This means it will:
Master B does not receive Link Down messages. Therefore its Failover timer will expire without
having received any Link Down messages. So, it will:
Timing is important:
Link Down messages are normally received from transit nodes before Failover timer expiry. In this
case, the secondary port transitions to the Forwarding state.
If a Link Down message is received after the Failover timer expires, the secondary port remains
in the Blocking state. See "Transit node behavior" on page 69.
This behavior can sometimes result in cases where the secondary port seems to be unexpectedly
blocked. See "High priority master reboot when ring is down" on page 80.
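The timing rule can be summarized as a single decision. The following Python sketch is a simplified model of an EPSR-SLP master node. The function name and the use of seconds measured from the moment of the fault are assumptions made for illustration.

```python
def secondary_port_state(link_down_arrival_s, failover_expiry_s):
    """Decide the master's secondary port state after a ring break under EPSR-SLP.

    link_down_arrival_s is None if no Link Down message arrives at all.
    """
    if link_down_arrival_s is not None and link_down_arrival_s < failover_expiry_s:
        return "Forwarding"  # Link Down arrived before the Failover timer expired
    return "Blocking"        # late or missing Link Down: stay blocked


master_a = secondary_port_state(0.1, 2.0)   # higher-priority master, receives Link Down
master_b = secondary_port_state(None, 2.0)  # lower-priority master, receives none
late = secondary_port_state(2.5, 2.0)       # Link Down arrives after timer expiry
```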
A transit node that is connected to a common segment is affected by its EPSR priority. It changes its
behavior in the following ways:
It compares the EPSR priority of each of the instances that share the common segment.
If the common segment fails, the transit node only issues a Link Down message on the instance
with the highest priority.
This is illustrated in the example diagram under "Master node behavior" on page 67. At step 2 in this
diagram, the transit nodes on the common link send Link Down messages only to the higher-priority
Master.
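The selection rule for a common segment transit node can be sketched as follows. This Python example is illustrative only: the function name and the instance names and priorities in the example are assumptions, and it relies on the instances having unique priorities.

```python
def link_down_instance(priorities):
    """Pick the instance on which a common segment transit node sends Link Down.

    priorities maps each instance sharing the common segment to its EPSR priority.
    """
    if len(set(priorities.values())) != len(priorities):
        raise ValueError("instances on a common segment need unique priorities")
    return max(priorities, key=priorities.get)


# Two instances share the segment; only the higher-priority one is notified.
chosen = link_down_instance({"A": 10, "B": 5})
```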
With SuperLoop Prevention, there are cases where this does not happen. Specifically, if:
One of the ring ports that went down is connected to a common segment,
and
The ring port that recovers is not connected to the common segment, and is not connected to
the highest priority EPSR instance that shares the common segment, then the newly recovered
ring port is not transitioned to forwarding.
This avoids the risk of SuperLoops that could form in some topologies.
Example
Consider the case illustrated in the diagram below. If the switch at the right-hand end of the
common segment is power cycled while the common segment is down, then when the switch
comes up, the port that connects to EPSR instance 2 will remain Blocking.
The reason for this is that this transit node cannot know for certain whether the secondary port on
the Master switch in the lower-priority ring is still Blocking. If that Master's secondary port is not
Blocking, then if the transit node puts its port into Forwarding, a SuperLoop would form. Hence, to
be safe, that port remains Blocking.
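The conditions above can be written as a single boolean test. This Python sketch is one reading of the rule as stated in this section. The function and parameter names are hypothetical.

```python
def recovered_port_forwards(other_down_port_on_common_segment,
                            recovered_port_on_common_segment,
                            recovered_port_in_highest_priority_instance):
    """Return True if a newly recovered ring port may transition to Forwarding."""
    stays_blocked = (
        other_down_port_on_common_segment
        and not recovered_port_on_common_segment
        and not recovered_port_in_highest_priority_instance
    )
    return not stays_blocked


# The power-cycled switch in the example: the common segment is still down, and the
# recovering port belongs only to the lower-priority instance.
example_port = recovered_port_forwards(True, False, False)
# Same situation, but the recovering port belongs to the highest priority instance.
high_priority_port = recovered_port_forwards(True, False, True)
```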
The following sections address the behavior of EPSR Enhanced Recovery on SuperLoop-enabled
nodes.
Note: Enhanced Recovery should not be used in EPSR-SLP topologies that include 3 or more rings.
For more information, see "Caution about Enhanced Recovery with EPSR-SLP topologies
with 3 or more rings" on page 78.
However, there are some over-riding behaviors that can cause a port to remain in the Blocking state:
If the instance that receives permission to forward is not the highest priority on a common
segment, the port may still be subject to the physical blocking of a higher priority instance. For
more information, see "Physical and logical port control" on page 71.
The transit node behaviour explained in "Transit node behaviour if the other port is still down" on
page 70.
On nodes that have ports connected to common segments, only the highest priority EPSR instance
has physical control of those ports. The other EPSR instances are deemed to have logical control of
the common segment ports.
The EPSR instance that has physical control of the ring ports is the one that sets the port states, for
example blocking, pre-forwarding or forwarding.
The state that the other, lower-priority instances that share the ring ports would put the ports into, if
they had control of them, is referred to as the logical state of the ports for those instances. This
logical state has no effect on the operation of the ports. The logical state is tracked mostly so that
you can check that those other instances are maintaining internal consistency, and are making the
correct state transitions.
If the EPSR instance that has physical control of a port is physically blocking the port, it is also
blocking access to that port for all other instances.
You can see whether a port is physically or logically blocking by using the show epsr command:
EPSR Information
---------------------------------------------------------------------
Name ........................ B
Mode .......................... Master
Status ........................ Enabled
State ......................... Idle
Control Vlan .................. 6
Data VLAN(s) .................. 40
Interface Mode ................ Channel Groups Only
Primary Port .................. sa2
Status ...................... Down
Is On Common Segment ........ No
Blocking Control ............ Physical << Here it is physical
Secondary Port ................ sa1
Status ...................... Down
Is On Common Segment ........ No
Blocking Control ............ Physical << Here it is physical
Hello Time .................... 1 s
Failover Time ................. 2 s
Ring Flap Time ................ 0 s
Trap .......................... Enabled
Enhanced Recovery ............. Enabled
Priority ...................... 5
---------------------------------------------------------------------
Example
The following example illustrates the distinction between physical and logical control.
On the common segment, only the highest priority EPSR instance has physical control of the ports.
So, when a common segment port fails, only the highest priority instance on that common segment
physically blocks the port. Other instances on the common segment put their ring ports into a
logical blocking state.
When the link goes up again, the port is initially held in the Pre-forwarding state. While in the Pre-
forwarding state, the highest priority instance is physically blocking. This also blocks all other
instances on the port.
Once the highest priority Master has put its secondary port into the Blocking state, it can inform the
transit nodes attached to the common segment to transition their Pre-forwarding ports to
Forwarding.
At that point, these ports will also go to Forwarding for the other EPSR instance. So, when the
physical blocking is removed, the logical blocking is also removed.
The key point here is that it is packets from the highest priority Master that determine when the ports
can return to the Forwarding state. Therefore, it must be the highest priority EPSR instance that has
control of this.
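The relationship between physical and logical control can be modeled in a few lines. In this Python sketch, each instance sharing a common segment port is a pair of its priority and the state it would like the port to be in. The representation and the function name are assumptions for illustration.

```python
def effective_port_blocked(instances):
    """Return True if a shared common segment port is actually blocked.

    instances is a list of (priority, wants_blocked) pairs, one per EPSR instance
    that shares the port. Only the highest priority instance has physical control.
    """
    controller = max(instances, key=lambda inst: inst[0])
    return controller[1]  # logical states of the other instances have no effect


# The higher-priority instance holds the port in Pre-forwarding (blocked).
during_recovery = effective_port_blocked([(10, True), (5, False)])
# Physical blocking removed: the port forwards for every instance.
after_recovery = effective_port_blocked([(10, False), (5, True)])
```

The second case shows why the logical state is informational only: a lower-priority instance that still wants the port blocked does not keep it blocked.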
Parameters explained
PARAMETER           MEANING
Common Seg Port     The ring port that identifies the common segment
EPSR Instance       Corresponds to IMASK/EMASK fields on the IMASK table. Shows which
                    port numbers packets will be matched on.
Mode                The mode in which the EPSR instance is configured to operate - either
                    Master or Transit
Port Type           The type of ring port in the instance - Primary or Secondary for a master
                    node; First or Second for a transit node
Phys Ctrl of Port   Whether the instance has physical control of the common ring port's
                    blocking in the instances' data VLANs
Ring Port Status    Whether the EPSR instance's ring port is currently in the Forwarding,
                    Blocking, or Link Down state
Abbreviations:
M = Master node
T = Transit node
C = is on a Common Segment with other instances
P = instance on a Common Segment has physical control of the shared port's data VLAN blocking
Blocked (SLP) = master secondary ring port is blocked for EPSR-SLP
Parameters explained
PARAMETER                 MEANING
Mode                      The mode in which the EPSR instance is configured to operate - either
                          Master (M) or Transit (T)
Primary/1st Port Status   For a master node, this is the EPSR instance's primary ring port. For a
                          transit node, this is the EPSR instance’s first port.
                          C indicates the ring port is on a common segment with other instances.
                          P indicates the instance has physical control of the shared port's data
                          VLAN blocking.
This command checks the configuration of a specified EPSR instance, or all EPSR instances. If an
instance is enabled, this command will check for the following errors or warnings:
Some of the data VLANs are not assigned to the ring ports.
The instance is a master that shares a common segment with a higher priority instance.
The instance is a master that shares a common segment with another master.
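The three checks listed above can be sketched in a few lines. This is a minimal illustration, not the switch's actual logic: the dictionary keys, the port name, and the assumption that a larger priority number means a higher priority are hypothetical.

```python
def config_check(instance, instances):
    """Return the warnings for one instance; disabled instances are skipped."""
    warnings = []
    if not instance["enabled"]:
        return warnings

    # Check 1: every data VLAN should be assigned to the ring ports.
    unassigned = set(instance["data_vlans"]) - set(instance["ring_port_vlans"])
    if unassigned:
        warnings.append("data VLANs not on ring ports: " + str(sorted(unassigned)))

    # Checks 2 and 3 apply only to a master that sits on a common segment.
    if instance["mode"] == "master" and instance["common_port"]:
        for other in instances:
            if other is instance or other["common_port"] != instance["common_port"]:
                continue
            if other["priority"] > instance["priority"]:
                warnings.append("shares a common segment with higher priority instance "
                                + other["name"])
            if other["mode"] == "master":
                warnings.append("shares a common segment with master " + other["name"])
    return warnings


red = dict(name="red", enabled=True, mode="master", priority=40,
           data_vlans=[10, 20], ring_port_vlans=[10], common_port="port1.0.1")
blue = dict(name="blue", enabled=True, mode="transit", priority=70,
            data_vlans=[10, 20], ring_port_vlans=[10, 20], common_port="port1.0.1")

for warning in config_check(red, [red, blue]):
    print(warning)
```

Here the hypothetical instance `red` triggers the first two warnings, while `blue`, a transit instance with all data VLANs on its ring ports, triggers none.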
To check the configuration of all EPSR instances and display the results, use the command:
All member nodes of an EPSR-SLP domain should have a consistent EPSR priority value.
On common segment nodes, ensure that all the different instances have unique priorities.
When configuring multiple EPSR-SLP instances on a common segment node, the switch
performs checks to ensure that all instances on any identified common segment ports share the
same set of data VLANs. If any of these checks fail, the switch does not accept the command,
and returns an error message.
Either remove the native VLAN from ring ports, or ensure that the native VLAN is specified as an
EPSR Data VLAN.
Each Master’s port that connects to the common segment must be configured as the primary
port.
Note: Remember that you can check the EPSR configuration by using the show commands. See
"EPSR show commands" on page 74.
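Several of the guidelines above lend themselves to a quick offline check before configuration is applied. The sketch below is an illustration under stated assumptions: the function name and dictionary keys are hypothetical, and it covers only the unique-priority, matching data VLAN, and primary-port guidelines.

```python
def common_segment_problems(instances):
    """Check the EPSR instances that share one common segment port.

    Each instance is a dict with hypothetical keys:
    name, priority, data_vlans, mode, common_port_role.
    """
    problems = []

    # Instances on a common segment node need unique priorities.
    priorities = [inst["priority"] for inst in instances]
    if len(set(priorities)) != len(priorities):
        problems.append("priorities are not unique")

    # All instances on the common segment port share the same data VLANs.
    vlan_sets = {frozenset(inst["data_vlans"]) for inst in instances}
    if len(vlan_sets) > 1:
        problems.append("data VLAN sets differ")

    # A Master must reach the common segment through its primary port.
    for inst in instances:
        if inst["mode"] == "master" and inst["common_port_role"] != "primary":
            problems.append(inst["name"] + ": common segment port is not the primary port")
    return problems


node = [
    dict(name="domain2", priority=70, data_vlans=[10, 20],
         mode="transit", common_port_role="first"),
    dict(name="domain3", priority=40, data_vlans=[10, 20],
         mode="master", common_port_role="secondary"),
]
print(common_segment_problems(node))
```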
[Figure: EPSR rings sharing a Common Segment. Recoverable labels: Master Nodes with primary (P) and secondary (S) ports, Transit Nodes, Domain2 / priority = 70, Domain3 / priority = 40.]
Master nodes can be placed on a common segment, but this is generally best avoided.
If a master node is located on a common segment, the port connecting to the common segment on
the master node's Master instance must be a primary port.
This is to avoid inappropriate physical blocking. A master node’s secondary ports must not connect
to the common segment, because in normal operation secondary ports are blocking. In the case of
the highest priority Master, this would result in physical blocking, which would unnecessarily prevent
lower priority domains from having access to the common segment.
If a device on the common segment acts as the master node for multiple EPSR rings, and you
disable the rings and later re-enable them (for example, when making EPSR configuration
changes), then you must re-enable the ring with the highest EPSR-SLP priority last.
If you do not re-enable the rings in that order, the primary port may end up permanently in a blocking
state. If the primary port does end up in a permanent blocking state, recover by disabling and re-
enabling the EPSR ring that has the highest priority.
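The ordering rule can be expressed as a simple sort. This is a minimal sketch: the function name and ring names are hypothetical, and it assumes that a larger priority number means a higher EPSR-SLP priority.

```python
def reenable_order(rings):
    """Return ring names in a safe re-enable order.

    rings maps each ring name to its EPSR-SLP priority. Sorting in
    ascending priority puts the highest priority ring last.
    """
    return sorted(rings, key=rings.get)


print(reenable_order({"domain2": 70, "domain3": 40}))
```

With these example values, domain3 would be re-enabled first and domain2, the highest priority ring, last.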
Therefore, if an EPSR instance has to share any VLANs with other EPSR instances, then EPSR-SLP
must be enabled on all those instances.
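A planned configuration can be checked against this rule as follows. The sketch assumes that a priority of 0 means SuperLoop prevention is disabled, which is how the show output in this guide reports it; the function name and data layout are hypothetical.

```python
from itertools import combinations


def missing_slp(instances):
    """Return the names of instances that share a data VLAN with another
    instance but do not have EPSR-SLP enabled (priority 0).

    instances maps each name to a (priority, set of data VLANs) pair.
    """
    offenders = set()
    for (a, (prio_a, vlans_a)), (b, (prio_b, vlans_b)) in combinations(instances.items(), 2):
        if vlans_a & vlans_b:  # the two instances have a data VLAN in common
            if prio_a == 0:
                offenders.add(a)
            if prio_b == 0:
                offenders.add(b)
    return sorted(offenders)


plan = {"A": (0, {40}), "B": (70, {40, 50}), "C": (0, {60})}
print(missing_slp(plan))
```

In this example only A is reported: it shares VLAN 40 with B but has priority 0, whereas C shares no VLANs and can remain without EPSR-SLP.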
Caution about Enhanced Recovery with EPSR-SLP topologies with 3 or more rings
Any EPSR-SLP topology that includes three or more rings with two or more common segments (i.e.
ladder topology) must not have Enhanced Recovery enabled on the common segment nodes.
Referring to the diagram below, the problem with having Enhanced Recovery enabled is that a
SuperLoop will form if the following sequence of events occurs:
1. Both common segments go down (i.e. the common segment between switches C and D, and the
common segment between switches G and H).
2. The common segment between switches G and H becomes available again, but the other
common segment (between switches C and D) remains down.
Let’s look at the sequence of events that will cause a SuperLoop to form in this scenario.
1. When the common link between switches G and H is repaired, they send LinkForwardRequest
messages to their highest priority Master, which is switch E.
2. Because the link between C and D is still down, the healthcheck packets that switch E sends do
not arrive back on its secondary port. So, switch E sees the ring as down, and therefore permits
switches G and H to transition their ports on the common segment to Forwarding.
3. At this point, because the secondary ports of Master switches E and A are still Forwarding, a
SuperLoop forms around the path A->B->D->F->H->G->E->C->A.
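The loop in step 3 can be reproduced with a small graph check. This is only a sketch: the link list is inferred from the SuperLoop path quoted above rather than taken from the topology diagram, the third ring's remaining links are omitted, and `has_loop` is a hypothetical helper that uses a union-find structure.

```python
def has_loop(links):
    """Return True if the forwarding links contain a Layer 2 loop."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for u, v in links:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return True  # u and v were already connected: this link closes a loop
        parent[root_u] = root_v
    return False


# Forwarding links while both common segments are down. The C-D segment is
# absent, and the Masters' secondary ports are Forwarding because their rings
# are in the Failed state.
forwarding = [("A", "B"), ("B", "D"), ("D", "F"), ("F", "H"),
              ("G", "E"), ("E", "C"), ("C", "A")]

print(has_loop(forwarding))                 # G-H segment still down
print(has_loop(forwarding + [("H", "G")]))  # G-H segment allowed to forward
```

The first call finds no loop; the second does, which corresponds to switches G and H being permitted to move their common segment ports to Forwarding while the link between C and D is still down.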
When an EPSR ring is broken, the highest priority master node’s secondary port enters the Failed
state. The master node’s secondary port must receive a Link Down message before the Failover
timer expires, in order to re-enter the Forwarding state.
If the master node has rebooted while the ring was in the Failed state, then after this reboot the
master node cannot receive new Link Down messages. As such, it enters the Blocking state. The
same situation occurs when the secondary port has its state toggled.
Furthermore, after a reboot, the master node cannot judge whether it is safe to allow its secondary
port to forward. For example, it does not know if:
In some cases this can lead to a split ring. If you cannot quickly repair the common segment, you
can manually intervene, using the following techniques.
The split ring can be described as a 2-ring topology segmented into two isolated sides of the failed
common segment. This split-ring is not automatically restored until the common segment comes
back up. You can manually fix this, but the resulting configuration is not without risk. See "Manual
fix" on page 81.
Manual fix
You can temporarily allow connectivity by setting the state of the secondary port to Forwarding:
1. Disable EPSR.
2. Disable SuperLoop.
3. Re-enable EPSR.
You should return to your normal configuration before the common segment is repaired, using the
following instructions.
EPSR Information
---------------------------------------------------------------------
Name ........................ A
Mode .......................... Master
Status ........................ Enabled
State ......................... Failed
Control VLAN .................. 5
Data VLAN(s) .................. 40
Interface Mode ................ Channel Groups Only
Primary Port .................. sa1
Status ...................... Forwarding
Is On Common Segment ........ No
Blocking Control ............ Physical
Secondary Port ................ sa2
Status ...................... Forwarding
Is On Common Segment ........ No
Blocking Control ............ Physical
Hello Time .................... 1 s
Failover Time ................. 2 s
Ring Flap Time ................ 0 s
Trap .......................... Enabled
Enhanced Recovery ............. Enabled
Priority ...................... 0 [SuperLoop prevention disabled]
---------------------------------------------------------------------
3. Ensure that the fibre repairers notify you when the common segment is close to reconnection.
Before it is actually re-connected, you must enable EPSR, and enable SuperLoop at its previous
priority setting:
4. As shown above, the EPSR instance’s secondary port is Blocked until the common segment is
reconnected.
Note: It is very important to enable SuperLoop before the common segment is reconnected.
Otherwise, the network is at risk of another, possibly longer SuperLoop storm occurring
during reconnection.
C613-22041-00 REV E
© 2021 Allied Telesis, Inc. All rights reserved. Information in this document is subject to change without notice. All company names, logos, and product designs that are trademarks or registered trademarks are the property of their respective owners.