
msw-c5-eqiad offline
Closed, Resolved (Public)

Description

At 13:05 UTC today msw-c5-eqiad went offline, and the port connecting to it on msw1-eqiad went hard down:

Oct 20 13:05:41  msw1-eqiad mib2d[2003]: SNMP_TRAP_LINK_DOWN: ifIndex 551, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/22

This has broken management network access to the following devices:

DNS Name                             IP
an-conf1002.mgmt.eqiad.wmnet         10.65.5.119
an-db1002.mgmt.eqiad.wmnet           10.65.1.53
an-test-worker1002.mgmt.eqiad.wmnet  10.65.0.69
cloudcontrol1005.mgmt.eqiad.wmnet    10.65.4.188
cloudmetrics1001.mgmt.eqiad.wmnet    10.65.2.112
db1120.mgmt.eqiad.wmnet              10.65.1.5
db1145.mgmt.eqiad.wmnet              10.65.1.139
db1146.mgmt.eqiad.wmnet              10.65.1.140
db1168.mgmt.eqiad.wmnet              10.65.0.181
db1169.mgmt.eqiad.wmnet              10.65.0.187
db1181.mgmt.eqiad.wmnet              10.65.0.218
db1189.mgmt.eqiad.wmnet              10.65.3.2
dbproxy1018.mgmt.eqiad.wmnet         10.65.2.173
dbproxy1019.mgmt.eqiad.wmnet         10.65.2.174
dbproxy1020.mgmt.eqiad.wmnet         10.65.2.175
dbproxy1021.mgmt.eqiad.wmnet         10.65.2.176
es1022.mgmt.eqiad.wmnet              10.65.4.146
ganeti1010.mgmt.eqiad.wmnet          10.65.5.105
ganeti1024.mgmt.eqiad.wmnet          10.65.1.208
gitlab-runner1003.mgmt.eqiad.wmnet   10.65.2.91
kubernetes1012.mgmt.eqiad.wmnet      10.65.4.194
mw1484.mgmt.eqiad.wmnet              10.65.2.216
pc1013.mgmt.eqiad.wmnet              10.65.1.189
ps1-c5-eqiad.mgmt.eqiad.wmnet        10.65.0.52
wdqs1013.mgmt.eqiad.wmnet            10.65.4.185

DC-Ops, can we get someone to investigate the issue? Hopefully we can get it back up; I'm not sure whether we have a suitable replacement on site.

Event Timeline

cmooney triaged this task as High priority. Oct 20 2022, 1:31 PM
cmooney created this task.

msw-c5-eqiad was unresponsive. Used a previously decommissioned switch to bring the management connection back online. Netbox has been updated.

cmooney closed this task as Resolved. Edited Oct 20 2022, 2:09 PM
cmooney claimed this task.

Awesome @Jclark-ctr thanks for the speedy response!

I can confirm port is back up:

Oct 20 13:58:27  msw1-eqiad mib2d[2003]: SNMP_TRAP_LINK_UP: ifIndex 551, ifAdminStatus up(1), ifOperStatus up(1), ifName ge-0/0/22
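
For reference, the outage window can be derived from the two mib2d log lines quoted in this task. The following is a minimal sketch, not tooling used during the incident: the regular expression, the `parse_trap` helper, and the hard-coded year 2022 (syslog omits the year; it is taken from the task timeline) are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Log lines copied verbatim from this task (msw1-eqiad syslog).
LINK_DOWN = ("Oct 20 13:05:41  msw1-eqiad mib2d[2003]: SNMP_TRAP_LINK_DOWN: "
             "ifIndex 551, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/22")
LINK_UP = ("Oct 20 13:58:27  msw1-eqiad mib2d[2003]: SNMP_TRAP_LINK_UP: "
           "ifIndex 551, ifAdminStatus up(1), ifOperStatus up(1), ifName ge-0/0/22")

# Hypothetical pattern: timestamp, host, process tag, trap type, interface name.
TRAP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+\S+:\s+"
    r"SNMP_TRAP_LINK_(?P<state>UP|DOWN):.*ifName\s+(?P<ifname>\S+)"
)


def parse_trap(line, year=2022):
    """Return timestamp, host, link state and interface name for one trap line."""
    m = TRAP_RE.match(line)
    if m is None:
        raise ValueError(f"not a link trap line: {line!r}")
    # Syslog timestamps carry no year, so it is supplied by the caller.
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return {"ts": ts, "host": m["host"], "state": m["state"], "ifname": m["ifname"]}


down = parse_trap(LINK_DOWN)
up = parse_trap(LINK_UP)
outage = up["ts"] - down["ts"]
print(f"{down['ifname']} on {down['host']} was down for {outage}")
```

Both traps refer to the same interface, so the difference between their timestamps is the time the port spent down.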

MAC addresses are learnt:

cmooney@msw1-eqiad> show ethernet-switching table interface ge-0/0/22.0  

MAC database for interface ge-0/0/22.0

MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static, C - Control MAC
           SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)


Ethernet switching table : 38 entries, 38 learned
Routing instance : default-switch
    Vlan                MAC                 MAC         Age    Logical                NH        RTR 
    name                address             flags              interface              Index     ID
    default             00:0a:9c:62:ec:e4   D             -   ge-0/0/22.0            0         0       
    default             14:58:d0:47:27:b0   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:3d:61:cf   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:68:ab:f3   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:68:e4:bc   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:68:f8:35   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:85:3a:69   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:87:f9:3d   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:8a:6b:6a   D             -   ge-0/0/22.0            0         0       
    default             2c:ea:7f:a7:0c:ed   D             -   ge-0/0/22.0            0         0       
    default             34:73:5a:fb:84:a6   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:66:1d:d3   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:6c:9f:29   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:6c:a5:98   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:6c:a7:b3   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:6c:aa:46   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:6c:b0:7c   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:a6:1c:43   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:af:62:e6   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:af:78:e4   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:c4:a6:43   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:c9:f0:35   D             -   ge-0/0/22.0            0         0       
    default             4c:d9:8f:ca:aa:25   D             -   ge-0/0/22.0            0         0       
    default             58:8a:5a:e8:4a:06   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:b0:9a:90   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:b4:62:58   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:b4:6c:d8   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:b9:81:ee   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:b9:82:36   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:b9:83:b6   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:bc:ec:54   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:be:0d:2a   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:be:17:2a   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:be:4f:0a   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:be:5a:32   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:be:5a:d2   D             -   ge-0/0/22.0            0         0       
    default             b0:4f:13:be:5b:0a   D             -   ge-0/0/22.0            0         0       
    default             d0:8e:79:f4:15:fa   D             -   ge-0/0/22.0            0         0
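
A quick way to sanity-check output like the table above is to count the learned entries and group them by the first three octets of each MAC address. The sketch below is illustrative only: it parses a four-row sample copied from the table, and the `parse_rows` helper and field positions are assumptions based on the column layout shown here.

```python
import re
from collections import Counter

# Four rows copied from the table above; paste the full output to check all entries.
SAMPLE_ROWS = """\
    Vlan                MAC                 MAC         Age    Logical                NH        RTR
    name                address             flags              interface              Index     ID
    default             00:0a:9c:62:ec:e4   D             -   ge-0/0/22.0            0         0
    default             2c:ea:7f:3d:61:cf   D             -   ge-0/0/22.0            0         0
    default             2c:ea:7f:68:ab:f3   D             -   ge-0/0/22.0            0         0
    default             4c:d9:8f:66:1d:d3   D             -   ge-0/0/22.0            0         0
"""

MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$")


def parse_rows(text):
    """Return one dict per table row; header and blank lines are skipped."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have a MAC address in the second whitespace-separated field.
        if len(fields) >= 5 and MAC_RE.match(fields[1]):
            entries.append({
                "vlan": fields[0],
                "mac": fields[1],
                "flags": fields[2],
                "interface": fields[4],
            })
    return entries


entries = parse_rows(SAMPLE_ROWS)
# The first three octets ("aa:bb:cc") are the first 8 characters of the MAC.
by_prefix = Counter(entry["mac"][:8] for entry in entries)
print(len(entries), "entries;", dict(by_prefix))
```

Run against the full output, the entry count should match the "38 entries, 38 learned" summary line.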

And affected devices are reachable again:

64 bytes from an-conf1002.mgmt.eqiad.wmnet (10.65.5.119): icmp_seq=1 ttl=62 time=24.2 ms
64 bytes from wmf5066.mgmt.eqiad.wmnet (10.65.1.53): icmp_seq=1 ttl=62 time=24.0 ms
64 bytes from wmf4834.mgmt.eqiad.wmnet (10.65.0.69): icmp_seq=1 ttl=62 time=0.868 ms
64 bytes from cloudcontrol1005.mgmt.eqiad.wmnet (10.65.4.188): icmp_seq=1 ttl=62 time=0.850 ms
64 bytes from wmf4659.mgmt.eqiad.wmnet (10.65.2.112): icmp_seq=1 ttl=253 time=0.456 ms
64 bytes from wmf7363.mgmt.eqiad.wmnet (10.65.1.5): icmp_seq=1 ttl=62 time=0.549 ms
64 bytes from db1145.mgmt.eqiad.wmnet (10.65.1.139): icmp_seq=1 ttl=62 time=0.786 ms
64 bytes from wmf5401.mgmt.eqiad.wmnet (10.65.1.140): icmp_seq=1 ttl=62 time=0.829 ms
64 bytes from db1168.mgmt.eqiad.wmnet (10.65.0.181): icmp_seq=1 ttl=62 time=15.2 ms
64 bytes from wmf5474.mgmt.eqiad.wmnet (10.65.0.187): icmp_seq=1 ttl=62 time=24.3 ms
64 bytes from wmf4970.mgmt.eqiad.wmnet (10.65.0.218): icmp_seq=1 ttl=62 time=23.8 ms
64 bytes from db1189.mgmt.eqiad.wmnet (10.65.3.2): icmp_seq=1 ttl=62 time=0.916 ms
64 bytes from wmf5179.mgmt.eqiad.wmnet (10.65.2.173): icmp_seq=1 ttl=62 time=0.778 ms
64 bytes from dbproxy1019.mgmt.eqiad.wmnet (10.65.2.174): icmp_seq=1 ttl=62 time=24.2 ms
64 bytes from wmf5181.mgmt.eqiad.wmnet (10.65.2.175): icmp_seq=1 ttl=62 time=24.8 ms
64 bytes from dbproxy1021.mgmt.eqiad.wmnet (10.65.2.176): icmp_seq=1 ttl=62 time=24.4 ms
64 bytes from es1022.mgmt.eqiad.wmnet (10.65.4.146): icmp_seq=1 ttl=62 time=0.843 ms
64 bytes from ganeti1010.mgmt.eqiad.wmnet (10.65.5.105): icmp_seq=1 ttl=62 time=23.3 ms
64 bytes from wmf4881.mgmt.eqiad.wmnet (10.65.1.208): icmp_seq=1 ttl=62 time=17.4 ms
64 bytes from wmf4935.mgmt.eqiad.wmnet (10.65.2.91): icmp_seq=1 ttl=62 time=0.902 ms
64 bytes from wmf5384.mgmt.eqiad.wmnet (10.65.4.194): icmp_seq=1 ttl=62 time=0.922 ms
64 bytes from mw1484.mgmt.eqiad.wmnet (10.65.2.216): icmp_seq=1 ttl=62 time=0.860 ms
64 bytes from pc1013.mgmt.eqiad.wmnet (10.65.1.189): icmp_seq=1 ttl=62 time=25.8 ms
64 bytes from ps1-c5-eqiad.mgmt.eqiad.wmnet (10.65.0.52): icmp_seq=1 ttl=253 time=1.04 ms
64 bytes from wmf5341.mgmt.eqiad.wmnet (10.65.4.185): icmp_seq=1 ttl=62 time=0.862 ms
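
Several replies above come back under a reverse-DNS name (for example wmf5066) that differs from the name in the affected-device list, so a check against that list is easier to do by IP address. The following is a rough sketch under that assumption, using a four-host sample taken from this task; the `parse_replies` helper and the reply-line pattern are hypothetical.

```python
import re

# Subset of the affected-device list from the task description (name -> IP).
EXPECTED = {
    "an-conf1002.mgmt.eqiad.wmnet": "10.65.5.119",
    "an-db1002.mgmt.eqiad.wmnet": "10.65.1.53",
    "ps1-c5-eqiad.mgmt.eqiad.wmnet": "10.65.0.52",
    "wdqs1013.mgmt.eqiad.wmnet": "10.65.4.185",
}

# Matching reply lines copied from the ping output above.
PING_OUTPUT = """\
64 bytes from an-conf1002.mgmt.eqiad.wmnet (10.65.5.119): icmp_seq=1 ttl=62 time=24.2 ms
64 bytes from wmf5066.mgmt.eqiad.wmnet (10.65.1.53): icmp_seq=1 ttl=62 time=24.0 ms
64 bytes from ps1-c5-eqiad.mgmt.eqiad.wmnet (10.65.0.52): icmp_seq=1 ttl=253 time=1.04 ms
64 bytes from wmf5341.mgmt.eqiad.wmnet (10.65.4.185): icmp_seq=1 ttl=62 time=0.862 ms
"""

REPLY_RE = re.compile(
    r"^\d+ bytes from (?P<name>\S+) \((?P<ip>[\d.]+)\): .*time=(?P<ms>[\d.]+) ms$"
)


def parse_replies(text):
    """Map each replying IP to its reverse-DNS name and round-trip time in ms."""
    replies = {}
    for line in text.splitlines():
        m = REPLY_RE.match(line)
        if m:
            replies[m["ip"]] = (m["name"], float(m["ms"]))
    return replies


replies = parse_replies(PING_OUTPUT)
# Hosts from the expected list whose IP never answered.
missing = [name for name, ip in EXPECTED.items() if ip not in replies]
# Hosts that answered under a different reverse-DNS name than the one listed.
renamed = {name: replies[ip][0] for name, ip in EXPECTED.items()
           if ip in replies and replies[ip][0] != name}
print("missing:", missing)
print("answering under a different name:", renamed)
```

An empty `missing` list means every listed management IP answered, regardless of which name the reply was reported under.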

ayounsi reassigned this task from cmooney to Jclark-ctr.
ayounsi subscribed.

Thanks for the quick turnaround!

There is an outstanding diff in Homer:

Changes for 1 devices: ['msw1-eqiad.mgmt.eqiad.wmnet']

[edit interfaces ge-0/0/22]
-   description "Core: msw-c5-eqiad:47 {#1544}";
+   description "Core: WMF4900:47 {#1544}";

This is due to the switch port now pointing to the decommissioned device (https://netbox.wikimedia.org/dcim/interfaces/7795/trace/); it needs to be updated as well.
