
Summary: Array vaulted due to no paths available into a DAE

Status: Closed
Severity: 2 - Medium
Code Version: 5977.1131.1131
Responsible Engineer: Timothy Gaspar
Assigned Engineer: Gary Ruby
Product: UCODE SSCO

Issue Details
SFDC number: 000522724
Resolution comment: None
Problem submitter: Martin Hayes
Closure SFDC number: 531191
Customer SR # and customer name(s): 13170581 - ING BANK UMRANIYE
Customer Name(s): ING BANK UMRANIYE

Notes


2019-03-27
05:39
Gary Ruby

Time (SP)  Significant Events (note: all of these events repeat at various times, but there are too many to list)

14:44:58  first instance of CD23.28 SSP DMA transfer errors against port 0xD on both DS 1c & 2c
       :: first instance of 0B3E.01 SET_DA_CHK_RESET bit set on both DS
       :: first instance of D52E IOBS_ABORTED_INTERNAL status received from CDI, and we see disk connect
          messages
       :: first instance of BF3E.54 Drive PHY Hard Reset against multiple drives
14:44:59  first instance of C03E.87 DAE override timer set as eses_state < 10 on port 0x17 but >= 6
14:34:01  all ESES structure/information is rebuilt on port 0x17
14:34:08  first instance of 0322 task 20 failure for reason_code: E_PHYS_REASON_SAS_NO_DAE_CONNECTION
       :: first SCSI check condition from port 0x17 with TARGET OPERATING CONDITIONS HAVE CHANGED sense (06/3F)
14:34:18  second instance of 0322 task 20 failure for reason_code: E_PHYS_REASON_SAS_NO_DAE_CONNECTION
14:34:28  third instance of 0322 task 20 failure for reason_code: E_PHYS_REASON_SAS_NO_DAE_CONNECTION
14:34:38  fourth instance of 0322 task 20 failure for reason_code: E_PHYS_REASON_SAS_NO_DAE_CONNECTION
       :: first instance of 700B.C6 port 0xD is disabled by task20 on both DS for reason code No DAE
          Connection (on one port)
       :: 0B3E.02 CLEAR_DA_CHK_RESET
14:34:41  first instance of BE38.1C to indicate we had a generation code change from 6 to 7
14:36:21  first SCSI check condition from port 0x13 with POWER ON, RESET, OR BUS DEVICE RESET OCCURRED sense
          (06/29)
14:36:24  B53E.02 on port 0xD indicates that our LP task recovered the link to port 0xD as the LCC powered up
          again
14:36:32  first 7380.02 from the IMs stating DAE 6 is "inserted" as the LCC recovers temporarily
14:40:31  first instance of BC2B.57 errors with logged return code IOBS_DMA_ERROR from CDI when trying to get
          page 2
14:41:21  first instance of 700B.C7 port 0xD is disabled by task20 on both DS for reason code ESES Ready
          Timeout
14:41:51  first instance of BE38.1C indicating we had a gen code change on LCC A; other gen code changes are
          logged in the DLOG
14:42:05  first instance of DE01.CA stuck config bit set on the IO expander for LCC B
14:43:23  first instance of DE01.CC stuck config bit clear on the IO expander for LCC B
14:44:17  first instance of C03E.81/91 DAE/RAID unavailable after an ESES request timeout across both DS
14:44:19  first instance of C03E.80/90 DAE/RAID available again across both DS
14:45:23  last incident of BE38.1C on both DS signifying a gen code change 1C->1D
14:45:32  last incident of BE38.99 from LCC A signifying that our ESES request timed out after 3 secs
14:45:34  last instance of C03E.81/91 signifying RAID/DAE unavailability across both DS on port 0x17
14:45:38  02B1.A0, vault trigger is ON, NTCV is set and the system vault is activated
       :: 02B1.14/15 vaulting due to loss of access to the DAE(s)

=== Details ===


Same issue as the Final RCA for customer N.A.C.F. Anseong, SR 90804016; the reset frequency was longer there, though.

Closure SFDC #: '531191' has been set for this OPT.


The Escalation Avoidance for this OPT has been changed from 'Blank' to 'Unavoidable'.
Resolution field was modified for this OPT.
Status For Version 5977.1131.1131 Changed from Assigned to Closed.

2019-03-27
05:36
Gary Ruby

Final RCA
Date: 27-Mar-19
Customer: ING BANK UMRANIYE
SR Number: 13170581
Code Level: 5977_1131/0041
Engineers Involved: Gary Ruby, Tim Gaspar
Escalation Engineer: Kevin Crowley
Surge OPT: 552586

=== DU/DL Closure ===


RCA Status: Successful
RCA Summary: Array vaulted due to one defective LCC constantly resetting and holding the peer LCC back from
updating our ESES information. This delay in updating caused the RAID/DAE availability check to fail and
triggered an NTV situation. The issue was caused by CDES spending too much time trying to communicate with a
power-cycling LCC, so it never responded to SES page requests, leaving the ESES DB in a state < 6 on the good
LCC for too long.

Cause: FW, CDES.

Must Have Fix: OPT 555152 and JIRAFICS-14197


Enhancements: N/A

KB: KB 531191 has been drafted by CS and is in the process of being published.

Exposure: All CDES 15.50 and below

=== Problem ===


See the summary. . .

=== Root Cause Analysis ===

The issue started with a HW issue on LCC B of DAE 6 in SB-1. The LCC was experiencing constant resets at a
frequency of about 20 seconds, though it started off at a lower frequency. The LCC failed and the part was
shut down soon after. As the LCC recovered we re-enabled the port, but the board would fail multiple times
again. Each time the LCC failed, we had to rebuild our ESES info/data structure due to a generation code
change (the FW signals that it has to rebuild the configuration information internally due to some change in
the DAE), so the ESES state of our data structure drops below 10 (ES_READY) during the rebuild. At around
14:45:29 SP time we went to retrieve the configuration information from CDES on LCC A, where we have to send
an ESES request (a SCSI 1C command to read the SES pages), but we got request timeouts (BE38.99) on our ESES
requests (they took longer than 3 seconds) as the FW on LCC A was going through the process of attempting to
get information from the failing LCC B. LCC B was sending "back off" requests to LCC A as it was in the
process of booting, so the "good" LCC was giving the failing LCC B more time before it replied to our SES
request.
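
For illustration only, not the array ucode: the ESES request described above is a SCSI RECEIVE DIAGNOSTIC
RESULTS (opcode 0x1C) read of an SES page, issued with a 3-second budget. A minimal Linux SG_IO sketch of
such a request follows; the device path, page code and buffer size are placeholder assumptions.

/*
 * Illustrative sketch only -- NOT the HYPERMAX ucode. It shows the shape of an
 * "ESES request": a SCSI RECEIVE DIAGNOSTIC RESULTS (opcode 0x1C) read of an
 * SES page, issued through the generic Linux SG_IO ioctl with a 3-second
 * timeout (the budget whose expiry the array logs as BE38.99).
 */
#include <fcntl.h>
#include <scsi/sg.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define SES_TIMEOUT_MS 3000   /* 3-second ESES request budget */

static int read_ses_page(const char *dev, unsigned char page_code,
                         unsigned char *buf, unsigned short buf_len)
{
    int fd = open(dev, O_RDWR);
    if (fd < 0) { perror("open"); return -1; }

    unsigned char cdb[6] = {
        0x1C,                         /* RECEIVE DIAGNOSTIC RESULTS   */
        0x01,                         /* PCV: page code field valid   */
        page_code,                    /* which SES page to fetch      */
        buf_len >> 8, buf_len & 0xFF, /* allocation length (MSB, LSB) */
        0x00                          /* control byte                 */
    };
    unsigned char sense[32] = { 0 };

    struct sg_io_hdr io;
    memset(&io, 0, sizeof(io));
    io.interface_id    = 'S';
    io.cmdp            = cdb;
    io.cmd_len         = sizeof(cdb);
    io.dxfer_direction = SG_DXFER_FROM_DEV;
    io.dxferp          = buf;
    io.dxfer_len       = buf_len;
    io.sbp             = sense;
    io.mx_sb_len       = sizeof(sense);
    io.timeout         = SES_TIMEOUT_MS;

    int rc = ioctl(fd, SG_IO, &io);
    close(fd);

    if (rc < 0 || (io.info & SG_INFO_OK_MASK) != SG_INFO_OK) {
        fprintf(stderr, "SES page 0x%02x request failed or exceeded %d ms\n",
                page_code, SES_TIMEOUT_MS);
        return -1;
    }
    return (int)(io.dxfer_len - io.resid);
}

int main(void)
{
    unsigned char buf[4096];
    /* /dev/sg2 and page 0x02 (enclosure status) are placeholders */
    int n = read_ses_page("/dev/sg2", 0x02, buf, sizeof(buf));
    if (n >= 0)
        printf("read %d bytes of SES page data\n", n);
    return n < 0;
}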

The DS has a low-priority task that checks the vault triggers (RAID availability, DAE availability, power
zones, etc.) every 5 seconds; if our eses_state is NOT 10 on both ports to the DAE, we will pull the trigger
on the next occurrence of the LP task (it sets the override timer for 5 seconds, giving us a small window to
get out of the vault condition). In this instance, as the good LCC A was in the process of rebuilding its
ESES structure due to the generation code change, its eses_state was < 6 while the structure was in the
middle of being rebuilt, and the rebuild took over 5 seconds. When the LP task was run, our state on both
ports 0xD was in an initialization state as the board was booting, and our good LCC had not rebuilt its
tables yet because the CDES FW was spending far too long on requests to LCC B. As a result of the eses_state
on both peer ports not being at the required state, the LP task set an NTV (need to vault) trigger and the
box triggered a vault as it found that DAE 6 was unavailable.
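
As a rough sketch of that check (hypothetical names, structures and constants; the real LP task is internal
ucode and also covers RAID availability, power zones, etc.), the per-DAE logic amounts to something like the
following, in C:

/*
 * Hypothetical sketch of the low-priority (LP) vault-trigger check described
 * above. Names, structures and constants are illustrative, not the real ucode.
 */
#include <stdbool.h>
#include <stdio.h>

#define ES_READY            10   /* eses_state value meaning "fully ready"        */
#define OVERRIDE_WINDOW_SEC  5   /* grace window armed before pulling the trigger */

struct dae_port {
    int eses_state;              /* < ES_READY while the ESES DB is being rebuilt */
};

struct dae {
    struct dae_port port[2];     /* the two paths into the DAE                    */
    int override_timer_sec;      /* 0 = override timer not armed                  */
};

/* Called by the LP task every 5 seconds; returns true when we must vault (NTV). */
static bool lp_task_check_dae(struct dae *d)
{
    bool no_ready_path = d->port[0].eses_state < ES_READY &&
                         d->port[1].eses_state < ES_READY;

    if (!no_ready_path) {
        d->override_timer_sec = 0;               /* DAE reachable again: disarm   */
        return false;
    }

    if (d->override_timer_sec == 0) {
        /* First pass with no ready path: arm the override timer (C03E.87 in the
         * logs) and give the ESES rebuild a short window to complete.            */
        d->override_timer_sec = OVERRIDE_WINDOW_SEC;
        return false;
    }

    /* Still no ready path on the next LP pass: set NTV and vault (02B1.A0). */
    return true;
}

int main(void)
{
    /* Both paths stuck below ES_READY, e.g. mid-rebuild at state 5. */
    struct dae dae6 = { .port = { { .eses_state = 5 }, { .eses_state = 5 } },
                        .override_timer_sec = 0 };

    printf("LP pass 1: vault? %d\n", lp_task_check_dae(&dae6));  /* 0: timer armed   */
    printf("LP pass 2: vault? %d\n", lp_task_check_dae(&dae6));  /* 1: NTV triggered */
    return 0;
}

This lines up with the 5-second window mentioned below for the CDES 15.60 fix: the ESES rebuild on the good
LCC has to finish before the next LP pass finds both paths still not ready.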

After extensive testing, new CDES revision 15.60 was developed to trim down the time spent by CDES, so it
should respond to the re-init sequence much more quickly, and within our 5-second window.

=== Problem Introduced ===


Inherent in all V3 systems running Viking and Voyager DAEs with CDES version 15.50 and below.

=== KB Article ===


See above. . .

=== Must have Fix ===


See above. . .

=== Enhancements ===


See above . . . .

=== Recreation Efforts ===

- An augmented LCC was inserted into a test frame to inject I2C errors and IO expander resets on one LCC to
  mimic the constant resetting and TWI errors.
- We could get the LCC to reset at a set frequency by injecting a reset command over the serial port on the
  LCC (a rough sketch of such an injector follows this list). At FW 15.17 we could reliably get the box to
  vault at even a 40s reset cycle.
- We eliminated the TWI injection as well, and narrowed the problem down to just the resets.
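
The sketch referenced above is purely illustrative: the note does not give the actual LCC console command,
baud rate or cabling, so the device path, speed and "reset" string below are placeholder assumptions. It only
shows the general shape of a harness that writes a reset command over a serial port at a fixed interval (the
20-second cycle used in-house).

/*
 * Illustrative recreation harness only. The real LCC console command, serial
 * device and baud rate are not given in the note; everything below is a
 * placeholder. It simply writes a "reset" string to a serial port every N
 * seconds to mimic a constantly resetting LCC.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

int main(void)
{
    const char *port = "/dev/ttyUSB0";          /* placeholder serial device   */
    const char *reset_cmd = "reset\r";          /* placeholder console command */
    const unsigned interval_sec = 20;           /* 20 s cycle used in-house    */

    int fd = open(port, O_RDWR | O_NOCTTY);
    if (fd < 0) { perror("open serial port"); return 1; }

    struct termios tio;
    if (tcgetattr(fd, &tio) != 0) { perror("tcgetattr"); return 1; }
    cfmakeraw(&tio);                            /* raw 8N1-style settings      */
    cfsetispeed(&tio, B115200);                 /* placeholder baud rate       */
    cfsetospeed(&tio, B115200);
    if (tcsetattr(fd, TCSANOW, &tio) != 0) { perror("tcsetattr"); return 1; }

    for (;;) {
        if (write(fd, reset_cmd, strlen(reset_cmd)) < 0) {
            perror("write");
            break;
        }
        printf("injected LCC reset, sleeping %u s\n", interval_sec);
        sleep(interval_sec);
    }

    close(fd);
    return 0;
}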
2019-03-25
08:47
pre-final sent in, updating for SLO
Gary Ruby

2019-03-21
07:51
pre-final was already sent in. dropping severity to stop SLO warnings.
Severity For Version 5977.1131.1131 Changed from Critical/High to Medium.
Gary Ruby

2019-03-19
08:18
pre-final mostly done, finishing it off
Gary Ruby

2019-03-15
08:01
still in pre-final, had to deal with other stuff all week.
Gary Ruby

2019-03-13
12:41
************ Surge Request ************
KB: 531191 is in draft for this issue, will write it up and publish tomorrow.
Martin Hayes

2019-03-11
08:02
in pre-final
Gary Ruby

2019-03-08
08:07
************ Surge Response ************
The status of this OPT was changed from RETURNED to ASSIGNED.
ucode OPT: 555152 now open for the new CDES FW.
Martin Hayes

2019-03-08
08:06
555152 with relationship duplicate is added to the Related Problems List.
Martin Hayes

2019-03-07
14:20
Can we have a ucode OPT opened up for new CDES firmware to fix this issue?
Problem keys added:
Returned_Copy_OPT
Status For Version 5977.1131.1131 Changed from Assigned to Returned.
Timothy Gaspar

2019-03-07
08:05
Discussion with dev on new FW and possible release to happen later today US time.
Gary Ruby

2019-03-05
09:34
Gary Ruby

Update on weekend run.

No issues seen in Vmax on both boxes below with 15.50a running over the weekend. Will schedule a meeting
for this week to discuss next steps.

1. Modified LCC recreation running with 15.50a
   - Injected every 20 seconds. Will run over the weekend and stop all host traffic if Vmax gets a 3-second
     ESES timeout or our re-init sequence takes > 5 seconds after receiving a gen code change on the expander.
2. Customer LCC in the Sybil room, different box.
   - All LCCs in this box are at 15.50a, including the customer LCC. Will run over the weekend with temp at
     100F.
   - Same as recreation 1: will stop host traffic on an ESES timeout or if the re-init ESES sequence takes
     > 5 seconds.

all testing looks good on this new modified FW.


2019-03-04
08:03
!!!! INTERNAL ONLY !!!!
new CDES FW 15.50a seems to be holding up well thus far; we need to discuss a possible release, or see
where we go from here with it.
Gary Ruby

2019-02-28
10:15
Gary Ruby

Still working with CDES Engineering to fix our issue. After many debug firmwares over the past month, this
week I was given 15.50a, which ran for 24 hours with no failures. The recreation was injecting power off to
my modified LCC every 20 seconds. In past recreations I could recreate our Vmax issue within the hour. This
is good news and we will continue to try to recreate and work with CDES Engineering. The next step is to
change the injection time to every 10 seconds.

15.50a Recreation:
- Inject power off every 20 seconds. Ran for 24 hours, no issues.
  o Serial logs attached to Jira during this run.
- I verified from the Vmax logs that we did NOT see any Vmax ESES timeouts (BE38.99) of 3 seconds for one
  command, and we didn't see our ucode re-init ESES sequence take > 5 seconds after a gen code change (063f)
  was received from one expander on the good LCC.
  o Our re-init ESES sequence sends multiple ESES requests to the expander that just reported the gen code
    change.
- In the customer issue and previous recreations I would get the 3-second VMAX timeout for one ESES command
  and our re-init sequence would take > 7 seconds.
- With 15.50a I see our re-init sequence for one expander take less than 2 seconds for all ESES requests.

Customer (ING) LCC Update:

- Customer LCC is still running in the Sybil room but we cannot get the reboot time down to 20 seconds. The
  best we can do from the logs is every 30 seconds. The LCC only reboots when we heat up the DAE/room. Current
  temp in the room is 100 F.
- Only able to recreate the customer issue 1 time in-house.
- If we have firmware that we believe fixes the issue, we will load it on this LCC before sending it for
  hardware FA.

2019-02-28
08:03
overnight testing from Tuesday seemed fine as we checked it yesterday, awaiting last night's results now as
well.
Gary Ruby

2019-02-27
08:29
Still working with CDES Eng. We are getting debug firmware from CDES every other day and recreating our
issue with the modified LCC. CDES is working to understand the problem and develop a firmware fix.
Timothy Gaspar

2019-02-27
08:03
running tests on CDES FW 15.50f, which looks to help significantly with the rolling reboots and the timeouts
on the good LCC.
Gary Ruby

2019-02-25
06:51
case is still in recreation for CDES.
Gary Ruby


2019-02-22
06:17
new debug FW is still being tested
Gary Ruby

2019-02-20
12:52
Waiting on new firmware from CDES to fix the issue. The last firmware didn't fix it. The customer LCC is
still running in the temp room at 100 degrees and we see the same behavior we saw at the customer site. The
customer LCC reboots when the temperature in the DAE is increased.
Timothy Gaspar

2019-02-20
07:43
ongoing CDES investigations.
Gary Ruby

2019-02-18
08:16
updating because of SLO, still going back and forth with CDES in the background.
Gary Ruby

2019-02-14
08:02
Still working with CDES Eng. We have loaded multiple debug firmwares to find the root cause. Ongoing.
The customer LCC does show a rolling reboot like our modified LCC when it arrived at 176. The customer LCC is
running in the Sybil room with the temp at 100 degrees, trying to recreate the issue.
Timothy Gaspar

2019-02-11
08:01
ongoing testing with CDES debug fix to nail the problem down with the delays in the thread processing.
Gary Ruby

2019-02-06
09:29
Gary Ruby

(INTERNAL ONLY!!!!)

from CDES:
"I have attached 15.50e. It contains a potential fix for the issue where the EMA thread constantly retries,
even when the p2p send is failing with the 0x1201e (BACKOFF) error. Please give it a try. I would like to see
if preventing the EMA thread from constantly retrying alone alleviates your issue or if we need to also
prevent the ThreadMgr thread from constantly retrying.
[^CDES_Bundled_FW_15_50e.bin]"

seems they are working on a possible issue, but since we STILL don't have the customer LCC, not sure this
is something new, or just another hidden issue.

2019-02-05
08:09
Gary Ruby

from CDES's debugging:
"I don't see the signature where ses_peer_new_config() takes a long time, which is what I expected with my
"fix". So basically my previous debug build showed that both the ThreadMgr thread and the EMA thread were
excessively retrying P2P transactions on 0x1201e. My "fix" in 15.50d tried to eliminate the EMA thread
retries which were slowing down ses_peer_new_config(). Although I don't see anything obvious in the serial
log yet, it is possible that the ThreadMgr thread retrying excessively is also slowing down SES responses.
Could you please kick off another run?"

it would seem that the slowdown in response to our SES requests to update our ESES pages might be coming
from the ThreadMgr thread of CDES. we are continuing to attempt recreations here. it still looks like CDES
was hung here either way.

2019-02-01
08:05
still recreating with CDES. it seems we are spending a lot of time in the ses_peer_new_config() function in
CDES, which would be CDES trying to ping the peer LCC when it finds it has a new config page. this could
have led to the timeouts on the "good" LCC here. still investigating.
Gary Ruby
2019-01-30
13:00
Still working on recreation with the modified LCC. The customer LCC should be arriving soon and we will put
it in my system; when it does, we will pull the LCC logs off to have CDES eng review them.
Timothy Gaspar

2019-01-28
08:44
Gary Ruby

!!!!!!!!!!!
note: Eng and L2 CS information only. misrepresenting the good and the bad here will lead to confusion;
don't take this as something you can go back to folks with.
!!!!!!!!!!!

recreation ongoing; we can reliably recreate the issue in house using a doctored LCC to kill power, even at
15.50, when we trigger every 20 secs, but we are engaged with CDES eng as to why. the problem doesn't
recreate at 15.50 with 60-sec injections.

CDES asked for a few more tests to be run and we are drilling down into what's going on with CDES during
these vault situations to see what, if anything, can be done.

with the box not recreating at 15.50 on an overnight test with the 60-sec injection, I think 15.50 could
still provide a lot of relief to most customers, but it seems we were going every 20 secs in the customer
case, so we still need to proceed here.

2019-01-25
10:26
Update:
Working with CDES Eng.
Still working on recreation with the modified LCC.
Waiting on FA for the LCC.
Reviewing the customer logs and CDES logs.
Timothy Gaspar

2019-01-25
08:15
working on my timeline again.
Gary Ruby

2019-01-22
03:52
working on an ongoing escalation OPT yesterday
Gary Ruby


2019-01-18
06:04
got another case to review today, but made a bit more headway on my own timeline for a write-up. I have to
pore over the logs to insert all the relevant moments and filter out some of the "noise" that came with the
incident.
Gary Ruby
2019-01-16
08:02
meeting with CDES Engineering later. still working on the timeline
Gary Ruby

2019-01-14
11:26
working on setting up a detailed timeline for a write-up at a later date and setting the events up for
myself. we are still going back and forth with CDES eng on a few questions. will get all the bits together
first and then start a write-up at some point in the week, hopefully.
Gary Ruby
2019-01-11
10:14
following up with CDES on a timed-out SCSI command. some good ideas came from our interlock with Dev that we
want to follow up on to try and drill down.
Gary Ruby

2019-01-10
14:55
Still reviewing the logs and ucode. As Gary stated above, we see I2C errors in the CDES logs.
Timothy Gaspar

2019-01-09
12:01
I'm seeing some I2C errors in the ATRC/ELOG at around the time our box dropped; would like to see the logs
off the returned LCC as well.
we also have open questions out to CDES eng. we have a fairly good understanding of what happened here, but
we still need to dig through some of the other investigations before we can call it a day.
Gary Ruby

2019-01-09
06:20
Martin Hayes

************ Surge Request ************

SURGE Upload

******** Summary ********

Submitter: martin.hayes@emc.com
EE Owner: Gary Ruby
Site:
SR #:
Upload details:
Active Log & Event Log for DAE-6 (ESES-8,9,a,b,c) for LCC-A (from port-0x11)

******** Files ********


Original Location: Local Upload

******** Misc. Files ********


Event Log dir2 portx11 DAE6.LOG
Active Trace dir2 port0x11 DAE6.LOG

2019-01-09
05:38
Martin Hayes

************ Surge Request ************

SURGE Upload

******** Summary ********

Submitter: martin.hayes@emc.com
EE Owner: Gary Ruby
Site:
SR #:
Upload details:
Active Log & Event Log from DAE-6 (ESES-8,9,a,b,c) in SB-1 from port-D (LCC-B)

******** Files ********


Original Location: Local Upload

******** Misc. Files ********


Active Trace dir2 portD DAE6.LOG
Event Log dir2 portD DAE6.LOG

2019-01-09
03:55
EQA Engineer changed from BLANK to Kevin Crowley
Kevin Crowley


2019-01-08
17:30
Timothy Gaspar

Gary updated Surge 552536 instead of this one. His update is below.

01/08/2019 11:19:48 AM Gary Ruby

for a brief overview (or as brief as I can for now), the issue seems to start at around 14:35:57 array
time, where we get some DMA transfer errors on commands to LCC B of DAE 6 (port 0xD) which affects both DS,
and eventually this drops with a 700B.C7 ESES ready timeout error. this leaves port 0xD down on both DS 1c
and 2c.

at about 14:47:28 we get a generation code change on ESES 8 on port 0x17 (LCC A DAE 5) and we have to reset
our ESES DB and relearn the environment. the ESES gets through to state 5, which is
ESES_STATE_INIT_SEQUENCE, but it gets stuck in this initialization stage for over 5 seconds, which
triggers the vault.

waiting on ATRC and ELOG info to verify any issues in LCC A of DAE 5, which is just prior on the chain
to DAE 6 where we had a bad LCC. this blocked access to DAE 6 from both sides, which triggered the vault.

additionally, due to not getting the 08F2.xx lifesigns information, we won't be able to get any information
on why DS 2c dropped DD. the info is flushed from the buffers now, so it's a dead investigation before it
starts.

01/08/2019 11:25:03 AM Gary Ruby


seems I got my box layout wrong; we have a Viking (ESES 0-4) -> Voyager (ESES 5-7) -> Viking (ESES 8-12)
setup again, so ESES 8 on port 0x121 (17) IS the root expander of DAE 6. this fits now, so we weren't
stopped in the DAE in front of our DAE 6 on the daisy chain.

2019-01-08
09:22
updating problem keys and reviewing the supplied logs. I'll be the Eng owner for this case.
Problem keys added:
DUDL
Customer RCA
Gary Ruby

2019-01-08
09:14
Timothy Gaspar

Just adding in the email sent last night.

Martin,

Can you open a Surge, assign it to Gary Ruby, and collect the additional logs below?

o Additional Logs: (please attach to the Surge)
  - Syslog
  - Use the KB to collect all the syslogs with tftp
  - One month of symptoms.log
  - Please verify the symptoms.log has the first vault event.
  - One week of logalls.
  - ARTC/ELOG from all LCCs
  - E5 on the 90FF and D80B.

Thank you,
Tim

2019-01-08
05:38
Martin Hayes

************ Surge Request ************

SURGE Upload

******** Summary ********

Submitter: martin.hayes@emc.com
EE Owner: Gary Ruby
Site:
SR #:
Upload details:
6 month symptoms file

******** Files ********


Original Location: Local Upload

******** Misc. Files ********


symptoms.zip

2019-01-08
05:35
Martin Hayes

************ Surge Request ************

SURGE Upload

******** Summary ********

Submitter: martin.hayes@emc.com
EE Owner: Gary Ruby
Site:
SR #:
Upload details:
syslogs for 1C & 2C

******** Files ********


Original Location: Local Upload

******** Misc. Files ********


syslogs.zip


2019-01-08
05:33
Martin Hayes

************ Surge Request ************

SURGE Upload

******** Summary ********

Submitter: martin.hayes@emc.com
EE Owner: Gary Ruby
Site:
SR #:
Upload details:
Logalls

******** Files ********


Original Location: Local Upload

******** Misc. Files ********


296700054_date190106_time201934.zip
296700054_date190107_time151003.zip

2019-01-08
05:20
The status of this OPT has been changed to Assigned.
Martin Hayes

SURGE VMAX_UCODE SSCO_DLM/Drives Escalation

******** Summary ********


Submitter: martin.hayes@emc.com
Site: ING BANK UMRANIYE
SR #: 13170581
VMAX Version: 5977.1131.1131
Registered Build:
MicroCode: 0041
Serial Number: CK296700054

Where Found: Customer Site


Escalation Reason: Catastrophic Event
SFDC Number: 522724
Escalate per SFA: No
Immediate Response Required: Yes
Engineer Assigned: Gary Ruby (68867)
CS recovery engineer email: martin.hayes@dell.com
Severity: 1 - Critical/High

******** VMAX_UCODE SSCO_DLM/Drives Escalation Details ********


2019-01-08
05:20
Martin Hayes

Issue Summary:
Array vaulted due to no paths available into a DAE

Issue Description:
LCC-B failed and after 12 mins we lost all paths into DAE-6, triggering an NTV.
The array went offline to the customer. It recovered but vaulted a further 8 times.

Troubleshooting Steps Performed:

A CE was sent to site to replace the affected LCC and stabilize the box.
A Customer RCA is open for this event; it looks similar to issues at a previous customer (Surge OPT: 548344),
ref KB: 522724.

Logfile review is ongoing and we will work with eng GR on this.

Current Impact:
The array vaulted 9 times; once the LCC was removed from the array it stabilized.

Expectations of Engineering:
Work with TS2 once a review is sent in.

Justification why immediate response is required:


Hot RCA
