Aquilion8 - will not boot
Phenomenon
After turning ON the power, the S-con won't complete the start-up.
• RTM
• RTM-D
• MHR-FC in slots 19 and 18
• S-con PC
• FC card in S-con
• Recon Box back plane/chassis
Checks performed:
• Power supplies confirmed
• Running with minimum configuration: MHR-FC (slot 19), MHR-FC (slot 18), MHR-DP (slot 17)
• Connected the fiber cable directly from the S-con PC FC card to the MHR-FC in slot 18
The RTM boot log showed the same output after every intervention. A copy of the boot log
follows.
3
2¢É£¹J§©µ°”¿Íèj¿–º=³00°©!±«É¥¤¹©©!±«É¥¤¹©©”2£¹J'¤¹©©
0x1fffdf8 (tRootTask): miiPhyInit check cable connection
Attached TCP/IP interface to dc unit 0
Attaching interface lo0...done
[ASCII-art startup banner]
Development System
VxWorks version 5.4.2
KERNEL: WIND version 2.5
Copyright Wind River Systems, Inc., 1984-1999
2
1
0
Starting...
Found 6 G4 processors(s):
ISPHost.c 1265 tISPHost | WARNING, Got SGI Loop ID 0x00, expected 0x40
ISPHost.c 1274 tISPHost | errno:0, 0x40040005: <<MISSING devices on Host Fibre Channel>>
ISPHost.c 1277 tISPHost | errno:0, 0x40040005: <<Expected 3 devices, Only 6 found >>
ISPHost.c 1290 tISPHost | 6 devices found: 0x78 0x79 0x7a 0x7b 0x7c 0x41
ISPHost.c 1343 tISPHost | Sent ISP ready message to RDD.
ISPHost.c 1037 tISPHost | Some port(s) changed ...
ISPHost.c 1265 tISPHost | WARNING, Got SGI Loop ID 0x00, expected 0x40
ISPHost.c 1274 tISPHost | errno:0, 0x40040005: <<MISSING devices on Host Fibre Channel>>
ISPHost.c 1277 tISPHost | errno:0, 0x40040005: <<Expected 3 devices, Only 6 found >>
ISPHost.c 1290 tISPHost | 6 devices found: 0x78 0x79 0x7a 0x7b 0x7c 0x41
0xfbaa78 (tNetTask): miiPhyInit check cable connection
The following lines of the log are not correct and were of concern:
                 PCI   VxWorks   Board    Bridge  Serial
CPU  Slot  Node  Size  Revision  Type     Type    Number
---  ----  ----  ----  --------  -------  ------  --------
1    19    A     64    6         G4-DTB   INTEL   13694y16
2    19    B     128   6         G4-DTB   INTEL   13694y16
3    18    A     4     6         G4-DTB   INTEL   08565928
4    18    B     4     6         G4-DTB   INTEL   08565928
The RTM appeared to be configuring the wrong slot, which is why the RTM was replaced a second
time as a test. The RTM communicates with the other boards via the back plane, so a spare back
plane was also obtained for testing.
And:
ISPHost.c 1265 tISPHost | WARNING, Got SGI Loop ID 0x00, expected 0x40
ISPHost.c 1274 tISPHost | errno:0, 0x40040005: <<MISSING devices on Host Fibre Channel>>
ISPHost.c 1277 tISPHost | errno:0, 0x40040005: <<Expected 3 devices, Only 6 found >>
ISPHost.c 1290 tISPHost | 6 devices found: 0x78 0x79 0x7a 0x7b 0x7c 0x41
0xfbaa78 (tNetTask): miiPhyInit check cable connection
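The two mismatches in the ISPHost lines quoted above (loop ID and device count) can be picked out of a captured boot log automatically. The following is a minimal sketch in Python, not part of the system software. The function name `summarize_isp_lines` and the regular expressions are assumptions based only on the line layout shown in this report; other software versions may format these lines differently.

```python
import re

# Layout assumed from the quoted log: "<file>.c <line> <task> | <message>"
ISP_LINE = re.compile(r"^(?P<file>\S+\.c)\s+(?P<line>\d+)\s+(?P<task>\S+)\s+\|\s+(?P<msg>.*)$")
LOOP_ID = re.compile(r"Got SGI Loop ID (0x[0-9a-fA-F]+), expected (0x[0-9a-fA-F]+)")
DEVICE_COUNT = re.compile(r"Expected (\d+) devices, Only (\d+) found")


def summarize_isp_lines(log_text):
    """Return (kind, got, expected) tuples for each mismatch reported in the log."""
    findings = []
    for raw in log_text.splitlines():
        match = ISP_LINE.match(raw.strip())
        if not match:
            continue  # not an ISPHost-style line
        msg = match.group("msg")
        loop = LOOP_ID.search(msg)
        if loop:
            got, expected = int(loop.group(1), 16), int(loop.group(2), 16)
            if got != expected:
                findings.append(("loop_id", got, expected))
        count = DEVICE_COUNT.search(msg)
        if count:
            expected, got = int(count.group(1)), int(count.group(2))
            if got != expected:
                findings.append(("device_count", got, expected))
    return findings


# Lines copied from the boot log in this report.
SAMPLE = """\
ISPHost.c 1265 tISPHost | WARNING, Got SGI Loop ID 0x00, expected 0x40
ISPHost.c 1274 tISPHost | errno:0, 0x40040005: <<MISSING devices on Host Fibre Channel>>
ISPHost.c 1277 tISPHost | errno:0, 0x40040005: <<Expected 3 devices, Only 6 found >>
ISPHost.c 1290 tISPHost | 6 devices found: 0x78 0x79 0x7a 0x7b 0x7c 0x41
"""

if __name__ == "__main__":
    for kind, got, expected in summarize_isp_lines(SAMPLE):
        print(f"{kind}: got {got}, expected {expected}")
```

The sketch only reports what the log itself states. It does not decide which value is correct for a given system configuration.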
Q-Logic test (performed by pressing Ctrl-Q at system start-up): The system found and recognized
the FC card in the S-con PC. It also passed the loop back test from the S-con PC FC card to the
Recon Box MHR-FC.
This result confirms that the hardware connection between the S-con and the Recon Box is good.
It was decided to resort to parts replacement. Using the parts available on-site, the parts listed
under Phenomenon above were replaced one at a time, and each original was refitted when the
swap made no difference.
Because none of the single swaps helped, it was suspected that more than one part had failed.
It was decided to start leaving the new parts in place and to look for any changes in the
boot log.
With the new back plane/chassis and a new RTM installed, the system booted with the correct
chassis configuration and started configuring the RTM. After a few system self-resets, it booted
past the point where it had repeatedly stopped before and continued all the way.
The system still did not come up correctly, but only because the self-resets had left it out of
sync with the system software boot sequence. After a full system reboot it came all the way up
and warm-up was possible. A quick test also confirmed that scanning worked.
The system software was reconfigured and the RDD data was deleted by doing an RDD config with
parity. This was done to ensure a clean, fresh RDD config file. System restoration was
completed and the system was re-calibrated.
The new parts left in the system are the S-con PC, RTM, RTM-D, and back plane/chassis.
The S-con PC had already been replaced earlier to address a network/motherboard problem.
The other three parts were the cause of the system not booting.
Solution
Replacement of the back plane/cPCI chassis, the RTM and the RTM-D.
Notes
Knowing what to expect in logs and boot logs is only possible if logs and boot logs of normal
system behaviour are kept available for reference and comparison in the event of abnormal
system behaviour. Such reference logs can be kept in the S: partition on the system (S-con)
(recommended), or in a folder on the service PC of the site-responsible service engineer.
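As an illustration of such a comparison, the sketch below diffs a captured boot log against a stored known-good one using Python's standard `difflib` module. It is a minimal example under assumptions: the function name `compare_boot_logs` is hypothetical, and how the two logs are captured and where they are stored is left to the engineer.

```python
import difflib


def compare_boot_logs(reference_text, current_text):
    """Return unified-diff lines between a known-good boot log and the current one."""
    # Trailing whitespace is stripped so serial-capture padding does not show up as a change.
    reference = [line.rstrip() for line in reference_text.splitlines()]
    current = [line.rstrip() for line in current_text.splitlines()]
    return list(
        difflib.unified_diff(
            reference, current, fromfile="reference", tofile="current", lineterm=""
        )
    )


if __name__ == "__main__":
    # Short illustrative inputs; real logs would be read from saved capture files.
    good = "Starting...\nFound 6 G4 processors(s):"
    bad = good + "\nISPHost.c 1265 tISPHost | WARNING, Got SGI Loop ID 0x00, expected 0x40"
    for line in compare_boot_logs(good, bad):
        print(line)
```

Lines prefixed with `+` appear only in the current log and lines prefixed with `-` appear only in the reference. Values that legitimately differ between boots, such as task addresses, will also show up and need to be judged by the reader.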