ZXA10 C6XX V1.1.0 Troubleshooting Guide V1.1
http://tsm.zte.com.cn/tsm/FileCenter/File.aspx?Mode=read&FileID=30662900
Revision History
Document
Product Version Serial Number Reason for Revision
Version
First published
Modules added
Author
Reviewed
Document Version Date Prepared by Approved by
by
1.1
Follow-up document: After reading this document, you may need the
following information
SEQ Reference Material Information
Abstract
Chapter Description
Contents
Overview
Troubleshooting methods when SNMP management is blocked
1. Confirm EMS matching version
2. Check SNMP-related configurations
(1)Check whether the version that the NE snmp corresponds to is enabled
(2)Check SNMP community names configured on EMS and NE
(3)Check whether view on EMS is consistent with that defined
3. Check network states
(1)Check whether the route entry is set up
(2)Check whether it is connective with the EMS server's address
(3)Perform snmp ping test on EMS
4. Check states of SNMP packets receiving / transmission statistics
5. Enable SNMP log to observe
6. View statistics of the main control's control-panel SNMP protocol
7. View the value of control-panel SNMP rate limit
8. View CPU queue rate limit configuration
Troubleshoot when the whole C600 NE is disconnected
1. Symptoms and features of such disconnection problems
2. Collect information for such disconnection faults
(1)Collect information of card, version, patch and cpu usage ratio and so on
(2)Collect information of all alarms and logs on the NE
(3)All alarms on EMS
(4)Suspension File Explanation
(5)Query suspension files
(6)Remotely view suspension files
(7)Upload suspension files
(8)Collect CLI and diagnosis mode log file
(9)Collect information under OAM diag shell mode
Troubleshooting when C600 line cards are HWONLINE
1. Features and corresponding state to light on the RUN indicator
2. Collect fault information when the card is HWONLINE
3. The card becomes from INSERVICE into HWONLINE state
(1)Inter-card communication is interrupted
(2)The card is repeatedly restarted
(3)Abnormal card tasks
(4)System control process or service support process is faulty
C600 NE Service Troubleshooting Guide
1. Introduction of C600 NE service channel model
2. Unicast Service Troubleshooting Guide
Overview
For the newly launched C6xx series products, ZXA10 C6xx V1.1.0 is the main version.
maintenance methods and mechanism at the PON side, there are great differences in
maintenance and troubleshooting are a bit different. This manual is written to help
after-sales maintenance personnel become familiar with the equipment and provide on-site
Troubleshooting methods when SNMP management is blocked
First, make sure the EMS version matches the NE version. The EMS version
must be patched T125 or T135, or a newer version; otherwise, C600 NEs may not be
managed normally on EMS, or some functions may be unavailable. For details,
Execute the command show snmp config on the NE to show all snmp
(1)Check whether the version that the NE snmp corresponds to is enabled
First, confirm whether the version that snmp-server corresponds to is
Check whether the community names configured on EMS and NE are consistent.
Because the community name on the NE is encrypted, if you forget the community
name on the NE, you can ensure consistent community names between EMS and NE
View whether the route entry that the NE’s management address corresponds to is
Ping the IP address of the EMS server from the NE and check whether the ping
succeeds, or ping the NE from the EMS server to check connectivity. If
OLT NEs are connected via the out-band interface, the interface name must be added
when pinging the EMS server from the NE. To ping the IP address of EMS through the NE's
If OLTs are connected via the in-band management address, ping the IP address directly.
Run an snmp ping test on the EMS interface to check the interconnection state of the
snmp protocol between EMS and NE. You can set the EMS snmp ping to 100 packets, timeout
4. Check states of SNMP packets receiving / transmission statistics
Use the command show snmp to check the states of snmp packets received and
transmitted by the NE. Generally, each additional get or set request packet must
correspond to a response packet one to one. Focus mainly on the following parts marked in blue:
ZXAN(config)#show snmp
27571 SNMP packets input
0 Bad SNMP version errors
30 Unknown community name
0 Illegal operation for community name supplied
76245 Number of requested variables
300 Number of altered variables
17060 Get-request PDUs
10392 Get-next PDUs
89 Set-request PDUs
570806 SNMP packets output
0 Too big errors (Maximum packet size 8192)
1 No such name errors
0 Bad values errors
257 General errors
27524 Response PDUs
543282 Trap PDUs SNMP
0 Input ASN parse errors packets
0 Proxy drops packets
0 Unknown security model packets
0 Unknown PDU handler packets
0 Unsupported security level packets
0 Not in time-window packets
0 Unknown user name packets
0 Unknown engine ID packets
0 Wrong digest packets
0 Decryption error packets
SNMP version v1: enable
SNMP version v2c: enable
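The one-to-one check described above can be scripted once the show snmp output has been saved as text. The following is a minimal sketch, not a ZTE tool: the function names are hypothetical, and it assumes each counter is printed as a number followed by its label, as in the sample above. Because the counters are cumulative, it is more meaningful to compare two snapshots taken some minutes apart than to read a single value.

```python
import re


def parse_snmp_counters(text):
    """Map each counter label in saved `show snmp` output to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(\d+)\s+(\S.*?)\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def unanswered_requests(counters):
    """Get, get-next and set requests received minus responses sent."""
    requests = (counters.get("Get-request PDUs", 0)
                + counters.get("Get-next PDUs", 0)
                + counters.get("Set-request PDUs", 0))
    return requests - counters.get("Response PDUs", 0)


# Lines copied from the sample output above.
SAMPLE = """\
27571 SNMP packets input
17060 Get-request PDUs
10392 Get-next PDUs
89 Set-request PDUs
570806 SNMP packets output
27524 Response PDUs
"""

print(unanswered_requests(parse_snmp_counters(SAMPLE)))  # prints 17
```

A small non-zero difference is not necessarily a fault (requests with an unknown community name, for example, get no response); a difference that keeps growing between two snapshots is the case worth investigating.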
To observe whether the NE receives snmp packets, enable the snmp log and
retrieve the SNMP log information for the period of the disconnection.
ZXAN(config)#snmp-server log enable all
ZXAN(config)#snmp-server log enable get
ZXAN(config)#snmp-server log enable set
6. View statistics of the main control's control-panel SNMP protocol
Check whether the CPU queue rate limit is too small, which can be troubleshot
If the problem still cannot be solved with the above troubleshooting methods, capture
packets for analysis. First, capture packets at the EMS side. If the EMS side transmits
packets but receives no reply from the NE side, or the reply from the NE side is abnormal,
mirror and capture packets at the NE side to confirm where the packets are lost or why
the reply is abnormal.
Troubleshoot when the whole C600 NE is disconnected
The NE is disconnected from EMS: EMS receives an alarm, and then the NE suddenly
goes link-down; the NE cannot be pinged from the EMS server and cannot be
connected to remotely.
(1)Collect information of card, version, patch and cpu usage ratio and so on
ZXAN#show equipment
Slot CfgType RealType HardVer McuVer BootVer EpldVer FpgaVer M-Code SN
--------------------------------------------------------------------------------
4 GFGH GFGHK V1.0.0 V1.0.3 V1.0.12 V1.7 N/A 000100 sn0004_GFGHK
6 XFTO XFTOA V1.0.0 V1.0.3 V1.0.12 V1.3 N/A 000100 100000000000
8 XFPH XFPHR V1.0.0 V1.0.3 V1.0.12 V1.7 N/A 000100 sn0000_XFPHR
10 SFUL SFUL V1.0.0 N/A V1.0.12 V2.2 V1.0.1 000100 sn0010__SFUL
11 SFUL SFUL V1.0.0 N/A V1.0.12 V2.2 V1.0.1 000100 sn0011__SFUL
ZXAN#show install
Fpga-1.1.0-70820:
Package: Fpga
Version: 1.1.0
Install from: Fpga_V1.1.0.45332.pkg, size 81,563,583 Bytes, build on 2018-08-08 03:29:55
Boot-1.1.0:
Package: Boot
Epld-1.1.0:
Package: Epld
Version: 1.1.0
Install from: Epld_V1.1.0.45332.pkg, size 10,455,918 Bytes, build on 2018-08-08 03:27:24
Mcu-1.1.0:
Package: Mcu
Version: 1.1.0
Install from: Mcu_V1.1.0.44.pkg, size 18,770,517 Bytes, build on 2018-08-04 01:43:29
Base-1.1.0-1:
Package: Base
Version: V1.1.0
Install from: base.set, size 895,066,397 Bytes, build on 2018-08-13 10:15:59
Epld-1.1.0:
Package: Epld
Version: 1.1.0
Mcu-1.1.0:
Package: Mcu
Version: 1.1.0
Base-1.1.0-1:
Package: Base
Version: V1.1.0
Fpga-1.1.0-1238:
Package: Fpga
Version: 1.1.0
ZXAN#
Read CPU usage ratio information:
If the cpu usage ratio of the main control card or a line card is found to be very high,
collect information from that card directly.
ZXAN#show processor
ZXAN#
ZXAN#show temperature detail
-------------------------------------------------------------------------
BoardType : Type of board
I2C : Inter integrated circuit
Addr : Address of check point
Description: Temperature point information
Status : Status of check point
Minor : Slight alarm value (celsius)
Major : Serious alarm value (celsius)
Fatal : Fatal alarm value (celsius)
Overheat : The high-threshold of temperature point (celsius)
Temper : Current temperature (celsius)
-------------------------------------------------------------------------
[shelf 1, slot 4]
BoardType I2C Addr Description Status Minor Major Fatal Overheat Temper
-------------------------------------------------------------------------
GFGH 0 1 cpu Normal 100 108 115 120 40
GFGH 0 2 np temp Normal 100 108 115 120 54
[shelf 1, slot 6]
BoardType I2C Addr Description Status Minor Major Fatal Overheat Temper
-------------------------------------------------------------------------
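When many slots are listed, the distance between the current temperature and the Minor threshold can be computed from the saved output. The sketch below is illustrative only: the function names are hypothetical, and it assumes every data row ends with the five numeric columns Minor, Major, Fatal, Overheat and Temper, preceded by the Status column, as in the slot 4 rows above. Rows with negative temperatures are not handled.

```python
def parse_temperature_rows(text):
    """Parse data rows of saved `show temperature detail` output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has I2C and Addr as integers and ends with five integers.
        if len(fields) < 10 or not all(f.isdigit() for f in fields[-5:]):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        minor, major, fatal, overheat, temper = map(int, fields[-5:])
        rows.append({
            "board": fields[0],
            "point": " ".join(fields[3:-6]),  # description may contain spaces
            "status": fields[-6],
            "minor": minor,
            "major": major,
            "fatal": fatal,
            "overheat": overheat,
            "temper": temper,
        })
    return rows


def headroom(rows):
    """Return (board, point, degrees below the Minor threshold) per check point."""
    return [(r["board"], r["point"], r["minor"] - r["temper"]) for r in rows]


SAMPLE = """\
BoardType I2C Addr Description Status Minor Major Fatal Overheat Temper
GFGH 0 1 cpu Normal 100 108 115 120 40
GFGH 0 2 np temp Normal 100 108 115 120 54
"""

for board, point, margin in headroom(parse_temperature_rows(SAMPLE)):
    print(board, point, margin)
```

A check point whose margin is close to zero or negative deserves attention even if its Status column still shows Normal at the moment of collection.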
Cmdlog (the output may be very long; for NEs that have been running for an especially
long time, it is recommended to collect the cmdlog log file instead)
ZXAN#show logging buffer cmdlog
Log acceptance: on
Log matched: 301
Log in buffer: 301
Buffer occupied: 32.38%
Log file path: /sysdisk0/usrcmd_log/
An alarm 400315 ID 2307 level 3 occurred at 12:49:17 08-19-2018 sent by ZXAN MPU-1/10/0
%POWER% Power voltage fault! Shelf = 1, Group = 0, Slot = 20 DC Power voltage fault alarm
History alarms:
ZXAN#show alarm history
An alarm 722452 ID 5724 level 3 occurred at 13:57:32 08-20-2018, cleared at 13:57:35 08-20-2018 sent
by ZXAN PFU-1/4/0
%GPON% GPON alarm link olt rdii (rack 1 shelf 1 slot 4 port 1 onu 2)
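When correlating alarms with the disconnection time, it helps to turn each alarm header into structured fields. The following sketch parses the header format shown above; the regular expression and function name are assumptions made for illustration, and the date is assumed to be in month-day-year order (consistent with 08-20-2018 in the sample). Pass only the "An alarm ..." header, not the %...% description line that follows it.

```python
import re
from datetime import datetime

TIME_FORMAT = "%H:%M:%S %m-%d-%Y"  # assumed month-day-year order
STAMP = r"\d{2}:\d{2}:\d{2} \d{2}-\d{2}-\d{4}"
ALARM_RE = re.compile(
    r"An alarm (?P<code>\d+) ID (?P<id>\d+) level (?P<level>\d+) "
    rf"occurred at (?P<start>{STAMP})"
    rf"(?:, cleared at (?P<end>{STAMP}))?"
    r" sent by (?P<source>.+)"
)


def parse_alarm(text):
    """Parse one alarm header; the header may be wrapped over two lines."""
    flat = " ".join(text.split())  # undo terminal line wrapping
    match = ALARM_RE.search(flat)
    if match is None:
        return None
    start = datetime.strptime(match["start"], TIME_FORMAT)
    end = datetime.strptime(match["end"], TIME_FORMAT) if match["end"] else None
    return {
        "code": int(match["code"]),
        "level": int(match["level"]),
        "source": match["source"],
        "start": start,
        # None means the alarm has not been cleared (current alarm).
        "duration_s": (end - start).total_seconds() if end else None,
    }


HISTORY = """An alarm 722452 ID 5724 level 3 occurred at 13:57:32 08-20-2018, cleared at 13:57:35 08-20-2018 sent
by ZXAN PFU-1/4/0"""

alarm = parse_alarm(HISTORY)
print(alarm["source"], alarm["duration_s"])  # prints: ZXAN PFU-1/4/0 3.0
```

Sorting the parsed alarms by start time makes it easier to see which alarm was raised first around the fault.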
Collect Snmp log:
Some NEs disable the snmp log by default, in which case there is no snmp log information.
Enable it manually with the snmp-server log enable all command in
configuration mode.
ZXAN#show logging buffer snmp log
Log acceptance: on
Log matched: 1
Log in buffer: 1
Buffer occupied: 0.14%
Log file path: /datadisk0/LOG/SNMP/
All alarms and operation logs from before and after the NE fault can be collected on
EMS.
Collect the suspension files related to the main control card and the abnormal slots on the
master / slave servers (the /datadisk0/run_log directory, the /datadisk0/run_log/EXCINFO
directory and the /datadisk0/run_log/CPULOAD directory).
The files to collect are as follows:
/datadisk0/run_log/Exc_Omp.txt ------ General suspension file of the main control card
/datadisk0/run_log/Exc_pp.txt ------ General suspension file of line cards
/datadisk0/run_log/EXCINFO/Exc_0_1_slot_0.txt ------ Slot log of a card (slot represents the slot
number), which records important logs generated while the card is running
/datadisk0/run_log/EXCINFO/exception_kernel_0_1_slot_0.txt ------ Kernel suspension file of a card
(slot represents the slot number), which records the card's kernel suspension logs and the logs of
resets triggered through the general resetting interface.
ZXAN#dir /datadisk0/run_log
Directory of MPU-1/10/0: /datadisk0/run_log
30644060 KB total (23669384 KB free)
ZXAN#dir /datadisk0/run_log/CPULOAD
Directory of MPU-1/10/0: /datadisk0/run_log/CPULOAD
30644060 KB total (23669348 KB free)
ZXAN#dir /datadisk0/run_log/EXCDUMP
Directory of MPU-1/10/0: /datadisk0/run_log/EXCDUMP
30644060 KB total (23669308 KB free)
Suspension files can be viewed directly on CLI with the more command followed by the
file name. Start screen capture and recording before viewing.
As shown in the following:
more /datadisk0/run_log/Exc_Omp.txt
To print the whole log at one time, set the terminal property:
terminal length 0
However, be cautious if the file is large.
To restore paging, use: no terminal length
Example:
ZXAN#more /datadisk0/run_log/Exc_Omp.txt
************************ End of Record ************************
To send the whole suspension file to the R & D Institute, the suspension file needs to
Example:
If management is in-band, remove the vrf mng parameter. In the command, the part marked
in black is the path of the suspension file to upload, the part marked in purple is the IP
address of the FTP server, and the part marked in green is the user name and password of
the FTP server.
Collect CLI and diagnosis mode log files of the main control card
Details:
represents date and index; these logs record the commands entered under basic
represents date and index; these logs record the commands entered under special
cmdlog_20180528110623_0.cmd.log //*.*.*.*/usrcmd.log@xx:xx
If management is in-band, remove the vrf mng parameter. In the command, the part marked
in black is the path of the file to upload, the part marked in purple is the IP address
of the FTP server, and the part marked in green is the user name and password of the FTP
server.
For V1.1.0, see ZXA10 C6xx V1.1.0 Operation Methods to Enter Shell under Diagnosis Mode
for how to enter the shell mode via CLI.
Check whether there are tasks suspended: In the current general engineering version, a
suspended task causes a restart and switchover. If no switchover occurs, under the ADM
whether there are tasks suspended; if yes, switch to the corresponding process, and
ZXAN(diag-shell-MPU-1/10/0)#
ZXAN(diag-shell-MPU-1/10/0)#exec XOS_DbgTt2("SCS_MCM_MGT_2")
XOS_DbgTt2("SCS_MCM_MGT_2")
[ADM]Show task context by proc...
[ADM]Track function call list...
[ADM]0x9d045658 __recvmsg(PC)+(0x58/0xac)
[ADM]0x1050cdd0 L1WaitForMSG+(0x5c/0x10c)
[ADM]0x1050c6bc L1ScheTaskEntry+(0x270/0x928)
[ADM]0x104d4858 Vos_UniThreadEntry+(0xdc4/0xed8)
[ADM]value = 0 = 0x0(32);value = 0=0x0(64)
[ADM]
To view the suspension logs of a card, execute the related function under the ADM process of
the corresponding card's diag-shell. Contact R & D for the specific function
name.
1. Features and corresponding state to light on the RUN indicator
The behavior of the RUN indicator is divided into the BOOT phase and the version phase,
according to the running phase of the card's software. In the BOOT phase, the RUN indicator
is usually lit by BSP to indicate to the user that the BOOT software has started to run,
which is important prompt information for users. At present, the lighting of the
RUN indicator in the BOOT phase on C600 cards is consistent with C300: the green LED is
Note: If the yellow / orange LED is solid on as soon as the card is powered on, the
DDR memory chip is generally faulty. When BOOT is running and before it changes to DDR, it
In version running phase, the controlling of RUN and the card's system control states
LED Definition:
State of the RUN indicator         Definition
--------------------------------------------------------------------------------
Slowly flashes in green (1s)       The card is in a state that can normally provide
                                   services (INSERVICE)
Quickly flashes in green (0.5s)    The card is in the configuration process (configing)
Solid on in green                  (1) The card is in the version starting process
                                   (2) The card is in the HWONLINE state
Slowly flashes in yellow (1s)      The inserted card and the configuration type are
                                   mismatched (TYPEMISMATCH)
1)General symptoms
On CLI, show card shows that the state of the card in the corresponding slot is HWONLINE,
both NE and EMS have the Board not running alarm, and the RUN indicator of the card is
solid on in green.
Execute show processor; there is no information about the HWONLINE line card.
Under oam diag-shell, entering the line card's shell fails, and pinging the IP
2)Collect information
Collect linkup information and ping information of the main control and each card
under the FNSC_SVR process of the main control card. Functions involved:
different, please ask the R & D Institute to get operation commands when operating)
Collect the main control card’s Ethernet port information and SSP switching chip
Log in to the HWONLINE card in ODB remote mode or file mode to collect serial port prints
Collect suspension files, slot log, inter-card communication interruption log, CPU
3)Check linkup information and ping information of cards under the main control card
relationship.
ZXAN(diag-shell-MPU-1/10/0)#show card
Shelf Slot CfgType Port HardVer Status
---------------------------------------------------
1 1 GFGH 16 V1.0.0 INSERVICE
1 2 GFTH 16 V1.0.0 INSERVICE
1 3 XFTH 16 V1.0.0 INSERVICE
1 4 GFTH 16 V1.0.0 INSERVICE
1 10 SFUL 0 V1.0.0 INSERVICE
1 11 SFUL 0 V1.0.0 STANDBY
1 22 PRVR 0 V1.0.0 INSERVICE
1 23 FCVD 0 V1.0.0 INSERVICE
Pinging the ip address of the HWONLINE card can further confirm whether the
Use the ping command to ping the IP of the target card slot from the host. Each slot's
slot 2, and the like. For example, under the main control diag shell, enter the FNSC_SVR
process (process number 31) and execute **Ping("168.1.132.0",4). This function has two
parameters: the first is the ip address of the corresponding HWONLINE card,
and the second is the number of ping packets (this number must be filled in).
Format:
ZXAN(diag-shell-MPU-1/10/0)#execute **Ping("168.1.132.0",4)
diagRslPing("168.1.132.0",4)
[FNSC_SVR]PING 168.1.132.0 (168.1.132.0): 56 data bytes
64 bytes from 168.1.132.0: seq=0 ttl=64 time=0.457 ms
64 bytes from 168.1.132.0: seq=1 ttl=64 time=0.450 ms
64 bytes from 168.1.132.0: seq=2 ttl=64 time=0.420 ms
64 bytes from 168.1.132.0: seq=3 ttl=64 time=0.438 ms
If no reply packets are shown above, the ping has failed, indicating that
inter-card communication is blocked. In that case, you need to get logs under
and further analyze the recorded logs and such information as CPU Ethernet port, phy, mac,
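When the ping result is collected for many slots, the replies can be counted from the captured diag-shell print instead of being read by eye. This is a minimal sketch with hypothetical function names; it assumes reply lines look like the "64 bytes from ..." lines in the sample above and that the second argument passed to the ping function is the number of packets sent.

```python
import re

REPLY_RE = re.compile(r"^\s*\d+ bytes from \S+: seq=\d+", re.MULTILINE)


def count_ping_replies(output):
    """Count reply lines in a captured inter-card ping print."""
    return len(REPLY_RE.findall(output))


def inter_card_link_ok(output, packets_sent):
    """True only if every packet that was sent got a reply."""
    return packets_sent > 0 and count_ping_replies(output) == packets_sent


SAMPLE = """\
[FNSC_SVR]PING 168.1.132.0 (168.1.132.0): 56 data bytes
64 bytes from 168.1.132.0: seq=0 ttl=64 time=0.457 ms
64 bytes from 168.1.132.0: seq=1 ttl=64 time=0.450 ms
64 bytes from 168.1.132.0: seq=2 ttl=64 time=0.420 ms
64 bytes from 168.1.132.0: seq=3 ttl=64 time=0.438 ms
"""

print(count_ping_replies(SAMPLE), inter_card_link_ok(SAMPLE, 4))  # prints: 4 True
```

A reply count between zero and the number sent points to intermittent loss rather than a fully blocked link, which is worth noting in the fault report.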
4)Query the information of CPU Ethernet port and SSP switching chip port under oam
Log in to the main control card, enter the FNSC_SVR process under diag shell, and then
enter the interface to check the packets received / transmitted, including packet loss
states.
communication is faulty.
5)Log in to the HWONLINE card in ODB remote or file mode, and collect serial port prints of
the faulty card.
Use ODB file mode to collect serial port prints of the HWONLINE card, and observe
First, use the following command to delete the HWONLINE card's ODB serial port logs
delete /datadisk0/run_log/ODB/odb_log_0_1_slot_0.txt
And then use the following command to enable the HWONLINE card’s ODB file mode,
For specific usage methods, please refer to ZXA10 C6xx V1.1.0 Operation Methods
Collect the NE suspension file, kernel suspension file, BSP starting log, inter-card
communication interruption log, slot log and CPU usage ratio soaring log
Collect the following files (in every path below, slot stands for the slot number of the
faulty HWONLINE card)
/datadisk0/run_log/Exc_Omp.txt
/datadisk0/run_log/Exc_pp.txt
/datadisk0/run_log/EXCINFO/exception_kernel_0_1_slot_0.txt
/datadisk0/run_log/EXCINFO/commLostInfo_0_1_slot_0.txt
/datadisk0/run_log/EXCINFO/bspstartuplog_0_1_slot_0.txt
/datadisk0/run_log/EXCINFO/Exc_0_1_slot_0.txt
/datadisk0/run_log/CPULOAD/Exc_CpuLoad_0_1_slot_0.txt
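To avoid typing mistakes when several cards are faulty, the file list can be generated per slot. The sketch below only substitutes the slot number for the word slot in the paths listed above, as the text describes; the other digits in the file names are kept exactly as printed, and the function name is hypothetical.

```python
# Paths copied from the list above; "{slot}" marks where the slot number goes.
TEMPLATES = (
    "/datadisk0/run_log/Exc_Omp.txt",
    "/datadisk0/run_log/Exc_pp.txt",
    "/datadisk0/run_log/EXCINFO/exception_kernel_0_1_{slot}_0.txt",
    "/datadisk0/run_log/EXCINFO/commLostInfo_0_1_{slot}_0.txt",
    "/datadisk0/run_log/EXCINFO/bspstartuplog_0_1_{slot}_0.txt",
    "/datadisk0/run_log/EXCINFO/Exc_0_1_{slot}_0.txt",
    "/datadisk0/run_log/CPULOAD/Exc_CpuLoad_0_1_{slot}_0.txt",
)


def suspension_files(slot):
    """Return the files to collect for the HWONLINE card in the given slot."""
    return [template.format(slot=slot) for template in TEMPLATES]


for path in suspension_files(4):
    print(path)
```

Whether every file exists depends on what the card actually recorded, so a missing file is not an error in itself.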
1)General symptoms
Run show card; the state of the card switches among
The RUN indicator of the card switches among orange ----> solid on in green ----> quickly
many times.
diagRslPeerShow(), view linkup states between each card and the main control card,
A certain card reports the Board not running alarm at both the NE side and the EMS side
2)Collect information
Collect the restart prints of the line card in remote ODB mode and file ODB mode.
Disable the user-state exception restart policy: when the card comes UP, log on to the card
as quickly as possible, enter under the ADM process the function that forbids
the card from restarting, and then log on to the line card to query the specific reason for the
exception.
Enter the function under the HWONLINE card's ADM process to get the suspension file.
Enter the function under the HWONLINE card's ADM process to check whether any
process has failed to power on; if so, get the power-on failure log information.
Collect user-state suspension files, kernel suspension files, BSP starting logs and slot
logs.
3)View line card starting print under remote ODB mode and file ODB mode
In file ODB mode, capture the starting prints of the card that restarts repeatedly,
and view the starting prints of the HWONLINE card, to observe whether, when the HWONLINE
card restarts, it fails to get the version, fails to get the FPGA, or the process has power-on
Under the ADM process of the HWONLINE card via remote ODB, enter the following
functions to check whether the card has suspended tasks or a kernel exception. Under the
FTM process of the HWONLINE card via remote ODB, enter the following functions to view
kernel logs.
5)View whether the process has power-on failure or is powered on all the time.
Enter the HWONLINE card’s diag-shell, under the ADM process (remote ODB is
Collect the power-on information of a specific process through the power-on log function.
Abnormal card tasks usually lead to HWONLINE. Typical causes are a busy high-priority
task, a deadlock or dead loop, or a suspended task.
1)General symptoms
Execute show processor; the CPU usage ratio of a certain card is very high.
Such information as suspension, deadlock, dead loop and PCIE suspension is
2)Collect information
Enter the HWONLINE card's diag-shell and, under the ADM process, view information
Collect the user-state suspension file, kernel suspension file, BSP starting log, slot log
View whether there are tasks with a high CPU usage ratio.
1)General symptoms
The HWONLINE card has no suspended tasks and no process power-on failure;
The slot logs, user-state suspension files and kernel-state suspension files have no contents,
2)Collect information
Enter the IAP_RP process of the main control card's diag shell, and execute related
Enter the main control card's diag shell, and execute under the FNSC_SVR
process;
Enter the active main control card's diag shell, and execute under the
OFFCFGBRDMNG process;
Enter the HWONLINE card's diag shell, and execute under the FNSC_CLT_LP
process;
Enter the HWONLINE card's IAP_LP and XPON_LP processes (the uplink card has no
XPON_LP process, so it can be ignored there), and execute under the TDM_LP
process;
The whole service channel model of C600 is shown in the following figure.
C600 and C300 differ greatly in L2 switching. In C300, both the main control card and the
line cards adopt an Ethernet L2 forwarding architecture based on MAC+VLAN addressing. The
biggest change in C600 L2 forwarding is that the main control card has no Ethernet
switching chip; instead, it is configured with the ZTE-developed SF3600 switching chip (also
called the cross-connect chip). SF3600 switches cells rather than Ethernet packets: the
SSP-1 chip on the line card cuts incoming Ethernet packets into several fixed-length cells
(the cell length is configurable), the main control card performs cell switching internally,
with SF3600 sending each cell to the destination SSP-1 chip according to the information in
the cell header, and at the exit the destination chip transmits the cells out as Ethernet packets.
GPON data frames are converted into Ethernet packets by the XPP chip and sent to the
SSP-1 chip. The SSP-1 chip first looks up the MAC table with the destination MAC and VLAN
of the Ethernet packet to get the egress port, and then its microcode adds two tags to the
packet header: the destination chip number (in the SAHI header) and the destination port (in
the MF header). The Ethernet packet is divided into several cells, and each cell header
contains the destination chip information. The cells are sent to SF3600, which forwards them
to the uplink card's SSP-1 chip according to the cell header. The uplink card's SSP-1 chip
reassembles the received cells into the Ethernet packet, retrieves the destination port
information from the MF header, and sends the packet to the destination
port.
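The segmentation and reassembly idea described above can be illustrated with a toy model. This is only a conceptual sketch: the Cell fields, the sequence number and the absence of padding in the last cell are assumptions made for illustration and do not describe the real SF3600 or SSP-1 cell format.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    dest_chip: int   # destination chip number carried in the cell header
    seq: int         # hypothetical sequence number used for reassembly
    last: bool       # marks the final cell of a packet
    payload: bytes


def segment(packet, dest_chip, cell_len):
    """Ingress side: cut a packet into cells of at most cell_len bytes."""
    if cell_len <= 0:
        raise ValueError("cell_len must be positive")
    chunks = [packet[i:i + cell_len] for i in range(0, len(packet), cell_len)] or [b""]
    return [Cell(dest_chip, i, i == len(chunks) - 1, chunk)
            for i, chunk in enumerate(chunks)]


def switch(cells, queues):
    """Fabric: deliver each cell to the queue of its destination chip only."""
    for cell in cells:
        queues.setdefault(cell.dest_chip, []).append(cell)


def reassemble(cells):
    """Egress side: rebuild the packet from the cells of one packet."""
    ordered = sorted(cells, key=lambda cell: cell.seq)
    return b"".join(cell.payload for cell in ordered)


packet = bytes(range(150))
queues = {}
switch(segment(packet, dest_chip=3, cell_len=64), queues)
print(len(queues[3]), reassemble(queues[3]) == packet)  # prints: 3 True
```

The point of the model is that the fabric looks only at the destination chip in the cell header, never at MAC or VLAN, which is why MAC-table checks apply to the SSP-1 chips on the cards and not to the switching chip.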
Downstream unicast forwarding process is similar to upstream.
For abnormal unicast services, the basic idea is to check each node of the data
forwarding channel along the upstream and downstream directions, to see where services are
1)Make clear the service forwarding channels and the main nodes on the OLT;
2)At each node, confirm whether services have reached the node and are forwarded
normally, by viewing port state, MAC addresses and statistics and by capturing packets;
3)If the OLT has enabled service-related protocol processing (such as DHCP
option82, DHCP relay, PPPOE+, ARP proxy, MFF and so on), the CPU is also a service
channel node, but this node only processes protocol packets. In this case, you need to
check the whole protocol procedure from receiving packets to processing and then to
Broadly, troubleshooting is divided into three major steps: analyze fault
symptoms, check configuration data and troubleshoot key nodes.
Before troubleshooting, you must identify the fault symptom. For example, when unicast is
blocked, determine the scope of the problem: an individual ONU under a certain PON port,
all ONUs under a certain PON port, several PON ports, all ONUs under all PON ports of the
faulty card, users on multiple slots of the whole NE, multiple ONUs scattered across
different PON ports, slots or NEs, or ONUs of certain types. Also determine under what
circumstances the problem occurs: whether services are blocked all the time or only
intermittently, and under what circumstances they are blocked and unblocked. Making these
symptoms clear lets you start with the node that is most likely to be faulty in the general
troubleshooting steps, which improves locating efficiency.
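The scope analysis above can be sketched as a small classifier over the list of affected ONUs. This is a rough illustration with hypothetical names and labels: it looks only at how the affected ONUs are spread over slots and PON ports on one NE, and it cannot tell whether all ONUs under a port or card are affected, which would require the full ONU inventory.

```python
def classify_scope(affected):
    """Classify the spread of affected ONUs given (slot, pon_port, onu) tuples."""
    unique = set(affected)
    if not unique:
        return "no affected ONU"
    slots = {slot for slot, _, _ in unique}
    ports = {(slot, port) for slot, port, _ in unique}
    if len(unique) == 1:
        return "single ONU"
    if len(ports) == 1:
        return "one PON port"
    if len(slots) == 1:
        return "several PON ports on one card"
    return "several slots"


print(classify_scope([(4, 1, 2), (4, 1, 3)]))  # prints: one PON port
```

A "single ONU" result suggests starting at the ONU and its PON link, while "several slots" suggests starting at the main control card or the uplink.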
configuration data first, to ensure that the configuration data is correct. The configuration
here refers to the general configuration that can be seen on CLI, such as clock
configuration, MAC address aging time configuration, MAC anti-spoofing, loop detection
bandwidth, FEC, encryption and remote management, etc.). When checking, the configuration
can be compared with that of NEs whose services are normal.
Methods to check key node information are summarized into 4 kinds: check state,
NE-level problems:
Card and port state: View the physical state of the uplink port, the state of line cards,
the state of the inner port at the main control card side and cascading port, the state of the
ONU state: View ONU state, including on-line state and remote management state;
View address: View the MAC address table, ARP table and routing table of the main
control card, and view whether there is MAC spoofing and hash conflict;
View statistics: View SSP-1 statistics, and view PON MAC statistics of line cards;
Card-level problems:
Check state: Check the state of card’s CPU, memory and traffic
Environmental factors: View fans, card temperature and card clock;
Card and port state: View the physical state of the uplink port, the state of line cards,
the state of the inner port at the main control card side and cascading port, the state of the
inner port at the line card side and cascading port;
ONU state: View ONU state, including on-line state and remote management state;
Abnormal alarm: View alarm and notification logs;
View address: View address tables of the main control card and line cards, ARP table
and routing table of the main control card, and view whether there is MAC spoofing and
hash conflict between main control card and line card;
View statistics: View SSP-1 statistics, and view PON MAC statistics of line cards;
Capture packets: When services of the whole card are blocked, packet capture is not
needed;
ONU/user-level problems:
Check state:
Environmental factors: View fans, card temperature and card clock;
Card and port state: View the state of the inner port at the line card side and
cascading port, Rx / Tx optical power states of the PON link that the ONU corresponds to ;
ONU state: View ONU state, including on-line state and remote management state;
Abnormal alarm: View alarm and notification logs;
View address: View address tables of the main control card and line cards, ARP table
and routing table of the main control card, and view whether there is MAC spoofing and
hash conflict between main control card and line card;
View statistics: View SSP-1 statistics, and view PON MAC statistics of line cards;
Capture packets: Flow mirroring to capture packets;
In addition, if service-related protocol handling is involved, you also need to focus on
CPU packet retrieving, packet transmitting and protocol handling states.
Steps:
Check state: View the working states of service-related protocols, such as uplink port
LACP, port locating (DHCP Option82, PPPOE+), DHCP RELAY, ARP PROXY and MFF, and of the
tasks handling these protocols; also check
whether there are obviously abnormal prints under shell;
View statistics: View the statistics of protocol packets retrieved to the CPU and
transmitted by the CPU, and the various statistics inside the protocol module;
View contents: View the content of received / transmitted protocol packets through the
prints of packet retrieving or transmitting;
Basic idea of C600 multicast fault diagnosis is generally troubleshoot along two
Overall idea:
node (including multiple sub-nodes) are normal, and whether multicast entries of protocols
2) Check whether downstream forwarding of data flows is consistent with the driver’s
forwarding entries and forwarding digitmap, and whether each forwarding node (including
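[Figure: multicast protocol packet path between the ONU (PON side) and the SERVER (uplink side). Nodes shown: PON line card (Pon Mac, PUB&LIF, PKTRX, PPU, FPP&SSP, SW, L-CPU with driver, line card packet Rx/Tx, multicast protocol, IAP, FTM); main control/SF (SW, SF, R-CPU with driver, main control packet Rx/Tx, multicast protocol, IAP, FTM); uplink card (Pon Mac, PUB&LIF, ODMA, PPU, FPP&SSP, SW, L-CPU with driver, line card packet Rx/Tx, multicast protocol, IAP, FTM)]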
Proxy mode:
Join / leave packets transmitted by the user, or report packets with which the user
responds to a query;
The host at the line card side changes the transmitted report packet or responds to the
The host at the main control side changes the transmitted report packet or responds to the
Snooping mode:
Join / leave packets transmitted by the user, or report packets with which the user responds
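[Figure: multicast protocol packet path between the SERVER (uplink side) and the ONU (PON side). Nodes shown: uplink card (Pon Mac, PUB&LIF, PKTRX, PPU, FPP&SSP, SW, L-CPU with driver, line card packet Rx/Tx, multicast protocol, IAP, FTM); main control/SF (SW, SF, R-CPU with driver, main control packet Rx/Tx, multicast protocol, IAP, FTM); PON line card (Pon Mac, PUB&LIF, ODMA, PPU, FPP&SSP, SW, L-CPU with driver, line card packet Rx/Tx, multicast protocol, IAP, FTM)]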
Proxy mode:
Query packets received by the source port transmitted by the server, transparently
Query packets sent to the host at the line card side from the main control router;
Query packets sent to the user side from the line card router;
Snooping mode:
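[Figure: nodes of multicast data flow forwarding from the uplink side to the PON side. Labeled nodes: uplink line card (multicast protocol, line card packet Rx/Tx, driver, L-CPU, SW, FPP&SSP); main control/SF (driver, R-CPU, SF, SW); PON line card (multicast protocol, line card packet Rx/Tx, driver, L-CPU, SW, FPP&SSP).]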
Above figure shows each node of multicast downstream data flow forwarding
Forwarding procedures:
The uplink NP receives the multicast data flow and first looks up the MID by
VSIID+DIP+SIP (SSM mode). If that lookup misses, it looks up the MID by VSIID+DIP
(ASM mode). If both lookups miss, the packet is handled as an unknown multicast
packet: if the flood property of the vlan is drop-unkown, the driver drops it;
otherwise it is flooded in the corresponding vlan. If a MID is hit, the SAIH and FMF
headers are encapsulated and added to the packet header, and the packet is transferred to SF of the
The main control or switching card finds the SA bitmap corresponding to the MID and
copies the multicast data flow to the NP of each corresponding SA for forwarding;
Line cards find the corresponding exits according to the MID and copy one packet for each
exit. If multicast VLAN translation is configured, it is handled at the exit to perform vlan
Of them, the uplink card's SA IP slices the packet into cells and sends them to SF, while the
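The lookup order described in the forwarding procedures (SSM key first, then ASM key, then unknown-multicast handling) can be sketched as follows. This is an illustrative model only, not the driver's implementation. The table layout, the function names and the assumption that bit n of the SA bitmap stands for SA number n are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SsmKey = Tuple[int, str, str]  # (VSIID, DIP, SIP)
AsmKey = Tuple[int, str]       # (VSIID, DIP)


def lookup_mid(ssm_table: Dict[SsmKey, int], asm_table: Dict[AsmKey, int],
               vsiid: int, dip: str, sip: str) -> Optional[int]:
    """Try the SSM key first; fall back to the ASM key; None if both miss."""
    mid = ssm_table.get((vsiid, dip, sip))
    if mid is None:
        mid = asm_table.get((vsiid, dip))
    return mid


def uplink_decision(ssm_table, asm_table, vsiid, dip, sip, vlan_drops_unknown):
    """Return ("forward", mid), ("drop", None) or ("flood", None)."""
    mid = lookup_mid(ssm_table, asm_table, vsiid, dip, sip)
    if mid is not None:
        return ("forward", mid)
    # Both lookups missed: treat the packet as unknown multicast.
    if vlan_drops_unknown:
        return ("drop", None)
    return ("flood", None)


def sa_targets(sa_bitmap: int) -> List[int]:
    """Expand an SA bitmap into bit positions (assumed here to be SA numbers)."""
    return [bit for bit in range(sa_bitmap.bit_length()) if (sa_bitmap >> bit) & 1]


# Hypothetical entries: one SSM group and one ASM group in VSI 100.
ssm = {(100, "232.1.1.1", "10.0.0.1"): 7}
asm = {(100, "239.1.1.1"): 9}

print(uplink_decision(ssm, asm, 100, "232.1.1.1", "10.0.0.1", True))
print(uplink_decision(ssm, asm, 100, "239.9.9.9", "10.0.0.9", False))
print(sa_targets(0b1010))
```

When a downstream flow is missing, comparing the result of this kind of lookup against the entries actually programmed on each node helps narrow down where the miss occurs.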
The multicast debug function prints, on the CLI, protocol information about packet
interaction, protocol handling and FTM-layer driver settings. Analyze related protocol interaction
Related commands:
The multicast protocol records necessary information on the main control and line
cards, such as user join / leave events, aging information and records of related
failures. Users can query this information on the NE to locate multicast faults.
The function is enabled for the igmp protocol by default and disabled for the mld protocol
by default; memory is allocated when it is enabled. Each of the two protocols has its own
enable, manual reporting and server configuration commands, and files are
Related commands:
CLI form: {igmp | mld} log server [vrf <vrfname>] server-ip <ipaddr> host-ip
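[Flowchart: ONU registration failure troubleshooting. Items marked * in the original figure are kept. Only the following content is recoverable from the extracted figure text:
Entry: Confirm the fault symptom, then branch into (a) all ONUs on the card fail to register, (b) all ONUs on a PON port fail to register, (c) individual ONUs on a PON port fail to register.
Decision points: ONU in OFFLINE state? ONU state changing between LOGGING-LOS? ONU state changing among LOGGING-SYNCMIB-LOS? Line card task suspended? Clock state not locked? Are OMCI messages sent and received normally*? Are all ONU models the same? Is it normal after changing to another PON port?
Check line card software: log in to the line card shell and check whether a line card task is suspended.
Check configuration: 1) PON port is shut down*; 2) ONU information connected to the PON port does not match the ONU authentication information configured on the PON port (SN authentication)*; 3) whether a TYPE B protection group is configured*; 4) whether the ONU is powered on; 5) the ranging mode configured on the PON port does not match the actual fiber distance*.
Check optical path: 1) trunk fiber broken or bent? 2) measured upstream / downstream optical power too low or too high? 3) fiber adapter or optical module dirty or damaged? 4) optical return loss (ORL) out of limits; 5) long-emitting (rogue ONU) alarm present?*; 6) caused by LOSi/Lofi/Sdi/SFi/LOAMi alarms.
Check hardware: 1) view the backplane clock lock state*; 2) restored by resetting the line card? 3) restored by plugging / unplugging the line card? 4) restored by replacing the line card? Restored by plugging / unplugging, cleaning or replacing the optical module? Restored by replacing or cleaning the branch fiber? Restored by plugging / unplugging the ONU fiber? Restored by restarting the ONU? Restored by replacing with an ONU of the same model, or of a different model?
Check software: 1) collect information under the line card shell*; 2) view the LOS reason of the ONU*; confirm the software has no abnormality.
Outcomes: configuration problem; optical path problem; optical module abnormal or damaged (record the optical module vendor, model and SN); card problem (record the card SN and return for repair, or contact R&D for analysis); OLT software problem (collect information and contact R&D for analysis); OLT/ONU software problem (contact R&D for analysis); interoperability problem (collect ONU model and version information and contact ONU R&D for analysis); individual ONU problem.]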
When ONUs fail to register and cannot enter the WORKING state, the typical symptoms are:
ONU state is OFFLINE;
ONU state changes between LOGGING-LOS;
ONU state changes among LOGGING-SYNCMIB-LOS;
ONU state is AUTHFAILED
These symptoms can further be divided into card-level problems, PON port-level
problems and ONU-level problems, which need to be differentiated.
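The four symptoms can be told apart from a short history of observed ONU states. The sketch below is a hedged illustration of that classification. The input is assumed to be a list of state names noted down by hand from repeated CLI queries, and the function name and returned labels are hypothetical.

```python
def classify_symptom(states):
    """Map a list of observed ONU states to one of the listed symptoms.

    `states` is a chronological list such as ["LOGGING", "LOS", "LOGGING"].
    """
    seen = set(states)
    if states and states[-1] == "WORKING":
        return "registered"
    if "AUTHFAILED" in seen:
        return "AUTHFAILED"
    if seen == {"OFFLINE"}:
        return "OFFLINE"
    if "SYNCMIB" in seen and seen <= {"LOGGING", "SYNCMIB", "LOS"}:
        return "LOGGING-SYNCMIB-LOS"
    if seen and seen <= {"LOGGING", "LOS"}:
        return "LOGGING-LOS"
    return "unclassified"


print(classify_symptom(["OFFLINE", "OFFLINE"]))
print(classify_symptom(["LOGGING", "LOS", "LOGGING", "LOS"]))
print(classify_symptom(["LOGGING", "SYNCMIB", "LOS", "LOGGING"]))
```

The classification only names the symptom. Whether it is a card-level, PON port-level or ONU-level problem still has to be decided by comparing ONUs across the card and the port.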
Common reasons:
1) All ONUs on the whole card fail to register:
2) All ONUs under a certain PON port are in the OFFLINE state:
4) All ONUs under a certain PON port change between LOGGING-LOS states:
5) Individual ONU under a certain PON port changes between LOGGING-LOS states:
6) All ONUs under a certain PON port change among LOGGING-SYNCMIB-LOS states:
states:
abnormal
Commands may differ between versions; obtain the commands for your version from the
R&D Institute.
execute smto(oltId-1,onuId)
execute smso(oltId,onuId,1000)
execute smeo(oltId,onuId,1000)
execute smao(oltId,onuId,1000)
execute sme(2000)
execute sma(2000)
execute sms(2000)
execute showOltSaveAlm(oltId-1)
execute showOnuSaveAlm(oltId-1, onuId)
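[Flowchart: EPON ONU registration failure troubleshooting. Items marked * in the original figure are kept. Only the following content is recoverable from the extracted figure text:
Entry: Confirm the fault symptom, then branch into (a) all ONUs on the card fail to register, (b) all ONUs on a PON port fail to register, (c) individual ONUs on a PON port fail to register.
Decision points: Task suspended? (If yes, collect information and contact R&D for analysis*.) Clock state not locked? Restored by resetting or plugging / unplugging the card? Restored by replacing the card? Restored by plugging / unplugging the optical module? Restored by replacing the optical module? Restored by plugging / unplugging the ONU fiber? Restored by restarting the ONU? Restored by replacing with an ONU of the same model, or of a different model?
Check hardware (card): 1) view the clock state*; 2) restored by resetting the line card? 3) restored by plugging / unplugging the line card? 4) restored by replacing the line card?
Check optical path (PON port): 1) trunk fiber broken? 2) measured downstream optical power too low or too high; 3) long-emitting alarm present?*
Check optical path (individual ONU): 1) measure optical power; 2) any short-duration long-emitting statistics?*
Check software: collect related information with commands under the line card shell*; OLT firmware problem?*
Check hardware (PON port): 1) are other PON ports normal?
Check hardware (ONU): 1) replace with an ONU of the same model; 2) replace with an ONU of a different model.
Outcomes: optical path problem; optical module abnormal or damaged (record the optical module vendor, model and SN); card problem (record the SN and return for repair, or contact R&D for analysis); software problem (contact R&D for analysis); ONU problem (contact R&D for analysis); individual ONU problem; interoperability problem (contact R&D for analysis).]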
When an EPON ONU fails to register, the CLI shows the ONU in the OFFLINE state or
power-off state, and the local PON LED of the ONU is off or flashing. Symptoms can also
be divided into:
Step 1: Confirm the fault symptoms. Confirm whether all ONUs on the whole card fail to
register, all ONUs under a whole PON port fail to register, or only several ONUs fail to register;
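Step 1 is a scoping decision, and it can be expressed as a small check over the registration results collected for one card. The following is a sketch under stated assumptions. The input format (a dict keyed by a PON port number and an ONU ID) and the returned labels are hypothetical, and the data is assumed to be gathered by hand from CLI output.

```python
from collections import defaultdict


def fault_scope(registered):
    """Classify the scope of a registration failure on one line card.

    `registered` maps (pon_port, onu_id) to True if that ONU registered.
    Returns "card", "pon-port", "onu" or "none".
    """
    by_port = defaultdict(list)
    for (port, _onu), ok in registered.items():
        by_port[port].append(ok)

    if registered and not any(registered.values()):
        return "card"       # every ONU on the card failed
    if any(not any(oks) for oks in by_port.values()):
        return "pon-port"   # at least one port has no registered ONU
    if not all(registered.values()):
        return "onu"        # only some ONUs on otherwise healthy ports failed
    return "none"


# Hypothetical card with two PON ports.
whole_port_down = {(1, 1): False, (1, 2): False, (2, 1): True}
single_onu_down = {(1, 1): True, (1, 2): False, (2, 1): True}

print(fault_scope(whole_port_down))
print(fault_scope(single_onu_down))
```

A card with ONUs on only one PON port cannot be separated into "card" and "pon-port" by this check alone; in that case both sets of troubleshooting steps apply.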
a) Log in to the line card's serial port and check under the shell whether any
tasks are suspended. If yes, collect the necessary information and then contact
R&D for analysis.
b) Check the clock state of the card, if the clock state is non-locked, it can be
problems;
c) If the fault can be cleared by resetting or re-inserting the line card, the card
hardware has a problem but the symptom is unstable, so continue to observe
it afterwards;
d) If the fault can only be cleared by replacing the card, the card hardware has a
2) ONUs under the whole PON port fail to register. Troubleshooting steps:
d) Log in to the line card's serial port, collect ONU lower-layer registration
e) If R&D judges that the software is normal, check whether other
PON ports on the same card have the same problem, and then reset or
plug / unplug the card to see whether it can be restored, if it still can't be
that the card has a problem, please record the SN and contact R&D to confirm
d) Log in to the line card's serial port, collect lower-layer registration information,
e) If R&D judges that the software is normal, troubleshoot
by replacing ONUs. First, replace with ONUs of the same type, and then
function
ZXAN(config)#control-panel
ZXAN(config-control-panel)#
capturing function
ZXAN (config-control-panel)end
ZXAN(control-panel)#snatch-cpu-packet slot 3 enable //Enable the line card cpu packet capturing function
ZXAN(control-panel)#write snatch-packet slot 3 //Packet capturing process saving will disable the
ZXAN (control-panel)end
The commands are listed as the following, and you can choose IP, port information