Troubleshooting

Checking the AC Power Supplies

Troubleshooting an IGX Node

General Troubleshooting Procedures

Troubleshooting the IGX Console Alarms

Displaying a Summary of Alarms

Displaying the Status of Cards

User-Initiated Tests

Loopback Tests

Card Testing with External Test Equipment

Troubleshooting

This chapter describes how to diagnose problems. When a troubleshooting table in this chapter recommends replacement, refer to the procedures in Chapter 5, "Repair and Replacement".

The IGX operating system software does most of the IGX monitoring and maintenance. The only action that qualifies as preventative maintenance is checking the power supplies.

Checking the AC Power Supplies

You cannot directly measure voltages on the AC power supplies in an IGX node. If a problem exists with one of the supplies, one or both of the DC and AC LEDs turn off. Refer to the chapter on repair and replacement for instructions on re-seating or replacing an AC power supply.

After you install new or additional cards in the node, check the LEDs on the power supplies to make sure the cards have not put an excessive load on the power supplies.

Troubleshooting an IGX Node

This section describes elementary troubleshooting procedures and briefly describes the commands used when troubleshooting an IGX node. (These commands are described in detail in the Cisco WAN Switching Command Reference.) This set of procedures is not exhaustive and does not take into account any of the diagnostic or network tools available to troubleshoot the IGX node.

Caution Do not perform any disruptive tests or repairs to the IGX node without first calling the Technical Assistance Center (TAC) through Cisco Customer Engineering. Cisco personnel can help isolate the fault and provide repair information. Within the United States, call 1-800-553-6387, 6:00 AM to 6:00 PM, Pacific Time, Monday through Friday. Outside the U.S., call 1-(408)-526-4000.

This section contains the following topics:

Troubleshooting tables for the IGX node.
System hardware status (configuring and displaying), including circuit cards, system buses, and power supplies.
CGA relay and forced yellow signal group channel additions.
Channel loopback and connection tests.
Alarm thresholds for statistical line errors, and line error display reporting.
External test equipment, such as a BERT.

General Troubleshooting Procedures

The IGX node regularly runs self-tests to ensure proper function. When the node finds an error condition that affects operation, it deactivates the affected card or line and then selects a standby card or redundant line if one is available.

Caution A lit FAIL LED on a card indicates that an error occurred. Try resetting the light with the resetcd f command. If the FAIL LED lights up again, use Table 4-1 to find the cause and call the Cisco TAC to obtain information on isolating the problem and possibly replacing a component.
Table 4-1: General Troubleshooting

Symptom		Probable Cause		Remedy
1.	No indicators on IGX boards lit; console screen blank. LEDs on power supplies not lit.	1.	IGX circuit breakers off.	1.	Switch on the power switch.
				2.	Switch off, then switch on the power switch to reset the breaker. Check whether an overload condition exists (shorted connections, crow-barred power supplies).
		2.	IGX power cord dislodged from plug.	3.	Reconnect power cord to AC receptacle.
		3.	Power supplies not functioning.	4.	Replace power supplies.
2.	Front card FAIL LED on.	1.	Front card experienced an error. Applies to the NPM, UXM, ALM, BTM, NTM, UFM, FRM, UVM, CVM, HDM, LDM, and ARM cards.	1.	Indicates an error occurred. First, reset the card with the resetcd f command. If the LED comes on again, call the TAC. NOTE: If an NPM fails in a non-redundant IGX system, you must reboot the system.
3.	Front card indicator lit; replacement card does not fix problem.	1.	Defective backplane is possible.	1.	If a new card does not fix the problem, the backplane may be suspect (very uncommon failure). Contact the Cisco TAC.
4.	SDI card FAIL indicator lit.	1.	SDI card failed: SDI (EIA/TIA-232), SDI (EIA/TIA-449), or SDI (V.35) card.	1.	Indicates an error occurred. Check alarm status of card. First, reset the card with the resetcd f command. If the LED comes on again, re-seat card. If the LED comes on again, call the TAC.
5.	The FAIL indicator on any of the following card sets came on, but the replacement card does not fix the problem: UXM, ALM, BTM, UFM, ARM, CVM, LDM.	1.	Defective backplane is possible (very uncommon). NOTE: All cards connect to the backplane, and if the backplane is defective, it can cause a fault on any card.	1.	If a new card does not fix the problem, contact the Cisco TAC.
		2.	Blown backplane fuse (very uncommon).	1.	See the backplane fuse section in the chapter titled "Repair and Replacement."
6.	Power Supply AC or DC Okay LED off.	1.	Possible power supply defect.	1.	Re-seat supply per instructions in the chapter titled "Repair and Replacement". Remove and replace power supply if defective.
		2.	PE-BC wiring or card defective.	1.	If the power supply outputs check out, then the PE-BC wiring connections or card is suspect.
				2.	Make sure the plug connection to the PE-BC card is secure; tighten if it is not.
				3.	If the plug connection is secure and the Power Supply Monitor FAIL indicator is still lit, then remove and replace the SCM card.
7.	Command line display shows the wrong IGX system type.	1.	Jumper switch W6 in wrong position. (See the chapter titled "Card Installation and Node Startup.")	1.	To indicate an IGX 8420 node, the jumper must be in place. To indicate an IGX 8430 node, the jumper must be absent.
		2.	SCM circuitry that reads W6 setting may be defective.	1.	Verify SCM circuitry with a known good SCM.
8.	Neither "Okay" LED on a power supply is on.	1.	Defective fan or fans in the cooling assembly are allowing the temperature in the enclosure to rise above 40°C.	1.	Verify that the fan tray fans are working. If not, replace the tray according to the instructions in the chapter titled "Repair and Replacement."
		2.	Defective fan in a power supply is allowing the power supply temperature to rise above 40°C.	1.	If the system cooling fan assembly is functioning correctly, then a power supply fan is suspect. Remove the cover over the power supplies to determine whether the fan is rotating. Replacing a power supply fan is not a field repair. Replace the supply.
				2.	Issue a dsppwr command at the control terminal to check whether the power supply fans are rotating. If not, remove and replace the power supply with the defective fan.
				3.	If the power supply fans are functioning, then the power supply temperature sensor is defective. Remove and replace the temperature sensor.
		3.	Defective SCM card.	1.	If both the enclosure fan assembly and the power supply fans are working correctly (see probable causes 1 and 2 for this symptom), then the SCM card is suspect.
				2.	Remove and replace the SCM card.
9.	Console screen blank, IGX indicator lights lit.	1.	Control terminal switched off.	1.	Switch on the control terminal.
		2.	Control terminal power cord disconnected.	2.	Reconnect the control terminal power cord to a 208/240 VAC power outlet.
		3.	EIA/TIA-232 cable loose or disconnected from the Control Terminal port on the SCM, or from the control terminal.	3.	Reconnect the EIA/TIA-232 cable to Control Terminal port on the SCM back card or to the control terminal itself.
		4.	Control terminal malfunctioning.	4.	Refer to the control terminal manufacturer's manual.
10.	Printer not functioning	1.	Printer switched off.	1.	Switch on the printer.
		2.	Printer out of paper.	2.	Renew the paper supply.
		3.	Printer power cord disconnected.	3.	Reconnect the printer cord to a 208/240 VAC power outlet.
		4.	EIA/TIA-232 cable loose or disconnected from the Control Terminal port on the SCM, or from the printer.	4.	Reconnect EIA/TIA-232 cable to the Control Terminal port on the SCM back card or to the printer itself.
		5.	Printer malfunctioning.	5.	Refer to the printer manufacturer's manual.
11.	Modem not functioning.	1.	Modem switched off.	1.	Switch on the modem.
		2.	Modem power cord disconnected.	2.	Reconnect modem power cord.
		3.	EIA/TIA-232 cable loose or disconnected from the Control Terminal port on the SCM, or from the modem.	3.	Reconnect the EIA/TIA-232 cable to the Control Terminal port on the SCM back card or the modem itself.
		4.	Telephone hookup cable disconnected.	4.	Reconnect the telephone hookup cable.
		5.	Modem malfunctioning.	5.	Refer to the modem manufacturer's manual.
		6.	DIP switches not set correctly.	6.	Refer to the modem manufacturer's manual.
12.	Data Frame Multiplexing (DFM) is not functional.	1.	DFM has not been enabled by the Cisco TAC.	1.	Contact the Cisco TAC.
13.	DFM has been enabled but does not function.	1.	DFM runs only at speeds up to 64 kbps.	1.	Readjust the speed.
14.	Background noise or music sounds choppy.	1.	VAD problem: VDP needs sensitivity adjustment.	1.	Contact the Cisco TAC.
15.	High speed modem drops to low speed.	1.	ADPCM is taking over.	1.	Contact the Cisco TAC.
16.	Bundled (Frame Relay) connections have failed.	1.	One or more bundled connections have failed.	1.	Contact the Cisco TAC.

Troubleshooting the IGX Console Alarms

The initial mode of troubleshooting the IGX node uses the console alarms displayed on the console screen. Table 4-2 provides you with a procedure for isolating the alarms and thereby isolating the fault. Any repair to the IGX node must be performed by Cisco-qualified personnel.

Caution When using Table 4-2 for troubleshooting, call the Cisco TAC before you perform any disruptive testing or attempt to repair the IGX node. Cisco personnel can confirm that you have isolated the correct problem area and can assist you with the necessary procedures.

Table 4-2: Troubleshooting the IGX Console Alarms

Symptom	Probable Cause		Remedy
MAJOR/MINOR alarm flashing on affected console screen.		Failed connection, failed circuit lines, failed trunks, failed cards, unreachable node, or high error rate on circuit lines or trunks.	1.	Use the dspnw command to identify the node(s).
			2.	Use the vt command to place yourself at the affected node, and use dspalms to identify the alarm type.
				a. If the alarm display indicates a failed connection, go to probable cause 1.
				b. If the alarm display indicates a failed circuit line, go to probable cause 2.
				c. If the alarm display indicates a failed trunk, go to probable cause 3.
				d. If the alarm display indicates a failed card, go to probable cause 4.
				e. If the alarm display indicates an unreachable node, go to probable cause 5.
	1.	Failed connection.	1.	Use the dspcons command to identify which connections have failed and to determine the remote end connection assignments.
			2.	Use the dsplog command to determine the cause of failure of the connections. These failures could consist of failed circuit lines, trunks, cards, or clock overspeeds.
				a. If the connections have failed due to a circuit line failure, go to probable cause 2.
				b. If the connections have failed due to a trunk (packet line) failure, go to probable cause 3.
				c. If the connections have failed due to a card failure, go to probable cause 4.
				d. If connections have failed due to a clock overspeed condition, go to probable cause 6.
	2.	Failed circuit line.	1.	Use the dspclns command to identify the circuit line number and failure type.
				a. If the failure is a circuit line local CGA (no pulses received at the local end of circuit line), go to probable cause 2A.
				b. If the failure is a circuit line remote CGA (no pulses received at the remote end of circuit line), go to probable cause 2B.
				c. If the failure is circuit line frame slips (indicating excessive frame slips on the T1 between the IGX node and the PBX), go to probable cause 2C.
				d. If the failure is circuit line bipolar errors (indicating excessive bipolar errors on this circuit line), go to probable cause 2D.
	2A	Circuit line local CGA.	1.	Use the dsplog command to determine date, time of day, and the duration of the CGA alarm.
			2.	Determine if the PBX T1 subrate or the PBX E1 interface went down at the time the CGA alarm was logged by the IGX node.
			3.	Check cabling between IGX node and the PBX and make necessary repairs if defective.
			4.	Make a note of the steps taken and call the Cisco TAC.
	2B	Circuit line remote CGA.	1.	Refer to remedies for probable cause 2A.
	2C	Circuit line frame slips.	1.	Use the dsplog command to determine date, time of day, and duration of the frame slip alarm. Also determine if the clock source for this line has changed due to line failure in the network.
			2.	Use the dspclnerrs command to quantify frame slips and rate information.
			3.	Use the dspclnhist command to obtain historical information on frame slips.
			4.	Use the dspcurclk command to identify the current clock source and path to the current clock source.
			5.	Use the clrclnalm command to clear the circuit line alarms.
			6.	Make a note of the steps taken, and call the Cisco TAC.
	2D	Circuit line bipolar errors.	1.	Use the dsplog command to determine when the bipolar error threshold was exceeded, and the duration of the alarm.
			2.	Use the dspclnerrs command to quantify the bipolar errors.
			3.	Use the dspclnhist command to obtain historical information on bipolar errors.
			4.	Check cabling between IGX node and the PBX for loose connections, and tighten it if it is loose.
			5.	Use the clrclnalm command to clear line alarms.
			6.	Make a note of the steps taken, and call the Cisco TAC.
	3.	Failed trunk.	1.	Use the dsptrks command to identify the remote end node name, trunk numbers at each end, and the type of failure.
				a. If the display shows a communication failure, go to probable cause 3A.
				b. If the display shows a local CGA, go to probable cause 3B.
				c. If the display shows a remote CGA, go to probable cause 3C.
				d. If the display shows a bipolar error, go to probable cause 3D.
				e. If the display shows a frame slip error, go to probable cause 3E.
				f. If the display shows an out-of-frame error, go to probable cause 3F.
				g. If the display shows a time-stamped packet drop error, go to probable cause 3G.
				h. If the display shows a non time-stamped packet drop error, go to probable cause 3H.
				i. If the display shows a loop-back, go to probable cause 3I.
	3A	Communication Failure.	1.	Use the dsplog command to determine when the communication failure or CGA occurred, and identify connections which may have failed due to lack of bandwidth on an alternate route.
			2.	Use the dsptrkerrs command at each end of the trunk to quantify errors, and determine if they are unidirectional or bidirectional.
			3.	Call telephone carrier and request span testing. Ask the carrier to perform BER tests using multiple test patterns, including the standard quasi-random, all-ones, and 3-in-24 patterns.
			4.	Make a note of the steps taken, and call the Cisco TAC.
	3B	Local CGA: indicates no pulses at the local end of the trunk.	1.	Refer to probable cause 3A remedies.
	3C	Remote CGA: indicates no pulses at the remote end of the trunk.	1.	Refer to probable cause 3A remedies.
	3D	Bipolar errors: indicates excessive bipolar errors on this trunk.	1.	Use the dsplog command to determine the date, time of day, and the duration of the alarm.
			2.	Use the dsptrkerrs command at each end of the trunk to quantify errors, and determine whether they are unidirectional or bidirectional.
			3.	Use the dsptrkhist command at each end of the trunk to collect historical information on line errors.
			4.	Use the clrtrkalm command to clear trunk alarms.
			5.	Call the Cisco TAC for assistance. Cisco personnel can monitor line errors, and may advise disruptive testing to be scheduled with telephone carrier.
			6.	Call telephone carrier and request span testing. Ask the carrier to perform BER tests using multiple test patterns, including the standard quasi-random, all-ones, and 3-in-24 patterns.
			7.	If telephone carrier is unable to isolate the problem on the span, contact Customer Support for assistance.
	3E	Frame slip errors: indicates excessive frame slips on this trunk.	1.	Refer to probable cause 3D remedies.
	3F	Out-of-frame errors: indicates excessive out-of-frame errors on the trunk.	1.	Refer to probable cause 3D remedies.
	3G	Time-stamped packet drops: indicates time-stamped packet drops have exceeded the threshold for generating an alarm.	1.	Use the dsplog command to determine when the dropped packet alarm threshold was exceeded, and determine the duration of the alarm.
			2.	Use the dspload command to determine the current loading of this trunk.
			3.	Make a note of the steps taken, and call Customer Support.
	3H	Non-time-stamped packet drops: indicates that non-time-stamped packet drops have exceeded the threshold for generating an alarm.	1.	Refer to probable cause 3G remedies.
	3I	Loop-back.	1.	Determine if company personnel are performing span tests with CSU loop-backs, demarc, or DSX panel.
			2.	If company personnel are performing loop-back tests, ask them to indicate when they have completed testing, and monitor the system to ensure that the loop-back indication disappears when testing is complete.
			3.	If company personnel are not performing loop-back tests, telephone carrier most likely has the E1 span in loop-back mode.
			4.	Call telephone carrier to verify that they are testing the E1 span, and ask them to indicate when they have completed their tests. Monitor the system to ensure that the loop-back indication disappears when testing is completed.
			5.	Make a note of the steps taken, and call Customer Support.
	4.	Failed cards—indicates the number of cards that failed.	1.	Use the dspcds command to determine which card has failed, along with its status (active or standby).
			2.	Use the dsplog command to determine time of day the card failed and whether or not any connections using this card are also in a failed condition.
			3.	If the failed card is an HDM or LDM card, use the dspbob command at each end of the connection using this card to verify that data is passing. For a CDP, CVM, or UVM, use the dspchstats command.
			4.	If a card has failed, make a note of the steps taken, and call Customer Support.
	5.	Unreachable node—shows the number of unreachable nodes in the network.	1.	At any node, use the dsplog command to determine the date and time of day that the node became unreachable. A node is usually unreachable due to a trunk failure or a power outage.
			2.	Contact personnel at that node to determine whether or not there was a power failure at the time logged by the IGX node.
			3.	If there was a power failure, check that the NPM comes up, and run diagnostics.
			4.	If there was not a power failure, call Customer Support.
	6.	Clock Overspeed.	1.	Use the dspbob command to determine the incoming baud rate for this connection.
			2.	Use the dspcon command to verify that the console incoming baud rate is the same as the configured baud rate.
			3.	Reconfigure the incoming baud rate to match the configured baud rate.
			4.	Make a note of the steps taken, and call Customer Support.
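
The trunk branch of Table 4-2 (probable causes 3A through 3I) can be read as a simple lookup. The following Python fragment is a minimal sketch of that reading, not part of the IGX software. The dictionary and function names are hypothetical, and the failure strings are shorthand for the failure types in the table, not literal dsptrks output.

```python
# Hypothetical lookup built from the trunk rows of Table 4-2.
# Keys are shorthand for the failure types, not literal dsptrks output.
TRUNK_FAILURE_CAUSES = {
    "communication failure": "3A",
    "local CGA": "3B",
    "remote CGA": "3C",
    "bipolar error": "3D",
    "frame slip error": "3E",
    "out-of-frame error": "3F",
    "time-stamped packet drop": "3G",
    "non-time-stamped packet drop": "3H",
    "loop-back": "3I",
}

# Probable causes whose remedy column only points at another cause.
REMEDY_REFERS_TO = {"3B": "3A", "3C": "3A", "3E": "3D", "3F": "3D", "3H": "3G"}


def remedy_section(failure_type):
    """Return the probable cause whose remedy steps apply to this failure."""
    cause = TRUNK_FAILURE_CAUSES[failure_type]
    return REMEDY_REFERS_TO.get(cause, cause)


if __name__ == "__main__":
    for failure in ("remote CGA", "frame slip error", "loop-back"):
        print(failure, "->", remedy_section(failure))
```

The second dictionary makes the cross-references in the table explicit, so that a local or remote CGA leads directly to the 3A remedy steps.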

Displaying a Summary of Alarms

The first step in troubleshooting an IGX node is to check the condition of the system by displaying alarm conditions throughout the system. To see a summary of all of the alarms on an IGX node, use the dspalms (display current node alarms) command. The alarms summary includes the following:

Number of failed connections.
Number of major and minor alarms.
Number of failed cards.
Power monitor failures.
Bus failures (either failed or needs diagnostics).
Number of alarms on other nodes in the network.
Number of unreachable nodes in the network.

Note You cannot include the dspalms command in a job.

To display alarms, enter the dspalms command.

If the screen indicates a failure, refer to the commands in Table 4-3 to further isolate the fault.

Failure	Diagnostic Commands
Connection	dspcons (display connections)
Line Alarm	dspclns (display circuit lines)
Trunk	dsptrks (display trunks)
Cards	dspcds (display cards)
Power Monitor/Fans	dsppwr (display power supply status)
Remote Node	dspnw (display network)
Unreachable Nodes	dspnw (display network)
Remote Node Alarms	dspnw (display network)

Table 4-3: Fault Isolation Commands
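
For operators who keep their own notes or scripts, Table 4-3 can be captured as a small lookup. The Python fragment below is an illustrative sketch only. The names FAULT_ISOLATION_COMMANDS and next_commands are hypothetical, and the fragment does not talk to a node. It only returns the command names listed in the table.

```python
# Failure categories and command names copied from Table 4-3.
FAULT_ISOLATION_COMMANDS = {
    "Connection": "dspcons",
    "Line Alarm": "dspclns",
    "Trunk": "dsptrks",
    "Cards": "dspcds",
    "Power Monitor/Fans": "dsppwr",
    "Remote Node": "dspnw",
    "Unreachable Nodes": "dspnw",
    "Remote Node Alarms": "dspnw",
}


def next_commands(failures):
    """Return the distinct commands to run, in the order they are first needed."""
    commands = []
    for failure in failures:
        command = FAULT_ISOLATION_COMMANDS[failure]
        if command not in commands:
            commands.append(command)
    return commands


if __name__ == "__main__":
    # Example: the alarm summary shows a failed card and an unreachable node.
    print(next_commands(["Cards", "Unreachable Nodes"]))
```

Because three of the failure categories share dspnw, the helper removes repeats so that each command appears once.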

Displaying the Status of Cards

When a card indicates a failed condition on the alarm summary screen, use the dspcds command to display the status of the cards on a node. The information displayed for each card type includes the slot number, software revision level, and card status. (Note that you cannot use dspcds in a job.)

Note If dspcds or any other command incorrectly states the IGX model (for example, stating that an IGX 8420 node is an IGX 8430 node), check the jumper switch W6 on the SCM. A jumpered W6 indicates an IGX 8420 node. An open W6 indicates an IGX 8430 node. The chapter titled "Card Installation and Node Startup" documents this aspect of the SCM.
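
The W6 rule in the note above reduces to a two-way check. The following Python sketch is illustrative only. The function names are hypothetical, and the jumper state is assumed to come from a visual inspection of the SCM rather than from any command output.

```python
def expected_model(w6_jumpered):
    """Model the node should report for a given W6 state (per the note above)."""
    return "IGX 8420" if w6_jumpered else "IGX 8430"


def model_mismatch(w6_jumpered, reported_model):
    """True when the reported model disagrees with the W6 jumper setting."""
    return expected_model(w6_jumpered) != reported_model


if __name__ == "__main__":
    # A jumpered W6 with a reported IGX 8430 is the mismatch the note describes.
    print(model_mismatch(True, "IGX 8430"))
```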

All the possible status descriptions for each card type appear in Table 4-4.

Card Type	Status		Description
All card types (including CVM)	Active		Active card.
	Active-F		Active card with non-terminal failure.
	Standby		Standby card.
	Standby-F		Standby card with non-terminal failure.
	Standby-T		Standby card performing diagnostics.
	Standby-F-T		Standby card with non-terminal failure performing diagnostics.
	Failed		Card with terminal failure.
	Unavailable		Card is present but may be in either of the following states:
		1.	The node does not recognize the card (may need to be re-seated).
		2.	The card is running diagnostics.
	Down		Downed card.
	Empty		No card in that slot.
	Active-T		Active card performing diagnostics.
NPM	Same status as for all card types, plus:
	Updating		Standby NPM downloading the network configuration from an active NPM.
			NOTE: Red FAIL LED flashes during updating.
	Cleared		NPM is preparing to become active.
	Loading Software		There are downloader commands that appear when the system is downloading software to the NPM.

Table 4-4: Card Status

Note Cards with an "F" status (non terminal failure) are activated only when necessary (for example, when there is no card of that type available). Cards with a failed status are never activated.
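
The status strings in Table 4-4 combine a base state with optional F (non-terminal failure) and T (diagnostics running) suffixes. The Python sketch below shows one way to split such a string and apply the activation rule from the note above. It is an illustration under stated assumptions: the class and function names are hypothetical, and treating only the Active and Standby states as eligible for activation is an assumption made for this example, not a documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardStatus:
    state: str                  # for example "Active", "Standby", "Failed"
    non_terminal_failure: bool  # True when the "F" suffix is present
    running_diagnostics: bool   # True when the "T" suffix is present


def parse_status(text):
    """Split a Table 4-4 status such as 'Standby-F-T' into its parts."""
    # Accept either a plain hyphen or a long dash (U+2014) as the separator.
    parts = text.replace("\u2014", "-").split("-")
    flags = {part.strip() for part in parts[1:]}
    return CardStatus(parts[0].strip(), "F" in flags, "T" in flags)


def eligible_for_activation(status, healthy_card_available):
    """Apply the note: 'F' cards only when necessary, Failed cards never."""
    if status.state == "Failed":
        return False
    if status.non_terminal_failure:
        return not healthy_card_available
    # Assumption for this sketch: other states (Down, Empty, ...) are not eligible.
    return status.state in ("Active", "Standby")


if __name__ == "__main__":
    status = parse_status("Standby-F")
    print(status, eligible_for_activation(status, healthy_card_available=False))
```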

To display cards, execute the dspcds command. The dspcds command cannot be included in a job. Refer to the Cisco WAN Switching Command Reference for more information.

User-Initiated Tests

Several user commands help you test the node status. The CLI commands are:

tstcon for voice connections
tstcon for data connections
tstport for data and frame relay ports

For details on these commands, see the troubleshooting chapter in the Cisco WAN Switching Command Reference.

Loopback Tests

Loopback tests are available to help diagnose the state of the IGX system. The CLI commands for activating these tests are:

CVM/NTM for implicit internal loopback.
Voice: addloclp, addrmtlp
Data: addloclp, addrmtlp
Frame relay: addloclp

For detailed information on these commands, see the Cisco WAN Switching Command Reference.

Card Testing with External Test Equipment

The HDM/SDI or LDM/LDI card set can be tested as a pair at the local node using external test equipment such as a Bit Error Rate Tester (BERT). This can be useful in isolating "dribbling" error rates to either the cards, the frame relay data input, or the transmission facility. This test checks the data path from the electrical interface at the port through the card set to the Cellbus in both directions of transmission.

Note This is a disruptive test. Notify your network administrator before performing this test.

To perform this test, proceed as follows:

Step 1 Disconnect the cable connection to the SDI or LDI and connect the BERT in its place.

Step 2 Set up an internal loopback on the frame relay port to be tested using the Add Local Loopback (addloclp) command.

Step 3 Turn on the BERT, make sure it indicates circuit continuity, and observe the indicated error rate.

Step 4 If there are any errors indicated, first replace the back card and retest. If the errors remain, then replace the front card and retest.

Step 5 When the test is complete, disconnect the BERT and reconnect the data cable. Release the local loopback by using the Delete Loopback (dellp) command.

Step 6 Repeat at the node at the other end of the connection if necessary.
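
When comparing results from Step 3 across retests, it can help to express the BERT reading as a bit error rate (errors divided by bits transmitted). The Python sketch below shows that arithmetic only. The function name and the example figures (64 kbps for 1000 seconds) are assumptions chosen for illustration and do not describe any particular BERT or port configuration.

```python
def bit_error_rate(error_count, line_rate_bps, seconds):
    """Return errors per transmitted bit for one test interval."""
    bits_sent = line_rate_bps * seconds
    if bits_sent <= 0:
        raise ValueError("line rate and duration must both be positive")
    return error_count / bits_sent


if __name__ == "__main__":
    # Hypothetical reading: 64 errors in 1000 seconds on a 64 kbps port.
    print(bit_error_rate(64, 64_000, 1000))
```

A rate that stays about the same after the back card and front card have each been replaced (Step 4) suggests that the errors originate outside the card set.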
