
Managing Faults with Cisco UGM

This chapter contains the following sections:

Overview of Fault Management

With the Event Browser in Cisco UGM, you can identify alarm events and take appropriate action to resolve them quickly. In addition, you can forward user-specified SNMP traps to any configured remote host, and continuously export all alarm events, as they are raised, to a user-specified text file.

With Cisco UGM, you can decommission and commission chassis and card objects for maintenance.

Monitored Events

All faults detected by Cisco UGM are referred to as alarm events. Faults are generated from these sources:

You can use the Event Browser to view alarm events raised against an object; various filtering criteria are provided by the Query Editor.


Note   Only SNMP traps from managed devices are reported by Cisco UGM; traps from any other unsupported device are discarded. Moreover, the set of supported traps is predefined and nonconfigurable, and any unsupported trap is discarded.


Table 9-1: Traps from Managed Devices

Fault | Attribute | MIB | Source
Link Down or Link Up trap from any DS1, DS3, or Ethernet interface. Raises major and normal alarms respectively. | ifTable: ifIndex, ifType, ifAdminStatus, ifOperStatus | IF-MIB | SNMP trap; Link Down trap is cleared by one or more Link Up traps for the same interface.
Cold Start trap from the device. Raises warning alarm. | ColdStart trap | SNMPv2-MIB | SNMP trap.
Warm Start trap from the device. Raises warning alarm. | WarmStart trap | SNMPv2-MIB | SNMP trap.
Authentication Failure trap from the device. Raises major alarm. | AuthenticationFailure trap | SNMPv2-MIB | SNMP trap.
Card OIR trap from the device. Raises warning alarm and performs discovery on the affected device. | cefcFRUInserted trap, cefcFRURemoved trap | CISCO-ENTITY-FRU-CONTROL-MIB | SNMP trap.
Card inserted or removed in the device. Raises normal alarms. | alarmDirectory: entPhysicalContainedIn trap | ENTITY-MIB | Internal.
Environment Monitoring traps from the device. Raises critical alarm for the shutdown trap, and major alarm for all the other traps. | EnvMonShutdownNotification trap, EnvMonVoltageNotification trap, EnvMonTemperatureNotification trap, EnvMonFanNotification trap, EnvMonRedundantSupplyNotification trap | CISCO-ENVMON-MIB | SNMP trap.
Loss or re-establishment of communication with device [1]. Raises major and normal alarms respectively. | Not applicable | Not applicable | Internal. Communication lost alarm cleared by the communication established alarm.
Device or card commissioned or decommissioned [2]. Raises informational alarm in both cases. | Not applicable | Not applicable | Internal.
Server disk usage above the major threshold. Raises major alarm. | Not applicable | Not applicable | Internal. Cleared when disk usage is below the major threshold.
Server disk usage above the critical threshold. Raises critical alarm. | Not applicable | Not applicable | Internal. Cleared when disk usage is below the critical threshold.
Graceful Shutdown operation was interrupted. Raises major alarm. | Not applicable | Not applicable | Internal.
Accept Traffic operation was interrupted. Raises major alarm. | Not applicable | Not applicable | Internal.

[1] See the "Overview of Presence Polling and Loss of Communication with a Device" section.
[2] See the "Overview of the Commission/Decommission Function for a Chassis" section.

Overview of Presence Polling and Loss of Communication with a Device

You can detect communication loss with a managed device by using presence polling. Loss of communication can occur for various reasons:

Presence Polling Retries

When Cisco UGM first detects loss of communication to a managed device, it does not immediately transition the device to the errored state but retries presence polling. Select the number of retries as described in the "Setting Number of Retries Before Loss of Communication" section.

Presence Polling Intervals

Presence polling uses an interval specified in the "Setting Presence Polling Intervals for Devices in Normal and Errored States" section. If all the communication attempts prove unsuccessful, the device transitions to the errored state. An internal alarm event (communicationLost) with a Major severity level is raised against the affected device.

The default presence polling intervals are:

Duration of Communication Loss

When communication is re-established, the device returns to a normal state, and an internal alarm event (communicationEstablished) with a Normal severity level is raised against the affected device.

If communication is restored after the duration specified in the "Setting Loss of Communication Duration" section, Cisco UGM discovers the device's subcomponents to detect any card inventory changes that may have occurred during the loss of communication.

If communication is restored within the specified duration, Cisco UGM transitions the device to the normal state.
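The retry, interval, and duration rules above can be summarised as a small state machine. The following Python sketch is illustrative only: the class, method, and action names are hypothetical and are not part of Cisco UGM, and it assumes the outage is measured from the moment the device enters the errored state. The defaults mirror the documented ones (1 retry, 15-minute duration).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PresenceMonitor:
    """Hypothetical model of the presence-polling states for one device."""
    retries: int = 1                 # documented default number of retries
    loss_duration_min: int = 15      # documented default duration, in minutes
    state: str = "normal"
    failed_polls: int = 0
    lost_at_min: Optional[float] = None
    events: List[str] = field(default_factory=list)

    def poll(self, reachable: bool, now_min: float) -> None:
        """Record the result of one presence poll taken at time now_min."""
        if not reachable:
            if self.state == "errored":
                return  # already errored; nothing new to raise
            self.failed_polls += 1
            # The first failure plus every retry must fail before the
            # device is moved to the errored state.
            if self.failed_polls > self.retries:
                self.state = "errored"
                self.lost_at_min = now_min
                self.events.append("communicationLost")  # Major severity
            return

        self.failed_polls = 0
        if self.state == "errored":
            outage = now_min - self.lost_at_min
            self.state = "normal"
            self.events.append("communicationEstablished")  # Normal severity
            if outage > self.loss_duration_min:
                # Assumed action name: rediscover cards after a long outage.
                self.events.append("discoverSubcomponents")
```

With `retries=0`, the first failed poll moves the device straight to the errored state, which matches the documented meaning of disabling retries.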

Setting Presence Polling Intervals for Devices in Normal and Errored States


Step 1   In Map View, choose ASEMSConfig > EMS > Settings.

Step 2   Enter the interval at which a device should be polled in the normal state.

The interval should be an integer value that is 300 or larger (representing seconds). The default is 900 seconds.


Note   This value depends on the total number of managed devices in your network. You may need to change this value a few times in order to determine the optimum setting for your network.

Step 3   Enter the interval at which a device should be polled in the errored state.

The interval should be an integer value that is 300 or larger (representing seconds). The default is 915 seconds.


Note   This value depends on the total number of managed devices in your network.

Do not enter the same value as for devices in the normal state. A different value avoids overlapping polling intervals for normal and errored states.

Step 4   Click Apply.


Setting Number of Retries Before Loss of Communication

When Cisco UGM first detects loss of communication to a managed device, it does not immediately transition the device to the errored state, but retries presence polling by using the polling interval specified in the "Setting Presence Polling Intervals for Devices in Normal and Errored States" section. If these communication attempts are unsuccessful, the device transitions to the errored state.


Step 1   In Map View, choose ASEMSConfig > EMS > Settings.

Step 2   Enter the number of times Cisco UGM tries to re-establish connectivity before transitioning the device into the errored state.

The number entered should be an integer value that is 0 or larger. A value of 0 disables retries; the default is 1.


Note   A large value causes a delay before loss of communication with a device is detected.

Step 3   Click Apply.


Setting Loss of Communication Duration


Step 1   In Map View, choose ASEMSConfig > EMS > Settings.

Step 2   Enter a time interval for which communication must be lost in order to start discovery.

The interval should be an integer value that is 15 or larger (representing minutes). The default is 15 minutes.


Note   A large value can leave card inventory changes undetected, because discovery starts only when communication has been lost for longer than this interval.

If communication is restored after this interval, Cisco UGM initiates discovery of the device's subcomponents to detect any card inventory changes that may have occurred during the loss of communication.

If communication is restored within this interval, Cisco UGM transitions the device to the normal state.

Step 3   Click Apply.
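The constraints stated in the three settings procedures above can be checked before the values are entered. This is a minimal sketch, not a Cisco UGM API: the function and parameter names are hypothetical, and only the limits and defaults come from the text.

```python
def validate_settings(normal_poll_s=900, errored_poll_s=915,
                      retries=1, loss_duration_min=15):
    """Return a list of problems; an empty list means all values are acceptable."""
    problems = []

    # Both polling intervals are integers of 300 seconds or more.
    for name, value in (("normal-state polling interval", normal_poll_s),
                        ("errored-state polling interval", errored_poll_s)):
        if not isinstance(value, int) or value < 300:
            problems.append(f"{name} must be an integer of 300 seconds or more")

    # The two intervals should differ to avoid overlapping polls.
    if normal_poll_s == errored_poll_s:
        problems.append("normal and errored polling intervals should differ")

    # Retries: integer, 0 or larger (0 disables retries).
    if not isinstance(retries, int) or retries < 0:
        problems.append("retries must be an integer of 0 or more")

    # Loss-of-communication duration: integer, 15 minutes or more.
    if not isinstance(loss_duration_min, int) or loss_duration_min < 15:
        problems.append("loss duration must be an integer of 15 minutes or more")

    return problems
```

Calling `validate_settings()` with no arguments checks the documented defaults, which pass.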


Overview of the Event Browser

You can start the Event Browser from the Launchpad or from the pop-up menu for the individual object within Map Viewer.

With the Event Browser, you can perform these tasks:

You can see all events, regardless of your access privilege. In the Event Browser window, you can check the Ack (acknowledge) box next to an event to tell other users that you are planning to deal with that particular event. When you resolve the event, check the Clear box so that other users know it has been resolved.


Note   Only the most severe alarm event against an object appears next to its icon within Map Viewer.

You can view additional alarm details by using the Event Browser. For more information, refer to the Cisco Element Management Framework User's Guide.

Using the Event Browser


Step 1   In the Map Viewer, note the color-coded status dots that indicate alarm events raised against the objects.

See the "Overview of Alarm Events" section for an explanation of the colors.

Step 2   Right-click the object whose list of alarm events you want to view and choose Tools > Open Event Browser.


Using the Query Editor

If you do not want to view all events in the system, set up a query by using the Query Editor to view only specific events.

The criteria that you use to specify a query are on individual tabs. The Event Browser is updated with only those events that match the query criteria. A progress bar indicates that Cisco UGM is querying events and the window is being updated.


Caution   Any changes that you make to a query are not stored when you exit the Event Browser.

If you have specified different queries, you can open more than one Event Browser session at a time.

For details about the Query Editor, refer to the Cisco Element Management Framework User's Guide.


To access the Query Editor from the Event Browser, choose Edit > Query Setup.


Overview of Alarm Events

In the Map Viewer tree, you can see raised alarm events by the presence of colored dots next to tree objects in the left pane and by colored annotations against the object icons in the right pane.

The dots are color coded to reflect the following severity levels (highest to lowest): critical, major, minor, informational, and normal.

The defined color coding is:

A device or card object can be in either commissioned or decommissioned state within Cisco UGM.

If an object is in a commissioned state, alarm events against that object are propagated to the physical tree in the Map Viewer and appear in the parent objects to the region level.

For decommissioned objects, alarm events are not propagated up to the physical tree in the Map Viewer.

For details on commissioning and decommissioning objects, see the "Overview of the Commission/Decommission Function for a Chassis" section.
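The propagation rule above (alarms from commissioned objects appear on their parents, alarms from decommissioned objects do not) can be sketched as a tree walk. The `Node` class is hypothetical and is not how Cisco UGM stores objects. The severity order is the one listed above; "warning", which appears in the tables, is left out because the list does not rank it.

```python
# Severity order as listed above, highest first.
SEVERITY_ORDER = ["critical", "major", "minor", "informational", "normal"]
RANK = {name: position for position, name in enumerate(SEVERITY_ORDER)}


class Node:
    """Hypothetical object in the physical tree (region, chassis, or card)."""

    def __init__(self, name, commissioned=True):
        self.name = name
        self.commissioned = commissioned
        self.alarms = []      # severities raised directly against this object
        self.children = []

    def displayed_severity(self):
        """Most severe alarm on this object or on any commissioned descendant."""
        candidates = list(self.alarms)
        for child in self.children:
            if not child.commissioned:
                continue  # alarms from decommissioned objects are not propagated
            severity = child.displayed_severity()
            if severity is not None:
                candidates.append(severity)
        if not candidates:
            return None
        return min(candidates, key=RANK.__getitem__)
```

For example, a region whose chassis has a major alarm and a decommissioned card with a critical alarm shows major; after the card is commissioned, the region shows critical.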

The following table describes Cisco UGM alarm events, their severity, explanation, and recovery procedures.


Table 9-2: Cisco UGM Alarm Events

Alarm Event | Alarm Severity | Explanation
ciscoColdStart | Warning | You started the device object from a power-off state. Note: Clear this event manually.
ciscoWarmStart | Warning | You restarted the device object from an on state. Note: Clear this event manually.
ciscoLinkDown | Major | A DS1 or Ethernet interface is down.
ciscoLinkUp | Normal | A DS1 or Ethernet interface is up.
ciscoAuthenticationFailure | Major | The device received an SNMP message that was improperly authenticated.
cardInserted | Warning | You inserted a new card in the device; Cisco UGM initiates discovery on the device.
cardRemoved | Warning | You removed a card from the device; Cisco UGM initiates discovery on the device.
Card inserted in slot | Informational | You inserted a new card in the device; Cisco UGM completes discovery on the device.
Card removed in slot | Informational | You removed a card from the device; Cisco UGM completes discovery on the device.
envMonShutdown | Critical | A critical environmental condition is detected and a device shutdown is imminent.
envMonVoltage | Major | A voltage threshold was exceeded on the device.
envMonTemperature | Major | A temperature threshold was exceeded on the device.
envMonFan | Major | The fan on the device has failed.
envMonRedundantSupply | Major | The power supply on the device has failed.
communicationLost | Major | Cisco UGM lost SNMP connectivity with the device.
communicationEstablished | Normal | Cisco UGM established SNMP connectivity with the device.
entityDecommisioned | Informational | Device or card object has been decommissioned.
entityCommissioned | Informational | Device or card object has been commissioned.
fileSysAboveMajor | Major | Server disk usage is over the user-defined major threshold [1].
fileSysAboveCritical | Critical | Server disk usage is over the user-defined critical threshold [2].
fileSysBelowMajor | Normal | Server disk usage is below the user-defined major threshold.
fileSysBelowCritical | Normal | Server disk usage is below the user-defined critical threshold.
gracefulShutdownInterrupted | Major | During a Graceful Shutdown operation, loss of communication with the device occurred or it was decommissioned. Note: Clear this event manually.
acceptTrafficInterrupted | Major | During an Accept Traffic operation, loss of communication with the device occurred or it was decommissioned. Note: Clear this event manually.

[1] For details on changing this threshold, see the "Example: Sample Configuration File for Fault Management" section.
[2] For details on changing this threshold, see the "Example: Sample Configuration File for Fault Management" section.

Clearing Alarm Events

If you manually clear an alarm event for an object in the Event Browser, that object appears in the Map Viewer with an alarm notification reflecting the next highest alarm present for that object. This change in alarm severity appears in the Map Viewer, even if the fault condition has not actually been corrected.

Cisco UGM does not necessarily raise a cleared alarm event again, even if the alarm condition is still present; therefore, be cautious when clearing alarm events.


Step 1   In the Map Viewer, note the color-coded status dots that indicate alarm events raised against the objects.

See the "Overview of Alarm Events" section.

Step 2   Right-click the object whose list of alarm events you want to view and choose Tools > Open Event Browser.

You can acknowledge and clear individual alarm events by clicking the appropriate box next to each event.


Overview of Trap Forwarding

Specifying New Trap Forwarding Hosts

By using the Trap Forwarding Deployment Wizard, you can:


Step 1   Choose ASEMSConfig > Trap Forwarding > Deploy Trap Forwarding Hosts.

Step 2   Follow the instructions provided by the Deployment Wizard.

Step 3   In the Map Viewer window, choose ASEMSConfig > Trap Forwarding > Trap Forwarding Properties.

Step 4   To enable trap forwarding, click Accept Saved Setting.


Specifying New Trap Specifiers for a Trap Forwarding Host


Step 1   From the Map Viewer, open ASEMSConfig.

Step 2   Expand the Trap Forwarding tree by clicking the + (plus) sign.

Step 3   Open the Trap Specifiers Deployment Wizard.

Step 4   Right-click the host destination for which you wish to add a new trap specifier and select Deploy Trap Specifiers.

Step 5   Follow the instructions provided by the Deployment Wizard.

Step 6   In the Map Viewer, choose ASEMSConfig > Trap Forwarding > Trap Forwarding Properties.

Step 7   To update trap forwarding, click Accept Saved Setting.

The trap forwarding action triggered reflects any changes made (and saved) in this dialog box. Any previously specified trap forwarding action is replaced.


Changing Previously Specified Trap Forwarding Data


Step 1   In the Map Viewer, choose ASEMSConfig > Trap Forwarding > Trap Forwarding Properties.

Step 2   Enter your changes.

Step 3   Click the Save icon from the dialog toolbar, or choose File > Save.

Step 4   To update trap forwarding, click Accept Saved Setting.

The trap forwarding action triggered reflects any changes made (and saved) in this dialog. Any previously specified trap forwarding action is replaced.


Removing Previously Specified Trap Forwarding Data


Step 1   From the Map Viewer, open ASEMSConfig.

Step 2   Expand the Trap Forwarding tree by clicking the + (plus) sign.

Step 3   Expand any listed host destination by clicking the + (plus) sign.

Step 4   Right-click the object to be deleted (a host destination, or a specific trap specifier for a given host destination) and choose Deployment > Delete Objects.

Step 5   In the Map Viewer, choose ASEMSConfig > Trap Forwarding > Trap Forwarding Properties.

Step 6   To update trap forwarding, click Accept Saved Setting.

The trap forwarding action triggered reflects any changes made (and saved) in this dialog. Any previously specified trap forwarding action is replaced.


Tip   To deactivate or disable all trap forwarding, you must delete all host destinations and click Accept Saved Setting. To resume trap forwarding, re-enter the host destinations. See the "Specifying New Trap Forwarding Hosts" section.

Example: Cisco UGM Trap Mapping Tables


Table 9-3: Cisco AS5350 Trap Mapping

Class Mapping | Enterprise | Generic ID | Specific ID | Severity | Color
ciscoColdStart | 1.3.6.1.4.1.9.1.313 | 0 | 0 | warning | Cyan
ciscoWarmStart | 1.3.6.1.4.1.9.1.313 | 1 | 0 | warning | Cyan
ciscoLinkDown | 1.3.6.1.4.1.9.1.313 | 2 | 0 | major | Orange
ciscoLinkUp | 1.3.6.1.4.1.9.1.313 | 3 | 0 | normal | Green
ciscoAuthenticationFailure | 1.3.6.1.4.1.9.1.313 | 4 | 0 | major | Orange


Table 9-4: Cisco AS5400 Trap Mapping

Class Mapping | Enterprise | Generic ID | Specific ID | Severity | Color
ciscoColdStart | 1.3.6.1.4.1.9.1.274 | 0 | 0 | warning | Cyan
ciscoWarmStart | 1.3.6.1.4.1.9.1.274 | 1 | 0 | warning | Cyan
ciscoLinkDown | 1.3.6.1.4.1.9.1.274 | 2 | 0 | major | Orange
ciscoLinkUp | 1.3.6.1.4.1.9.1.274 | 3 | 0 | normal | Green
ciscoAuthenticationFailure | 1.3.6.1.4.1.9.1.274 | 4 | 0 | major | Orange


Table 9-5: Cisco AS5800 Trap Mapping

Class Mapping | Enterprise | Generic ID | Specific ID | Severity | Color
ciscoColdStart | 1.3.6.1.4.1.9.1.188 | 0 | 0 | warning | Cyan
ciscoWarmStart | 1.3.6.1.4.1.9.1.188 | 1 | 0 | warning | Cyan
ciscoLinkDown | 1.3.6.1.4.1.9.1.188 | 2 | 0 | major | Orange
ciscoLinkUp | 1.3.6.1.4.1.9.1.188 | 3 | 0 | normal | Green
ciscoAuthenticationFailure | 1.3.6.1.4.1.9.1.188 | 4 | 0 | major | Orange


Table 9-6: Cisco AS5850 Trap Mapping

Class Mapping | Enterprise | Generic ID | Specific ID | Severity | Color
ciscoColdStart | 1.3.6.1.4.1.9.1.308 | 0 | 0 | warning | Cyan
ciscoWarmStart | 1.3.6.1.4.1.9.1.308 | 1 | 0 | warning | Cyan
ciscoLinkDown | 1.3.6.1.4.1.9.1.308 | 2 | 0 | major | Orange
ciscoLinkUp | 1.3.6.1.4.1.9.1.308 | 3 | 0 | normal | Green
ciscoAuthenticationFailure | 1.3.6.1.4.1.9.1.308 | 4 | 0 | major | Orange
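The four trap mapping tables share the same generic trap rows and differ only in the enterprise OID, so they can be expressed as one lookup keyed by (enterprise, generic ID, specific ID). The sketch below is only an illustration of how a receiving system might use the tables; the variable and function names are hypothetical, while the OIDs, IDs, severities, and colors are copied from the tables.

```python
# Enterprise OIDs from Tables 9-3 to 9-6.
ENTERPRISE_OIDS = {
    "AS5350": "1.3.6.1.4.1.9.1.313",
    "AS5400": "1.3.6.1.4.1.9.1.274",
    "AS5800": "1.3.6.1.4.1.9.1.188",
    "AS5850": "1.3.6.1.4.1.9.1.308",
}

# Generic ID -> (class mapping, severity, color); identical in every table.
GENERIC_TRAPS = {
    0: ("ciscoColdStart", "warning", "Cyan"),
    1: ("ciscoWarmStart", "warning", "Cyan"),
    2: ("ciscoLinkDown", "major", "Orange"),
    3: ("ciscoLinkUp", "normal", "Green"),
    4: ("ciscoAuthenticationFailure", "major", "Orange"),
}

# Every row in the tables has specific ID 0.
TRAP_MAP = {
    (oid, generic_id, 0): entry
    for oid in ENTERPRISE_OIDS.values()
    for generic_id, entry in GENERIC_TRAPS.items()
}


def classify_trap(enterprise, generic_id, specific_id):
    """Return (class mapping, severity, color), or None for an unmapped trap."""
    return TRAP_MAP.get((enterprise, generic_id, specific_id))
```

A trap that is not in the map returns `None`, which corresponds to the documented behaviour of discarding unsupported traps.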

Overview of the Commission/Decommission Function for a Chassis

About Commissioning a Chassis

Commission a device to return it to a normal (commissioned) state within the EMS.

When you commission a device, Cisco UGM starts discovery on the device to resolve any card inventory changes that may have occurred while it was in the decommissioned state. When discovery is completed, the device returns to the normal or errored state depending on whether commissioning was successful.


Note   When a device is commissioned, all its subcomponents (cards and ports) also transition into the commissioned state.

About Decommissioning a Chassis

With Cisco UGM, you can decommission a device from any state. You can decommission a device due to one of these causes:

When you decommission a device, no actual changes are made to the device, which still sends traps to Cisco UGM. However, the resulting alarm events are not reported and do not initiate any actions or status changes. Presence and performance polling are also suspended, and Cisco UGM does not allow any configuration changes or software and firmware image downloads for the device.


Note   When a chassis is decommissioned, all its subcomponents (cards and ports) also transition into the decommissioned state.

Overview of the Commission/Decommission Function for a Card

About Commissioning a Card

Commission a card to return it to a normal (commissioned) state within the system.

When you commission a card, Cisco UGM reconciles its status with that of the actual card on the device. When this is completed, the card returns to either the normal or errored state. If the card was removed from the device, the corresponding card object is deleted.


Note   When a parent device is commissioned, all its subcomponents (cards and ports) also transition into the commissioned state. Likewise, when a card is commissioned, all its ports are also commissioned.

About Decommissioning a Card

You can decommission a card from any state due to one of these causes:

When you decommission a card, no actual changes are made to the card, which still sends traps to Cisco UGM. However, the resulting alarm events are not reported and do not initiate any actions or status changes.

When a parent device is decommissioned, all its subcomponents (cards and ports) also transition into the decommissioned state. Likewise, when a card is decommissioned, all its ports are also decommissioned.

Commissioning and Decommissioning a Device or Card


Step 1   Right-click the device or card object that you want to commission or decommission.

Step 2   Choose AS5xxx object > Chassis > Chassis Commissioning.

or

Choose Card object > Card Commissioning.

Step 3   Click Commission or Decommission.


Tip   Decommissioned devices appear as shaded icons in the right-hand pane of the Map Viewer.


Overview of Exporting Alarm Events

With Cisco UGM, you can capture and export all alarm data to an ASCII text file. An external system can then examine this file locally or retrieve it by using File Transfer Protocol (FTP). The external system is responsible for parsing the contents of this file.

Exporting SNMP traps consists of capturing traps from managed devices and writing them to a text file.


Note   You cannot forward internally generated Cisco UGM alarm events through SNMP; you can export these alarm events by writing them to the ASCII text file.

You can access the Alarm File Export function to schedule alarm data export, specify where the exported data is to be stored, how and when the file ages, and also specify a string to delimit exported data.

Exporting Alarm Events to a File


Step 1   From the Map Viewer, choose ASEMSConfig > File Export > Open File Export Properties > Alarm.

Step 2   In the Export Type field, select Continuous.

Step 3   Enter a storage path for the file.

Step 4   Select an action to be performed when file aging occurs:

Step 5   Specify the maximum size (in KBytes) of a file before the selected aging action begins. Export then continues to the newly created file.

Step 6   Specify where the file is moved to (or moveTarCompressed to) when aging occurs.

Step 7   Click Save.

Example: Alarm Data Export Format and Sample

Alarm export data is formatted as follows:

<Date>|<Time>|<DataType>|<AlarmName>|<AlarmSeverity>|<AffectedObject>|

Sample:

2000/09/08|08:32:59 EDT|InternalAlarm|communicationEstablished|normal|Physical:/Kanata/AS5350-1|
2000/09/08|08:33:05 EDT|InternalAlarm|communicationEstablished|normal|Physical:/Kanata/AS5400-1|
2000/09/08|08:33:06 EDT|InternalAlarm|communicationEstablished|normal|Physical:/Kanata/AS5800-1|
2000/09/08|08:37:53 EDT|InternalAlarm|fileSysBelowMajor|normal|:/|
2000/09/08|08:37:53 EDT|InternalAlarm|fileSysBelowCritical|normal|:/|
2000/09/08|10:17:45 EDT|SNMPv1|envMonRedundantSupply|major|Physical:/Kanata/AS5800-1|
2000/09/08|10:18:41 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/08|10:18:41 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/10|14:36:45 EDT|SNMPv1|cardInserted|warning|Physical:/Kanata/AS5350-1|
2000/09/10|14:37:06 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5350-1|
2000/09/10|14:57:28 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5350-1|
2000/09/11|17:58:32 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/11|17:58:35 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/11|18:10:18 EDT|SNMPv1|ciscoLinkDown|major|Physical:/Kanata/AS5800-1|
2000/09/11|18:11:20 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/11|18:15:07 EDT|InternalAlarm|entityCommissioned|informational|Physical:/Kanata/AS5400-1|
2000/09/11|18:23:19 EDT|SNMPv1|envMonRedundantSupply|major|Physical:/Kanata/AS5800-1|
2000/09/11|18:23:59 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/11|18:24:00 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/12|10:20:23 EDT|SNMPv1|ciscoLinkDown|major|Physical:/Kanata/AS5800-1|
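Because the external system is responsible for parsing the export file, a parser for the record format shown above may be useful as a starting point. This is a sketch under stated assumptions: the `AlarmRecord` type and `parse_alarm_line` function are hypothetical names, each record is assumed to occupy one line, and the delimiter is a parameter because the export properties let you choose the delimiting string.

```python
from typing import NamedTuple


class AlarmRecord(NamedTuple):
    """One exported alarm, with fields in the documented order."""
    date: str
    time: str
    data_type: str
    alarm_name: str
    severity: str
    affected_object: str


def parse_alarm_line(line, delimiter="|"):
    """Split one export record into its six fields."""
    fields = line.strip().split(delimiter)
    # Each record ends with a delimiter, which leaves an empty final element.
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, found {len(fields)}: {line!r}")
    return AlarmRecord(*fields)
```

For instance, `parse_alarm_line("2000/09/11|18:10:18 EDT|SNMPv1|ciscoLinkDown|major|Physical:/Kanata/AS5800-1|")` yields a record whose `alarm_name` is `ciscoLinkDown` and whose `severity` is `major`.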

Example: Sample Configuration File for Fault Management

You can view and edit some Cisco UGM attributes by changing a configuration file in ASCII format; the file is located at:

<CEMFROOT>/config/ASMainCtrl/ASMainCtrlUserData.ini

Sample of the ASMainCtrlUserData.ini file showing items relevant to fault management in Cisco UGM:

; ===================================================
; Configurable controller settings.
; ===================================================
; This section defines settings for file-system monitoring:
; * MajorThreshold    : If file-system usage exceeds this percentage,
;                       major alarm is raised.
; * CriticalThreshold : If file-system usage exceeds this percentage,
;                       critical alarm is raised.
; * MonitoringInterval: How often each file-system is checked in
;                       minutes. If the value is 0, self-monitoring
;                       is disabled for all file-systems.
;
; - Threshold percentages must be integer values > 0 and < 100.
; - MonitoringInterval must be integer value >= 0.
;
[SelfMonitor]
MajorThreshold = 90
CriticalThreshold = 95
MonitoringInterval = 10
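Before editing the thresholds, it can help to check new values against the constraints stated in the file comments. The sketch below reads a `[SelfMonitor]` section with Python's standard `configparser` module and applies those constraints. The function names are hypothetical, the sample text is an abbreviated copy of the file above, and `disk_alarm` is only an assumed illustration of how the thresholds relate to the fileSysAboveMajor and fileSysAboveCritical alarm events; it returns the most severe applicable alarm name.

```python
import configparser

SAMPLE = """\
; Settings for file-system monitoring (values from the sample file)
[SelfMonitor]
MajorThreshold = 90
CriticalThreshold = 95
MonitoringInterval = 10
"""


def load_self_monitor(text):
    """Parse and validate the [SelfMonitor] section of the given INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["SelfMonitor"]
    major = section.getint("MajorThreshold")
    critical = section.getint("CriticalThreshold")
    interval = section.getint("MonitoringInterval")

    # Threshold percentages must be integers > 0 and < 100.
    for name, value in (("MajorThreshold", major), ("CriticalThreshold", critical)):
        if not 0 < value < 100:
            raise ValueError(f"{name} must be > 0 and < 100, got {value}")
    # MonitoringInterval must be >= 0; 0 disables self-monitoring.
    if interval < 0:
        raise ValueError(f"MonitoringInterval must be >= 0, got {interval}")
    return major, critical, interval


def disk_alarm(usage_percent, major, critical):
    """Return the most severe alarm name for a usage percentage, or None."""
    if usage_percent > critical:
        return "fileSysAboveCritical"
    if usage_percent > major:
        return "fileSysAboveMajor"
    return None
```

To check the real file, read `<CEMFROOT>/config/ASMainCtrl/ASMainCtrlUserData.ini` into a string and pass it to `load_self_monitor`, after substituting the actual installation root.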

Posted: Sat Sep 28 16:52:54 PDT 2002
All contents are Copyright © 1992--2002 Cisco Systems, Inc. All rights reserved.
Important Notices and Privacy Statement.