This chapter contains the following sections:
With the Event Browser in Cisco UGM, you can identify alarm events and take appropriate action to resolve them quickly and efficiently. In addition, you can forward user-specified SNMP traps to any configured remote host, and continuously export all alarm events, as they are raised, to a user-specified text file.
With Cisco UGM, you can decommission and commission chassis and card objects for maintenance.
All faults detected by Cisco UGM are referred to as alarm events. Faults are generated from these sources:
You can use the Event Browser to view alarm events raised against an object; various filtering criteria are provided by the Query Editor.
1 See the "Overview of Presence Polling and Loss of Communication with a Device" section.
2 See the "Overview of the Commission/Decommission Function for a Chassis" section.
You can detect communication loss with a managed device by using presence polling. Loss of communication can occur for various reasons:
When Cisco UGM first detects loss of communication to a managed device, it does not immediately transition the device to the errored state but retries presence polling. Select the number of retries as described in the "Setting Number of Retries Before Loss of Communication" section.
Presence polling uses an interval specified in the "Setting Presence Polling Intervals for Devices in Normal and Errored States" section. If all the communication attempts prove unsuccessful, the device transitions to the errored state. An internal alarm event (communicationLost) with a Major severity level is raised against the affected device.
The default presence polling intervals are:
When communication is re-established, the device returns to a normal state, and an internal alarm event (communicationEstablished) with a Normal severity level is raised against the affected device.
If communication is restored after the duration specified in the "Setting Loss of Communication Duration" section, Cisco UGM discovers the device's subcomponents to detect any card inventory changes that may have occurred during the loss of communication.
If communication is restored within the specified duration, Cisco UGM transitions the device to the normal state.
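The transitions described above can be sketched as a small state machine. This is an illustrative model only, not Cisco UGM code: the `PresenceMonitor` class, its field names, and the `rediscoveries` counter are hypothetical, and measuring the outage from the first failed poll is an assumption. Only the event names (communicationLost, communicationEstablished), their severities, and the defaults (1 retry, 15 minutes) come from this chapter.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

NORMAL, ERRORED = "normal", "errored"


@dataclass
class PresenceMonitor:
    """Hypothetical model of presence polling for a single device."""
    retries: int = 1                 # retries before the errored state (default 1)
    loss_duration_min: float = 15    # outage length that triggers discovery
    state: str = NORMAL
    failed_polls: int = 0
    first_failure_min: Optional[float] = None
    events: List[Tuple[str, str]] = field(default_factory=list)
    rediscoveries: int = 0           # stands in for subcomponent discovery runs

    def poll_result(self, reachable: bool, now_min: float) -> None:
        """Record the outcome of one presence poll taken at time now_min."""
        if not reachable:
            if self.first_failure_min is None:
                self.first_failure_min = now_min
            self.failed_polls += 1
            # The initial failure plus every retry must fail before the
            # device is moved to the errored state.
            if self.state == NORMAL and self.failed_polls > self.retries:
                self.state = ERRORED
                self.events.append(("communicationLost", "major"))
            return

        if self.state == ERRORED:
            # Assumption: the outage is measured from the first failed poll.
            outage_min = now_min - self.first_failure_min
            self.state = NORMAL
            self.events.append(("communicationEstablished", "normal"))
            if outage_min > self.loss_duration_min:
                self.rediscoveries += 1
        self.failed_polls = 0
        self.first_failure_min = None
```

With the default single retry, two consecutive failed polls are needed before communicationLost is raised; a later successful poll raises communicationEstablished and, if the outage exceeded the configured duration, counts one rediscovery.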
Step 1 In Map View, choose ASEMSConfig > EMS > Settings.
Step 2 Enter the interval at which a device should be polled in the normal state.
The interval should be an integer value that is 300 or larger (representing seconds). The default is 900 seconds.
Note This value depends on the total number of managed devices in your network. You may need to change this value a few times in order to determine the optimum setting for your network.
Step 3 Enter the interval at which a device should be polled in the errored state.
The interval should be an integer value that is 300 or larger (representing seconds). The default is 915 seconds.
Note This value depends on the total number of managed devices in your network. Do not enter the same value as for devices in the normal state. A different value avoids overlapping polling intervals for normal and errored states.
Step 4 Click Apply.
When Cisco UGM first detects loss of communication to a managed device, it does not immediately transition the device to the errored state, but retries presence polling by using the polling interval specified in the "Setting Presence Polling Intervals for Devices in Normal and Errored States" section. If these communication attempts are unsuccessful, the device transitions to the errored state.
Step 1 In Map View, choose ASEMSConfig > EMS > Settings.
Step 2 Enter the number of times Cisco UGM tries to re-establish connectivity before transitioning the device into the errored state.
The number entered should be an integer value that is 0 or larger. A value of 0 disables retries; the default is 1.
Note A large value causes a delay before loss of communication with a device is detected.
Step 3 Click Apply.
Step 1 In Map View, choose ASEMSConfig > EMS > Settings.
Step 2 Enter a time interval for which communication must be lost in order to start discovery.
The interval should be an integer value that is 15 or larger (representing minutes). The default is 15 minutes.
Note A large value can result in card inventory changes going undetected.
If communication is restored after this interval, Cisco UGM initiates discovery of the device's subcomponents to detect any card inventory changes that may have occurred during the loss of communication.
If communication is restored within this interval, Cisco UGM transitions the device to the normal state.
Step 3 Click Apply.
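The value constraints stated in the three settings procedures above can be checked before the values are typed into the Settings dialog. The following is a minimal sketch, assuming a hypothetical helper named `validate_ems_settings`; Cisco UGM does not provide this function, and the parameter names are illustrative.

```python
from typing import List


def validate_ems_settings(normal_poll_s: int, errored_poll_s: int,
                          retries: int, loss_duration_min: int) -> List[str]:
    """Return a list of problems; an empty list means the values are acceptable."""
    problems = []
    if normal_poll_s < 300:
        problems.append("normal-state polling interval must be 300 seconds or larger")
    if errored_poll_s < 300:
        problems.append("errored-state polling interval must be 300 seconds or larger")
    if normal_poll_s == errored_poll_s:
        # Equal values make the two polling cycles overlap.
        problems.append("errored-state interval should differ from the normal-state interval")
    if retries < 0:
        problems.append("number of retries must be 0 or larger")
    if loss_duration_min < 15:
        problems.append("loss of communication duration must be 15 minutes or larger")
    return problems


# The documented defaults pass every check.
print(validate_ems_settings(900, 915, 1, 15))
```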
You can start the Event Browser from the Launchpad or from the pop-up menu for the individual object within Map Viewer.
With the Event Browser, you can perform these tasks:
Note Only the most severe alarm event against an object appears next to its icon within Map Viewer.
You can view additional alarm details by using the Event Browser. For more information, refer to the Cisco Element Management Framework User's Guide.
Step 1 In the Map Viewer, note the color coding of status dots to represent the occurrence of alarm events against the objects.
See the "Overview of Alarm Events" section for an explanation of the colors.
Step 2 Right-click the object whose list of alarm events you want to view and choose Tools > Open Event Browser.
If you do not want to view all events in the system, set up a query by using the Query Editor to view only specific events.
The criteria that you use to specify a query are on individual tabs. The Event Browser is updated with only those events that match the query criteria. A progress bar indicates that Cisco UGM is querying events and the window is being updated.
Caution Any changes that you make to a query are not stored when you exit the Event Browser.
If you have specified different queries, you can open more than one Event Browser session at a time.
For details about the Query Editor, refer to the Cisco Element Management Framework User's Guide.
To access the Query Editor from the Event Browser, choose Edit > Query Setup.
In the Map Viewer tree, you can see raised alarm events by the presence of colored dots next to tree objects in the left pane and by colored annotations against the object icons in the right pane.
The dots are color coded to reflect the following severity levels (highest to lowest): critical, major, minor, informational, and normal.
The defined color coding is:
A device or card object can be in either commissioned or decommissioned state within Cisco UGM.
If an object is in a commissioned state, alarm events against that object are propagated to the physical tree in the Map Viewer and appear in the parent objects to the region level.
For decommissioned objects, alarm events are not propagated up to the physical tree in the Map Viewer.
For details on commissioning and decommissioning objects, see the "Overview of the Commission/Decommission Function for a Chassis" section.
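The propagation rule can be sketched as a recursive walk over the physical tree. This is a simplified model, not the Cisco UGM implementation: the `Node` class and `displayed_severity` function are hypothetical, the ranking uses only the five severity levels listed in this section, and treating an object with no alarms as "normal" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List

# Lowest to highest, following the severity list in this section.
SEVERITY_ORDER = ["normal", "informational", "minor", "major", "critical"]
RANK = {name: index for index, name in enumerate(SEVERITY_ORDER)}


@dataclass
class Node:
    """Hypothetical object in the physical tree (region, chassis, or card)."""
    name: str
    commissioned: bool = True
    alarms: List[str] = field(default_factory=list)      # severities raised on this object
    children: List["Node"] = field(default_factory=list)


def displayed_severity(node: Node) -> str:
    """Most severe alarm on the node or propagated from commissioned descendants."""
    worst = max(node.alarms, key=RANK.__getitem__, default="normal")
    for child in node.children:
        if not child.commissioned:
            continue  # alarms on decommissioned objects are not propagated upward
        child_worst = displayed_severity(child)
        if RANK[child_worst] > RANK[worst]:
            worst = child_worst
    return worst


card = Node("card-1", commissioned=False, alarms=["critical"])
chassis = Node("AS5800-1", alarms=["major"], children=[card])
region = Node("Kanata", children=[chassis, Node("AS5350-1", alarms=["minor"])])
print(displayed_severity(region))
```

Because the card in this example is decommissioned, its critical alarm does not reach the chassis or the region; commissioning the card would change the region's displayed severity.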
The following table describes Cisco UGM alarm events, their severity, explanation, and recovery procedures.
Alarm Event | Alarm Severity | Explanation |
---|---|---|
Warning | You started the device object from a power-off state. Note Clear this event manually. | |
Warning | You restarted the device object from an on state. Note Clear this event manually. | |
Major | A DS1 or Ethernet interface is down. | |
Normal | A DS1 or Ethernet interface is up. | |
Major | The device received an SNMP message that was improperly authenticated. | |
Warning | You inserted a new card in the device; | |
Warning | You removed a card from the device; | |
Informational | You inserted a new card in the device; | |
Informational | You removed a card from the device; | |
Critical | A critical environmental condition is detected and a device shutdown is imminent. | |
Major | A voltage threshold was exceeded on the device. | |
Major | A temperature threshold was exceeded on the device. | |
Major | The fan on the device has failed. | |
Major | The power supply on the device has failed. | |
Major | Cisco UGM lost SNMP connectivity with the device. | |
Normal | Cisco UGM established SNMP connectivity with the device. | |
Informational | Device or card object has been decommissioned. | |
Informational | Device or card object has been commissioned. | |
Major | Server disk usage is over the user-defined major threshold (see footnote 1). | |
Critical | Server disk usage is over the user-defined critical threshold (see footnote 2). | |
Normal | Server disk usage is below the user-defined major threshold. | |
Normal | Server disk usage is below the user-defined critical threshold. | |
Major | During a Graceful Shutdown operation, loss of communication with the device occurred or it was decommissioned. Note Clear this event manually. | |
Major | During an Accept Traffic operation, loss of communication with the device occurred or it was decommissioned. Note Clear this event manually. |
1 For details on changing this threshold, see the "Example: Sample Configuration File for Fault Management" section.
2 For details on changing this threshold, see the "Example: Sample Configuration File for Fault Management" section.
If you manually clear an alarm event for an object in the Event Browser, that object appears in the Map Viewer with an alarm notification reflecting the next highest alarm present for that object. This change in alarm severity appears in the Map Viewer, even if the fault condition has not actually been corrected.
Cisco UGM does not generate all alarm events again, even if the alarm conditions are still present; therefore, be cautious in clearing alarm events.
Step 1 In the Map Viewer, note the color coding of status dots to represent the occurrence of alarm events against the objects.
See the "Overview of Alarm Events" section.
Step 2 Right-click the object whose list of alarm events you want to view and choose Tools > Open Event Browser.
You can acknowledge and clear individual alarm events by clicking the appropriate box next to each event.
By using the Trap Forwarding Deployment Wizard, you can:
Note The default is no trap forwarding.
Step 1 Choose ASEMSConfig > Trap Forwarding > Deploy Trap Forwarding Hosts.
Step 2 Follow the instructions provided by the Deployment wizard.
Step 3 In the Map Viewer, choose ASEMSConfig > Trap Forwarding > Trap Forwarding Properties.
Step 4 To enable trap forwarding, click Accept Saved Setting.
Step 1 From the Map Viewer, open ASEMSConfig.
Step 2 Expand the Trap Forwarding tree by clicking the + (plus) sign.
Step 3 Open the Trap Specifiers Deployment Wizard.
Step 4 Right-click the host destination for which you wish to add a new trap specifier and select Deploy Trap Specifiers.
Step 5 Follow the instructions provided by the Deployment wizard.
Step 6 In the Map Viewer, choose ASEMSConfig > Trap Forwarding > Trap Forwarding Properties.
Step 7 To update trap forwarding, click Accept Saved Setting.
The trap forwarding action triggered reflects any changes made (and saved) in this dialog box. Any previously specified trap forwarding action is replaced.
Step 1 In the Map Viewer, choose ASEMSConfig > Trap Forwarding > Trap Forwarding Properties.
Step 2 Enter your changes.
Step 3 Click the Save icon from the dialog toolbar, or choose File > Save.
Step 4 To update trap forwarding, click Accept Saved Setting.
The trap forwarding action triggered reflects any changes made (and saved) in this dialog. Any previously specified trap forwarding action is replaced.
Step 1 From the Map Viewer, open ASEMSConfig.
Step 2 Expand the Trap Forwarding tree by clicking the + (plus) sign.
Step 3 Expand any listed host destination by clicking the + (plus) sign.
Step 4 Right-click the object to be deleted (a host destination, or a specific trap specifier for a given host destination) and choose Deployment > Delete Objects.
Step 5 In the Map Viewer, choose ASEMSConfig > Trap Forwarding > Trap Forwarding Properties.
Step 6 To update trap forwarding, click Accept Saved Setting.
The trap forwarding action triggered reflects any changes made (and saved) in this dialog. Any previously specified trap forwarding action is replaced.
Tip To deactivate or disable all trap forwarding, you must delete all host destinations and click Accept Saved Setting. To resume trap forwarding, re-enter the host destinations.
See the "Specifying New Trap Forwarding Hosts" section.
Class Mapping | Enterprise | Generic ID | Specific ID | Severity | Color |
---|---|---|---|---|---|
ciscoColdStart | 1.3.6.1.4.1.9.1.313 | 0 | 0 | warning | Cyan |
ciscoWarmStart | 1.3.6.1.4.1.9.1.313 | 1 | 0 | warning | Cyan |
ciscoLinkDown | 1.3.6.1.4.1.9.1.313 | 2 | 0 | major | Orange |
ciscoLinkUp | 1.3.6.1.4.1.9.1.313 | 3 | 0 | normal | Green |
ciscoAuthenticationFailure | 1.3.6.1.4.1.9.1.313 | 4 | 0 | major | Orange |
Class Mapping | Enterprise | Generic ID | Specific ID | Severity | Color |
---|---|---|---|---|---|
ciscoColdStart | 1.3.6.1.4.1.9.1.274 | 0 | 0 | warning | Cyan |
ciscoWarmStart | 1.3.6.1.4.1.9.1.274 | 1 | 0 | warning | Cyan |
ciscoLinkDown | 1.3.6.1.4.1.9.1.274 | 2 | 0 | major | Orange |
ciscoLinkUp | 1.3.6.1.4.1.9.1.274 | 3 | 0 | normal | Green |
ciscoAuthenticationFailure | 1.3.6.1.4.1.9.1.274 | 4 | 0 | major | Orange |
Class Mapping | Enterprise | Generic ID | Specific ID | Severity | Color |
---|---|---|---|---|---|
ciscoColdStart | 1.3.6.1.4.1.9.1.188 | 0 | 0 | warning | Cyan |
ciscoWarmStart | 1.3.6.1.4.1.9.1.188 | 1 | 0 | warning | Cyan |
ciscoLinkDown | 1.3.6.1.4.1.9.1.188 | 2 | 0 | major | Orange |
ciscoLinkUp | 1.3.6.1.4.1.9.1.188 | 3 | 0 | normal | Green |
ciscoAuthenticationFailure | 1.3.6.1.4.1.9.1.188 | 4 | 0 | major | Orange |
Class Mapping | Enterprise | Generic ID | Specific ID | Severity | Color |
---|---|---|---|---|---|
ciscoColdStart | 1.3.6.1.4.1.9.1.308 | 0 | 0 | warning | Cyan |
ciscoWarmStart | 1.3.6.1.4.1.9.1.308 | 1 | 0 | warning | Cyan |
ciscoLinkDown | 1.3.6.1.4.1.9.1.308 | 2 | 0 | major | Orange |
ciscoLinkUp | 1.3.6.1.4.1.9.1.308 | 3 | 0 | normal | Green |
ciscoAuthenticationFailure | 1.3.6.1.4.1.9.1.308 | 4 | 0 | major | Orange |
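The four tables above share the same mapping from generic trap ID to class, severity, and color; only the enterprise OID differs. A receiving system could model the lookup as follows. This is a sketch under that reading of the tables; the function name `classify_trap` is hypothetical and not part of Cisco UGM.

```python
from typing import Optional, Tuple

# Enterprise OIDs that appear in the tables above.
ENTERPRISES = {
    "1.3.6.1.4.1.9.1.313",
    "1.3.6.1.4.1.9.1.274",
    "1.3.6.1.4.1.9.1.188",
    "1.3.6.1.4.1.9.1.308",
}

# Generic ID -> (class mapping, severity, color); specific ID is 0 in every row.
GENERIC_TRAPS = {
    0: ("ciscoColdStart", "warning", "Cyan"),
    1: ("ciscoWarmStart", "warning", "Cyan"),
    2: ("ciscoLinkDown", "major", "Orange"),
    3: ("ciscoLinkUp", "normal", "Green"),
    4: ("ciscoAuthenticationFailure", "major", "Orange"),
}


def classify_trap(enterprise: str, generic_id: int,
                  specific_id: int = 0) -> Optional[Tuple[str, str, str]]:
    """Return (class, severity, color) for a listed trap, or None if it is not listed."""
    if enterprise not in ENTERPRISES or specific_id != 0:
        return None
    return GENERIC_TRAPS.get(generic_id)


print(classify_trap("1.3.6.1.4.1.9.1.313", 2))
```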
Commission a device to return it to a normal (commissioned) state within the EMS.
When you commission a device, Cisco UGM starts discovery on the device to resolve any card inventory changes that may have occurred while it was in the decommissioned state. When discovery is completed, the device returns to the normal or errored state depending on whether commissioning was successful.
Note When a device is commissioned, all its subcomponents (cards and ports) also transition into the commissioned state.
With Cisco UGM, you can decommission a device from any state. You can decommission a device due to one of these causes:
When you decommission a device, no actual changes are made to the device, which still sends traps to Cisco UGM. However, the resulting alarm events are not reported and do not initiate any actions or status changes. Presence and performance polling are also suspended, and Cisco UGM does not allow any configuration changes or software and firmware image downloads for the device.
Note When a chassis is decommissioned, all its subcomponents (cards and ports) also transition into the decommissioned state.
Commission a card to return it to a normal (commissioned) state within the system.
When you commission a card, Cisco UGM reconciles its status with that of the actual card on the device. When this is completed, the card returns to either the normal or errored state. If the card was removed from the device, the corresponding card object is deleted.
Note When a parent device is commissioned, all its subcomponents (cards and ports) also transition into the commissioned state. Likewise, when a card is commissioned, all its ports are also commissioned.
You can decommission a card from any state due to one of these causes:
When you decommission a card, no actual changes are made to the card, which still sends traps to Cisco UGM. However, the resulting alarm events are not reported and do not initiate any actions or status changes.
When a parent device is decommissioned, all its subcomponents (cards and ports) also transition into the decommissioned state. Likewise, when a card is decommissioned, all its ports are also decommissioned.
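The cascade from a device to its cards and ports can be pictured as a recursive update of a containment tree. The sketch below is illustrative only; `ManagedObject` and `set_commissioned` are hypothetical names, and the model ignores the discovery and reconciliation that Cisco UGM performs on commissioning.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ManagedObject:
    """Hypothetical device, card, or port in the containment tree."""
    name: str
    commissioned: bool = True
    children: List["ManagedObject"] = field(default_factory=list)


def set_commissioned(obj: ManagedObject, commissioned: bool) -> None:
    """Apply the new state to the object and to every subcomponent below it."""
    obj.commissioned = commissioned
    for child in obj.children:
        set_commissioned(child, commissioned)


port = ManagedObject("port-0")
card_a = ManagedObject("card-1", children=[port])
card_b = ManagedObject("card-2")
device = ManagedObject("AS5400-1", children=[card_a, card_b])

set_commissioned(card_a, False)   # only card-1 and its port change state
print(device.commissioned, card_a.commissioned, port.commissioned, card_b.commissioned)
```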
Step 1 Right-click the device or card object that you want to commission or decommission.
Step 2 Choose AS5xxx object > Chassis > Chassis Commissioning.
or
Choose Card object > Card Commissioning.
Step 3 Click Commission or Decommission.
Tip Decommissioned devices appear as shaded icons in the right-hand pane of the Map Viewer.
With Cisco UGM, you can capture and export all alarm data to an ASCII text file; this file can then be examined locally by an external system or retrieved by an external system by using File Transfer Protocol (FTP). The external system is responsible for parsing the contents of this file.
Exporting SNMP traps consists of capturing traps from managed devices and writing them to a text file.
Note Internally generated Cisco UGM alarm events cannot be forwarded through SNMP; you can export these alarm events by writing them to the ASCII text file.
You can access the Alarm File Export function to schedule alarm data export, specify where the exported data is to be stored, how and when the file ages, and also specify a string to delimit exported data.
Step 1 From the Map Viewer, choose ASEMSConfig > File Export > Open File Export Properties > Alarm.
Step 2 In the Export Type field, select Continuous.
Step 3 Enter a storage path for the file.
Step 4 Select an action to be performed when file aging occurs:
Step 5 Specify the maximum size (in KBytes) of a file before the selected aging action begins. Export then continues to the newly created file.
Step 6 Specify where the file is moved to (or moveTarCompressed to) when aging occurs.
Step 7 Click Save.
Alarm export data is formatted as follows:
<Date>|<Time>|<DataType>|<AlarmName>|<AlarmSeverity>|<AffectedObject>|
Sample:
2000/09/08|08:32:59 EDT|InternalAlarm|communicationEstablished|normal|Physical:/Kanata/AS5350-1|
2000/09/08|08:33:05 EDT|InternalAlarm|communicationEstablished|normal|Physical:/Kanata/AS5400-1|
2000/09/08|08:33:06 EDT|InternalAlarm|communicationEstablished|normal|Physical:/Kanata/AS5800-1|
2000/09/08|08:37:53 EDT|InternalAlarm|fileSysBelowMajor|normal|:/|
2000/09/08|08:37:53 EDT|InternalAlarm|fileSysBelowCritical|normal|:/|
2000/09/08|10:17:45 EDT|SNMPv1|envMonRedundantSupply|major|Physical:/Kanata/AS5800-1|
2000/09/08|10:18:41 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/08|10:18:41 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/10|14:36:45 EDT|SNMPv1|cardInserted|warning|Physical:/Kanata/AS5350-1|
2000/09/10|14:37:06 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5350-1|
2000/09/10|14:57:28 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5350-1|
2000/09/11|17:58:32 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/11|17:58:35 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/11|18:10:18 EDT|SNMPv1|ciscoLinkDown|major|Physical:/Kanata/AS5800-1|
2000/09/11|18:11:20 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/11|18:15:07 EDT|InternalAlarm|entityCommissioned|informational|Physical:/Kanata/AS5400-1|
2000/09/11|18:23:19 EDT|SNMPv1|envMonRedundantSupply|major|Physical:/Kanata/AS5800-1|
2000/09/11|18:23:59 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/11|18:24:00 EDT|SNMPv1|ciscoLinkUp|normal|Physical:/Kanata/AS5800-1|
2000/09/12|10:20:23 EDT|SNMPv1|ciscoLinkDown|major|Physical:/Kanata/AS5800-1|
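Because the external system is responsible for parsing the export file, a parser along the following lines may be useful. It is a minimal sketch, assuming each record occupies a single line in the file and that the delimiter does not occur inside a field; the `AlarmRecord` class and `parse_alarm_line` function are hypothetical names, and the delimiter is a parameter because it is user-specified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRecord:
    """One exported alarm, following the six-field format shown above."""
    date: str
    time: str
    data_type: str
    alarm_name: str
    severity: str
    affected_object: str


def parse_alarm_line(line: str, delimiter: str = "|") -> AlarmRecord:
    """Split one exported line into its six fields."""
    fields = line.strip().split(delimiter)
    # Each record ends with a delimiter, which leaves an empty final element.
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, found {len(fields)}: {line!r}")
    return AlarmRecord(*fields)


record = parse_alarm_line(
    "2000/09/08|08:37:53 EDT|InternalAlarm|fileSysBelowMajor|normal|:/|"
)
print(record.alarm_name, record.severity, record.affected_object)
```

The time field keeps its time-zone suffix (for example, "08:37:53 EDT") as a single string; converting it to a timestamp is left to the consuming system.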
You can view and edit some Cisco UGM attributes by changing a configuration file in ASCII format; the file is located at:
<CEMFROOT>/config/ASMainCtrl/ASMainCtrlUserData.ini
Sample of the ASMainCtrlUserData.ini file showing items relevant to fault management in Cisco UGM:
; ===================================================
; Configurable controller settings.
; ===================================================
; This section defines settings for file-system monitoring:
; * MajorThreshold : If file-system usage exceeds this percentage,
; major alarm is raised.
; * CriticalThreshold : If file-system usage exceeds this percentage,
; critical alarm is raised.
; * MonitoringInterval: How often each file-system is checked in
; minutes. If the value is 0, self-monitoring
; is disabled for all file-systems.
;
; - Threshold percentages must be integer values > 0 and < 100.
; - MonitoringInterval must be integer value >= 0.
;
[SelfMonitor]
MajorThreshold = 90
CriticalThreshold = 95
MonitoringInterval = 10
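Before editing the file, it can help to check candidate values against the rules stated in its comments. The sketch below reads a `[SelfMonitor]` section with Python's standard `configparser` module and applies those rules. It assumes the file is plain INI syntax with `;` comments, which has not been verified against a real Cisco UGM installation; the helper names `load_self_monitor` and `severity_for_usage` are hypothetical.

```python
import configparser
from typing import Optional

SAMPLE = """\
; Configurable controller settings.
[SelfMonitor]
MajorThreshold = 90
CriticalThreshold = 95
MonitoringInterval = 10
"""


def load_self_monitor(text: str) -> dict:
    """Parse and validate the [SelfMonitor] section of the configuration text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["SelfMonitor"]
    major = section.getint("MajorThreshold")
    critical = section.getint("CriticalThreshold")
    interval = section.getint("MonitoringInterval")
    for name, value in (("MajorThreshold", major), ("CriticalThreshold", critical)):
        if not 0 < value < 100:
            raise ValueError(f"{name} must be an integer > 0 and < 100")
    if interval < 0:
        raise ValueError("MonitoringInterval must be an integer >= 0")
    # An interval of 0 disables self-monitoring for all file systems.
    return {"major": major, "critical": critical,
            "interval_min": interval, "enabled": interval > 0}


def severity_for_usage(percent: float, settings: dict) -> Optional[str]:
    """Severity implied by a usage percentage, or None when monitoring is disabled."""
    if not settings["enabled"]:
        return None
    if percent > settings["critical"]:
        return "critical"
    if percent > settings["major"]:
        return "major"
    return "normal"


settings = load_self_monitor(SAMPLE)
print(settings, severity_for_usage(92, settings))
```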
Posted: Sat Sep 28 16:52:54 PDT 2002
All contents are Copyright © 1992-2002 Cisco Systems, Inc. All rights reserved.