Fault Tolerance

Overview: Fault Tolerance

Cisco RPMS lets you build fault tolerance and resiliency into your dial service offerings through the features described in the following sections.

Hot Standby

Cisco RPMS provides the hot standby (or high availability) feature to protect against server failure. Any stateful Cisco RPMS component server, such as a policy processor, can be deployed as a high availability (HA) pair. In this configuration, the two servers continually synchronize with each other by exchanging messages.

Deploying an HA pair means that active call counts and other network state are mirrored on both servers. If either server fails, the system continues to operate on the remaining server without losing state.

The two servers in a hot standby pair act as peers; there is no primary or backup server. Both servers are fully functional, and each can independently handle all of the network traffic. You can therefore provision the network so that some devices (such as RASERs) communicate with one server while other devices communicate with its peer. This kind of provisioning helps share network load between the servers.

Each server in a hot standby pair replicates its state to its peer by exchanging messages over a reliable TCP link, so each server effectively sees the combined network traffic meant for the pair. Each server makes its policy decisions based on the total network activity. For this reason, network load sharing does not reduce the load or the CPU utilization of the servers in a hot standby configuration.
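The following Python sketch illustrates why load sharing does not reduce per-server load. It models a hypothetical pair of peers in which every call start or end is applied locally and then replicated to the peer, so admission decisions always use the pair-wide call count. The `PolicyServer` class, the `call_limit` parameter, and the direct method call that stands in for the TCP replication link are assumptions for illustration, not Cisco RPMS interfaces.

```python
from typing import Dict, Optional


class PolicyServer:
    """Hypothetical model of one peer in a hot standby (HA) pair."""

    def __init__(self, name: str, call_limit: int) -> None:
        self.name = name
        self.call_limit = call_limit            # assumed limit on active calls for the whole pair
        self.active_calls: Dict[str, str] = {}  # call_id -> UG carrying the call
        self.peer: Optional["PolicyServer"] = None

    def pair_with(self, other: "PolicyServer") -> None:
        # Peers are symmetric: neither server is primary or backup.
        self.peer, other.peer = other, self

    def apply(self, op: str, call_id: str, ug: str = "") -> None:
        # Update local state; used both for local events and for messages from the peer.
        if op == "start":
            self.active_calls[call_id] = ug
        elif op == "end":
            self.active_calls.pop(call_id, None)

    def _record(self, op: str, call_id: str, ug: str = "") -> None:
        self.apply(op, call_id, ug)
        if self.peer is not None:
            # Stand-in for the replication message sent over the TCP link.
            self.peer.apply(op, call_id, ug)

    def request_call(self, call_id: str, ug: str) -> bool:
        # The decision uses the combined call count of the pair, so each server
        # does the full policy work even if it receives only part of the traffic.
        if len(self.active_calls) >= self.call_limit:
            return False
        self._record("start", call_id, ug)
        return True

    def end_call(self, call_id: str) -> None:
        self._record("end", call_id)
```

For example, with `call_limit=2`, if server `a` admits one call and server `b` admits another, both servers hold two active calls, and a third request is refused by whichever server receives it.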

After installing a new server or rebooting one, you can synchronize it with its peer at any time by using a CLI command. You can also configure HA servers to synchronize with their peer automatically at startup.
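A minimal sketch of startup synchronization follows, assuming that a restarted server begins with empty state and, if configured to do so, copies a full snapshot of its peer's state. The `HAServer` class, the `auto_sync` flag, and the `sync_from_peer()` method are hypothetical stand-ins for the startup setting and the CLI command; the actual command and its behavior are described in the Configuration Guide.

```python
import copy
from typing import Dict, Optional


class HAServer:
    """Hypothetical HA peer holding mirrored network state."""

    def __init__(self, name: str, auto_sync: bool = False) -> None:
        self.name = name
        self.auto_sync = auto_sync          # assumed "synchronize at startup" setting
        self.state: Dict[str, int] = {}     # e.g. UG name -> active call count
        self.peer: Optional["HAServer"] = None

    def snapshot(self) -> Dict[str, int]:
        # Independent copy of the current state, as sent to a peer that resyncs.
        return copy.deepcopy(self.state)

    def sync_from_peer(self) -> None:
        # Analogous to the manual CLI resync: replace local state with the peer's.
        if self.peer is None:
            raise RuntimeError(f"{self.name}: no peer configured")
        self.state = self.peer.snapshot()

    def start(self) -> None:
        # After installation or a reboot, local state is empty.
        self.state = {}
        if self.auto_sync and self.peer is not None:
            self.sync_from_peer()
```

Because the snapshot is a deep copy, later changes on either server travel only through normal replication, never through shared objects.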

For more specific configuration information, refer to "Configuring Cisco RPMS Fault Tolerance" in the Cisco Resource Policy Management System Configuration Guide.

Tolerance to Database Failures

Cisco RPMS needs database connectivity for configuration purposes only; call processing does not require it. If the Oracle database fails, Cisco RPMS continues to accept UG requests.

When Cisco RPMS detects a database failure, it sends an e-mail notification. It also generates an SNMP trap both when connectivity to the database fails and when it is restored.
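The notification behavior can be sketched as an edge-triggered monitor: it reacts only when database reachability changes, so a long outage produces one failure notification rather than one per check. The `DatabaseMonitor` class, its callbacks, and the trap labels `dbDown` and `dbUp` are hypothetical; the document does not name the actual traps or say how reachability is checked.

```python
from typing import Callable


class DatabaseMonitor:
    """Sketch: notify only on connectivity transitions; call processing never waits on it."""

    def __init__(self, send_email: Callable[[str], None],
                 send_trap: Callable[[str], None]) -> None:
        self.send_email = send_email   # hypothetical e-mail hook
        self.send_trap = send_trap     # hypothetical SNMP trap hook
        self.db_up = True              # assume the database starts out reachable

    def observe(self, reachable: bool) -> None:
        # Called with the result of each periodic connectivity check.
        if self.db_up and not reachable:
            self.db_up = False
            self.send_email("database connectivity lost")
            self.send_trap("dbDown")   # hypothetical trap label: connectivity lost
        elif not self.db_up and reachable:
            self.db_up = True
            self.send_trap("dbUp")     # hypothetical trap label: connectivity restored
```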

Cisco RPMS Autorestart

Cisco RPMS can detect server process failures and can automatically restart any Cisco RPMS processes that failed. The following components are monitored:
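Automatic restart of this kind can be sketched with Python's `subprocess` module: start a process, wait for it to exit, and restart it if it exits with a nonzero status. The `supervise()` function, its `max_restarts` cap, the `backoff_s` delay, and the `python_cmd()` demo helper are assumptions for illustration; the document does not describe how Cisco RPMS detects failures or whether it ever stops restarting a process.

```python
import subprocess
import sys
import time
from typing import List, Tuple


def python_cmd(code: str) -> List[str]:
    # Demo helper (hypothetical): build a command that runs a Python snippet.
    return [sys.executable, "-c", code]


def supervise(cmd: List[str], max_restarts: int = 3,
              backoff_s: float = 0.0) -> Tuple[int, int]:
    """Run cmd and restart it after each failed exit, up to max_restarts times.

    Returns (restarts, last_exit_code).
    """
    restarts = 0
    while True:
        proc = subprocess.Popen(cmd)
        code = proc.wait()             # block until the process exits
        if code == 0:
            return restarts, code      # clean exit: nothing to restart
        if restarts >= max_restarts:
            return restarts, code      # give up after repeated failures
        restarts += 1
        time.sleep(backoff_s)          # optional pause before restarting
```

For example, `supervise(python_cmd("raise SystemExit(1)"), max_restarts=2)` runs the failing command three times and returns `(2, 1)`.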

Detection of Universal Gateway Failures

Cisco RPMS implements a heartbeat checker mechanism that allows Cisco RPMS to test whether or not a UG is still active.

You can configure Cisco RPMS to use SNMP to poll the UGs in the UG list automatically. When polling, Cisco RPMS sends an SNMP Get request to each UG. If an SNMP agent is running on the UG, it replies either with the time, in hundredths of a second, that the agent has been running or with a "no such name" error. Either reply signifies that the agent and the UG are alive; the returned value does not matter. If a UG does not respond to the request, however, Cisco RPMS resets all active calls associated with that UG.
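The heartbeat rule can be sketched as a single polling pass. Any SNMP reply, whether an uptime value or a "no such name" error, counts as proof that the UG is alive; only a missing response triggers a reset of that UG's active calls. The SNMP transport is deliberately abstracted as a caller-supplied `snmp_get` function that raises `SnmpTimeout` when there is no response. These names, the `NO_SUCH_NAME` sentinel, and `poll_gateways()` itself are hypothetical and do not correspond to any specific SNMP library.

```python
from typing import Callable, Dict, List

# Hypothetical sentinel a snmp_get implementation may return for a "no such name" reply.
NO_SUCH_NAME = object()


class SnmpTimeout(Exception):
    """Raised by the (hypothetical) snmp_get function when a UG does not answer."""


def poll_gateways(ug_list: List[str],
                  snmp_get: Callable[[str], object],
                  active_calls: Dict[str, List[str]]) -> List[str]:
    """Run one heartbeat pass over the UG list; return the UGs whose calls were reset."""
    reset = []
    for ug in ug_list:
        try:
            # Any answer means alive: an uptime value (hundredths of a second)
            # or NO_SUCH_NAME. The value itself is deliberately ignored.
            snmp_get(ug)
        except SnmpTimeout:
            # No response: treat the UG as down and clear its active calls.
            active_calls[ug] = []
            reset.append(ug)
    return reset
```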

For information on configuring the heartbeat checker or for the heartbeat configuration definitions, refer to the "Overview: The Universal Gateway Heartbeat" section in the Cisco Resource Policy Management System Configuration Guide.

Tolerance to AAA Server Failure

To enhance fault tolerance to AAA server failures, Cisco RPMS allows you to create a prioritized list of AAA servers. Cisco RPMS RASERs use this list to determine the destination of authorization and accounting messages received from the UG.

The RASERs forward messages to the AAA server with the highest priority. If the RASERs detect that this AAA server has failed, they switch over to the server with the next highest priority. When they reach the end of the list, they wrap around and continue from the top.
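The failover order can be sketched as a prioritized list with a cursor that advances on failure and wraps around to the top. The `AaaServerList` class and the `send` callback, which returns `False` when a server fails, are hypothetical stand-ins for however RASERs actually detect AAA server failure. The document does not say whether RASERs return to a higher-priority server once it recovers; this sketch simply stays on the server that last succeeded.

```python
from typing import Callable, List


class AaaServerList:
    """Sketch of a prioritized AAA server list with wrap-around failover."""

    def __init__(self, servers_by_priority: List[str]) -> None:
        if not servers_by_priority:
            raise ValueError("need at least one AAA server")
        self.servers = list(servers_by_priority)  # index 0 = highest priority
        self.current = 0

    def forward(self, message: str, send: Callable[[str, str], bool]) -> str:
        """Send message via the current server, failing over down the list.

        Returns the server that accepted the message; raises if every server fails.
        """
        for _ in range(len(self.servers)):
            server = self.servers[self.current]
            if send(server, message):
                return server
            # Current server failed: move to the next priority, wrapping to the top.
            self.current = (self.current + 1) % len(self.servers)
        raise RuntimeError("no AAA server accepted the message")
```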

For more specific configuration information, refer to "Configuring Cisco RPMS Fault Tolerance" in the Cisco Resource Policy Management System Configuration Guide.


Posted: Mon Sep 9 13:48:20 PDT 2002
All contents are Copyright © 1992--2002 Cisco Systems, Inc. All rights reserved.