Redundancy and Fail-Over


Table Of Contents

Redundancy and Fail-Over

Information About Redundancy and Fail-Over

Terminology and Definitions

Redundant Topologies

In-line Dual Link Redundant Topology

Failure Detection

How to Configure Forced Failure

How to Force a Virtual Failure Condition

How to Exit a Virtual Failure Condition

Hot Standby and Fail-over

Hot Standby

Fail-over

Failure in the Cascade Connection

Installing a Cascaded System

Recovery

Replacing the SCE platform (manual recovery)

Reboot only (fully automatic recovery)

CLI Commands for Cascaded Systems

Topology-Related Parameters for Redundant Topologies

Configuring the Connection Mode

How to Monitor the System

System Upgrades

Firmware Upgrade (package installation)

Application Upgrade

Simultaneous Upgrade of Firmware and Application

Redundancy and Fail-Over

• Information About Redundancy and Fail-Over

• How to Configure Forced Failure

• Hot Standby and Fail-over

• Recovery

• CLI Commands for Cascaded Systems

• System Upgrades

Information About Redundancy and Fail-Over

• Terminology and Definitions

• Redundant Topologies

• In-line Dual Link Redundant Topology

• Failure Detection

This module presents the fail-over and redundancy capabilities of the SCE platform. It first defines the relevant terminology and the theoretical aspects of the redundancy and fail-over solution. It then explains specific recovery procedures for both single and dual link topologies, as well as the specific update procedures to be used in cascaded SCE platform deployments. When fail-over is required in a deployment, a topology with two cascaded SCE platforms is used. This cascaded solution provides both network link fail-over and fail-over of the functionality of the SCE platform, including updated subscriber state.

Note The information in this chapter applies to the SCE 2000 4xGBE and SCE 2000 4/8xFE platforms only.

Terminology and Definitions

Following is a list of definitions of terms used in the chapter as they apply to the Cisco fail-over solution, which is based on cascaded SCE platforms.

•Fail-over — A situation in which the SCE platform experiences a problem that makes it impossible for it to provide its normal functionality, and a second SCE platform device immediately takes over for the failed SCE platform.

•Hot standby — When two SCE platforms are deployed in a fail over topology, one SCE platform is active, while the second SCE platform is in standby, receiving from the active SCE platform all subscriber state updates and keep alive messages.

•Primary/Secondary — The terms Primary and Secondary refer to the default status of a particular SCE platform. The Primary SCE platform is active by default, while the Secondary device is the default standby. Note that these defaults apply only when both devices are started together. If the primary SCE platform fails and then recovers, it does not revert to active status but remains in standby status, while the secondary device remains active.

•Subscriber state fail-over — A fail over solution in which subscriber state is saved.
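The Primary/Secondary rule above can be illustrated with a minimal Python sketch. It is only a model of the behavior described in the definitions; the names Role, roles_at_joint_startup and roles_after_failover are hypothetical and are not part of the SCE platform software.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


def roles_at_joint_startup():
    """Both devices are started together, so the defaults apply."""
    return {"primary": Role.ACTIVE, "secondary": Role.STANDBY}


def roles_after_failover(roles, failed):
    """The peer of the failed device is active; the failed device
    stays in standby after it recovers (no automatic revert)."""
    peer = "secondary" if failed == "primary" else "primary"
    updated = dict(roles)
    updated[failed] = Role.STANDBY
    updated[peer] = Role.ACTIVE
    return updated


roles = roles_at_joint_startup()
roles = roles_after_failover(roles, failed="primary")
print(roles["primary"].value, roles["secondary"].value)  # standby active
```

The sketch shows that "primary" describes only the start-up default: after one fail-over, the primary device is the standby until the roles are switched back.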

Redundant Topologies

All Cisco SCE platforms include an internal electrical bypass module, which provides the capability of preserving the network link in case the SCE platform fails. The SCE platform, which can handle two data links, includes two such bypass modules. However, in some cases, the service provider wishes to preserve the SCE platform functionality in case of a failure, in addition to preserving the network link.

Cisco provides a unique solution for this scenario, through deploying two cascaded SCE platforms on these two data links.

The cascading is implemented by connecting the two SCE platforms using two of the data links. This fail over solution applies to both inline and receive-only topologies.

In each SCE platform, two of the four data interfaces are connected to each of the network links, while the other two data interfaces are used for cascading between the SCE platforms. (See the Cisco SCE 2000 Installation and Configuration Guide for specific cabling procedures for redundant topologies.) The cascade ports are used for transferring network traffic, keep-alive messages and subscriber state updates.

In-line Dual Link Redundant Topology

This topology serves inline deployments where the SCE platform functionality should be preserved in case of a failure, in addition to preserving the network link.

Figure 10-1 In-line Dual Link Redundant Topology

Failure Detection

• Link Failure Reflection

The SCE platform has several types of mechanisms for detecting failures:

•Internal failure detection — The SCE platform monitors for hardware and software conditions such as overheating and fatal software errors.

•Inter-device failure detection — The SCE platform sends periodic keep-alive messages via the cascade ports.

•SCE platform-Subscriber Manager (SM) communication failure detection — A failure to communicate with the SM may be regarded as a cause for fail-over. However, this communication failure is not necessarily a problem in the SCE platform. If the connection between the active SCE platform and the SM has failed, while the connection between the standby SCE platform and the SM is alive, a fail-over process is initiated to allow proper exchange of information between the SCE platforms and the SM.

•Link failure — The system monitors all three types of links for failures:

–Traffic port link failure — Traffic cannot flow through the SCE platform.

–Cascade port link failure — Traffic cannot flow between the SCE platforms through the cascade ports.

–Management port link failure — This is not a failure that interrupts traffic on the link in and of itself. However, when SM is used, management port link failure will cause an SM connection failure and this, in turn, will be declared as a failure of the SCE platform.

This type of failure, in most cases, does not require a reboot of the SCE platform. When the connection with the SM is re-established, the SCE platform is again ready for hot standby. If both SCE platforms lose their connections with the SM, it is assumed that the SM itself has failed, so no action is taken in the SCE platform.
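The SM connection rules described above can be summarized as a small decision function. This is a sketch of the documented behavior only; the function name and the returned strings are hypothetical, and the last case (only the standby loses its SM connection) is an interpretation of the statement that an SCE platform is ready for hot standby again once its SM connection is re-established.

```python
def sm_connection_action(active_sm_up, standby_sm_up):
    """Return the expected reaction to the SM connection status of both units."""
    if active_sm_up and standby_sm_up:
        return "no action"
    if not active_sm_up and standby_sm_up:
        # Only the active unit lost the SM: the standby takes over.
        return "fail over to standby"
    if not active_sm_up and not standby_sm_up:
        # Both units lost the SM: the SM itself is assumed to have failed.
        return "no action (SM assumed to have failed)"
    # Only the standby lost the SM (interpretation, see the text above).
    return "standby not ready for hot standby"


for active_up in (True, False):
    for standby_up in (True, False):
        print(active_up, standby_up, "->", sm_connection_action(active_up, standby_up))
```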

Link Failure Reflection

The SCE platforms are transparent at Layers 2 and 3. The SCE platform operates in promiscuous mode, and the network elements on both sides of the SCE platform use the MAC address of the other network element when forwarding traffic.

To help the network elements on both sides of the SCE platform identify link failures as quickly as possible, the SCE platform reflects link failure events to its other side. When the link on one side of the SCE platform fails, the corresponding link on the other side is forced down, to reflect the failure. Link failure reflection is done on the traffic ports. In deployments of a single SCE platform with two data links, link failure is reflected between the two ports of each link.

When working with two cascaded SCE platforms, link failure is reflected in two cases:

•Reflection between the traffic ports of each SCE platform.

•If there is a failure in the cascade port link, the two SCE platforms can no longer support proper processing of the two links, since the traffic flowing on the standby SCE platform's link must be forwarded to the active SCE platform for processing. In this case the link failure is reflected from the cascade ports to the traffic ports of the standby SCE platform, in order to force the network to switch all the traffic only through the link of the active SCE platform.

Link failure reflection is supported both when the SCE platform is operational and when it is in failure/boot status.

Link reflection, like fail-over, is dependent on the bypass mechanism of the SCE platform.

How to Configure Forced Failure

Use the following commands to force a virtual failure condition, and to exit from the failure condition when performing an application upgrade. (See How to Manage Application Files.)

• How to Force a Virtual Failure Condition

• How to Exit a Virtual Failure Condition

How to Force a Virtual Failure Condition

Step 1 From the SCE(config if)# prompt, type force failure-condition and press Enter.

Forces the SCE platform into a virtual failure state.

How to Exit a Virtual Failure Condition

Step 1 From the SCE(config if)# prompt, type no force failure-condition and press Enter.

Exits from the virtual failure state.

Hot Standby and Fail-over

The fail-over solution requires two SCE platforms connected in cascade.

• Hot Standby

• Fail-over

• Failure in the Cascade Connection

• Installing a Cascaded System

Hot Standby

In the fail-over solution, one of the SCE platforms is used as the active SCE platform and the other is used as the standby. Although traffic enters both the active and the standby SCE platforms, all traffic processing takes place in the SCE platform that is currently active. The active SCE platform processes the traffic coming on both links, its own link and the link connected to the standby SCE platform, as follows:

•All traffic entering the active SCE platform through its traffic ports is processed in that SCE platform and then forwarded to the line.

•All traffic entering the standby SCE platform through its traffic ports is forwarded through the cascade ports to the active SCE platform where it is processed, and then returned to the standby SCE platform through the cascade ports to be forwarded to the original line from which it came.

Since only one SCE platform processes all traffic at any given time, split flows in the two data links, which are caused by asymmetrical routing, are handled correctly.

To support subscriber-state fail-over, both SCE platforms hold subscriber states for all parties, and subscriber state updates are exchanged between the active SCE platform and the standby. This way, if the active SCE platform fails, the standby SCE platform is able to start serving the line immediately with a minimum loss of subscriber-state.

The two SCE platforms also use the cascade channel for exchanging periodic keep-alive messages.
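The following Python sketch models the hot standby idea described above: every subscriber state update handled by the active SCE platform is also passed to the standby, so the standby already holds the state when it takes over. The classes and method names are hypothetical and only illustrate the principle; the real exchange takes place over the cascade ports and its format is not described in this chapter.

```python
class ScePlatform:
    def __init__(self, name):
        self.name = name
        self.subscriber_state = {}

    def apply_update(self, subscriber, state):
        # Store a private copy of the state for this subscriber.
        self.subscriber_state[subscriber] = dict(state)


class CascadePair:
    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def update_subscriber(self, subscriber, state):
        # The active unit applies the update, then the same update
        # is passed to the standby (modeled here as a direct call).
        self.active.apply_update(subscriber, state)
        self.standby.apply_update(subscriber, state)

    def fail_over(self):
        # The standby assumes the active role.
        self.active, self.standby = self.standby, self.active


pair = CascadePair(ScePlatform("sce-1"), ScePlatform("sce-2"))
pair.update_subscriber("subscriber-a", {"package": "gold"})
pair.fail_over()
print(pair.active.name, pair.active.subscriber_state)
# sce-2 {'subscriber-a': {'package': 'gold'}}
```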

Fail-over

In the fail-over solution, the two SCE platforms exchange keep-alive messages via the cascade ports. This keep-alive mechanism enables fast detection of failures between the SCE platforms and fast fail-over to the standby SCE platform when required.

If the active SCE platform fails, the standby SCE platform then assumes the role of the active SCE platform.

The failed SCE platform uses its electrical bypass mechanism, which is a hardware entity that is separate from the main board and processors, to forward traffic to the other SCE platform, and to forward processed traffic back to the link. The previously standby SCE platform now processes all the traffic of this other link that is forwarded to it by the previously active SCE platform in addition to the traffic of its own link.

When the failed SCE platform recovers, it will remain in standby, while the previously standby SCE platform remains active. Switching the SCE platforms back to their original roles may be performed manually, if required, after the failed SCE platform has either recovered or been replaced.

If the failure is in the standby SCE platform, it will continue to forward traffic to the active SCE platform and back to its link, while the active SCE platform continues to provide its normal processing functionality to the traffic of the two links.

Note For information regarding the synchronization of subscriber information between cascaded SCE platforms and the effect of fail-over on the subscriber databases, see Synchronizing Subscriber Information in a Cascade System.

There are two user-configurable options that are relevant in a situation when an SCE platform fails:

•Bypass — Maintain the link in bypass mode (continue sending traffic to the other SCE platform, and then continue forwarding the processed traffic back to the link). The incoming traffic in the failed SCE platform is forwarded to the working SCE platform, where it is processed and then sent back to the original SCE platform and back to the link.

–Effect on the network link — negligible.

–Effect on the SCE platform functionality — The effect on the SCE platform functionality is dependent on the failed SCE platform:

–If the failure is in the standby SCE platform — the active SCE platform continues providing its normal functionality, processing the traffic of the two links.

–If the failure is in the active SCE platform — the standby SCE platform takes over processing the traffic, and becomes the active SCE platform.

•Cutoff — Change the link of the failed SCE platform to cutoff (layer 1) forcing the network to switch all traffic through the line of the working SCE platform. This will, of course, decrease the network capacity by 50%, but may be useful in some cases.

–Effect on the network — The network loses 50% of its capacity (until the failed SCE platform has recovered).

–Effect on the SCE platform functionality — The effect on the SCE platform functionality is dependent on the failed SCE platform:

–If the failure is in the standby SCE platform — the active SCE platform continues providing its normal functionality, processing the traffic of its own link.

–If the failure is in the active SCE platform — the standby SCE platform takes over processing the traffic, and becomes the active SCE platform. This option is available for use in special cases, and requires specific configuration.
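The effect of the two on-failure options can be tabulated with a short Python sketch. The function and the dictionary keys are hypothetical. The capacity figure for bypass is given as 100 because the text describes the effect on the link as negligible, which is an approximation.

```python
def failure_outcome(on_failure, failed_role):
    """Summarize the documented effect of a failed unit.

    on_failure:  "bypass" or "cutoff" (the configured option)
    failed_role: "active" or "standby" (role of the failed unit)
    """
    if on_failure not in ("bypass", "cutoff"):
        raise ValueError("on_failure must be 'bypass' or 'cutoff'")
    if failed_role not in ("active", "standby"):
        raise ValueError("failed_role must be 'active' or 'standby'")
    bypass = on_failure == "bypass"
    return {
        # Cutoff forces all traffic through the working unit's line.
        "network_capacity_percent": 100 if bypass else 50,
        # In bypass the working unit still processes both links.
        "links_processed_by_working_unit": 2 if bypass else 1,
        # The standby takes over only if the active unit failed.
        "standby_takes_over": failed_role == "active",
    }


print(failure_outcome("cutoff", "active"))
```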

Failure in the Cascade Connection

The effect of a failure in the cascade connection between the two SCE platforms depends on whether one or both connections fail:

•Only one cascade connection is down — In this case, both SCE platforms can still communicate, so each still knows the status of the peer.

As long as one cascade connection remains up, the standby will cut off its traffic links so that all traffic is routed via the active SCE platform. Therefore, split flow is avoided, but at the expense of half line capacity.

•Both cascade links are down — In this case, neither SCE platform knows anything about the status of the peer. Each platform then works in standalone mode, which means that each SCE platform processes only its own traffic. This results in split flows.
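The two cases above, together with the normal case in which both cascade connections are up, can be sketched as a lookup on the number of working cascade connections. The function name and result keys are hypothetical; the values only restate the behavior described in this section.

```python
def cascade_behavior(cascade_links_up):
    """Describe system behavior for 0, 1 or 2 working cascade connections."""
    if cascade_links_up not in (0, 1, 2):
        raise ValueError("a cascaded pair has exactly two cascade connections")
    if cascade_links_up == 2:
        return {"peer_status_known": True, "mode": "cascade",
                "standby_traffic_links": "up", "split_flows": False}
    if cascade_links_up == 1:
        # The standby cuts off its traffic links, so all traffic goes
        # through the active unit at half the line capacity.
        return {"peer_status_known": True, "mode": "cascade",
                "standby_traffic_links": "cutoff", "split_flows": False}
    # No communication with the peer: each unit processes only its own traffic.
    return {"peer_status_known": False, "mode": "standalone",
            "standby_traffic_links": "up", "split_flows": True}


for links in (2, 1, 0):
    print(links, cascade_behavior(links))
```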

Installing a Cascaded System

This section outlines the installation procedures for a redundant solution with two cascaded SCE platforms.

Refer to the Cisco SCE 2000 Installation and Configuration Guide for information on topologies and connections.

Refer to the Cisco Service Control Engine (SCE) CLI Command Reference for details of the CLI commands.

Note When working with two SCE platforms with split-flow and redundancy, it is extremely important to follow this installation procedure.

DETAILED STEPS

Step 1 Install both SCE platforms, power them up, and perform the initial system configuration.

Step 2 Connect both SCE platforms to the management station.

Step 3 Connect the cascade ports. The cascade ports must be connected directly in Layer 1 (dark fibers), not through a switch.

Step 4 Set topology configurations for each SCE platform via the connection-mode options. (See Topology-Related Parameters for Redundant Topologies.)

Step 5 Make sure that the SCE platforms have synchronized and that an active SCE platform has been selected. Use the show interface linecard 0 connection-mode command.

Step 6 If you want to start with bypass/sniffing, change the link mode to your required mode in both SCE platforms on both links. The bypass mode will be applied only to the active SCE platform. (See About the Link Mode.)

Step 7 Make sure that the link mode is as required. Use the show interface linecard 0 link mode command. (See How to Monitor the System.)

Step 8 Connect the traffic port of SCE platform #1. This will cause a momentary down time until the network elements on both sides of the SCE platform auto-negotiate with it and start working (when working inline).

Step 9 Connect the traffic port of SCE platform #2. This will cause a momentary down time until the network elements on both sides of the SCE platform auto-negotiate with it and start working (when working inline).

Step 10 When full control is needed, change the link mode on both SCE platforms on both links to 'forwarding'. It is recommended to first configure the active SCE platform and then the standby. (See About the Link Mode.)

Step 11 You can now start working with the Subscriber Manager.

Recovery

• Replacing the SCE platform (manual recovery)

• Reboot only (fully automatic recovery)

This section specifies the procedure for recovery after a failure. The purpose of the recovery procedure is to restore the system to fully functional status. After the recovery procedure, the behavior of the system is the same as after installation.

A failed SCE platform may either recover automatically or be replaced (manual recovery). Whether recovery is automatic or manual depends on the original cause of the failure:

•Power failure — manual or automatic recovery can be implemented.

•Any failure resulting in a reboot — manual or automatic recovery can be implemented (this is configurable).

•Three consecutive reboots within half an hour — manual recovery only.

•Cascade ports link-failure — automatic recovery when link revives.

•Traffic link failure — automatic recovery when link revives.

•Failure in the communications with the SM — automatic by SM decisions after connection is re-established.

•Hardware malfunction — manual recovery, after replacing the malfunctioning SCE platform.
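The reboot rule in the list above (three consecutive reboots within half an hour allow manual recovery only) can be illustrated as a sliding-window check over reboot times. This is a sketch under assumptions: the function name is hypothetical, and the chapter does not state how the SCE platform actually counts reboots or measures the interval.

```python
from datetime import datetime, timedelta


def requires_manual_recovery(reboot_times, limit=3, window=timedelta(minutes=30)):
    """Return True if `limit` consecutive reboots fall within `window`."""
    times = sorted(reboot_times)
    for i in range(len(times) - limit + 1):
        # Compare the first and last reboot of each run of `limit` reboots.
        if times[i + limit - 1] - times[i] <= window:
            return True
    return False


start = datetime(2007, 5, 30, 8, 0)
reboot_loop = [start, start + timedelta(minutes=10), start + timedelta(minutes=25)]
spread_out = [start, start + timedelta(minutes=20), start + timedelta(minutes=45)]
print(requires_manual_recovery(reboot_loop))  # True
print(requires_manual_recovery(spread_out))   # False
```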

Replacing the SCE platform (manual recovery)

This is done in two stages: first, manual installation steps performed by the technician, and then automatic configuration steps performed by the system.

• Manual steps:

• Automatic steps (in parallel with the manual steps, requires no user intervention):

Manual steps:

DETAILED STEPS

Step 1 Disconnect the failed SCE platform from the network.

Step 2 Connect a new SCE platform to the management link and the cascade links (leave the network ports disconnected).

Step 3 Configure the SCE platform.

Step 4 Perform the basic network configuration manually (first time).

Step 5 Load the application software (Service Control Application for Broadband) to the SCE platform. Provide the application configuration.

Step 6 Connect the traffic ports to the network links.

Automatic steps (in parallel with the manual steps, requires no user intervention):

DETAILED STEPS

Step 1 Establishment of inter-SCE platform communication.

Step 2 Synchronization with the SM.

Step 3 Copying updated subscriber states from the active SCE platform to the standby.

Reboot only (fully automatic recovery)

DETAILED STEPS

Step 1 Reboot of the SCE platform.

Step 2 Basic network configurations.

Step 3 Establishment of inter-SCE platform communication.

Step 4 Selection of the active SCE platform.

Step 5 Synchronization of the recovered SCE platform with the SM.

Step 6 Copying updated subscriber states from the active SCE platform to the standby.

CLI Commands for Cascaded Systems

• Topology-Related Parameters for Redundant Topologies

• Configuring the Connection Mode

• How to Monitor the System

This section presents CLI commands relevant to the configuration and monitoring of a redundant system.

Use the following commands to configure and monitor a redundant system:

•connection-mode

•[no] force failure-condition

•show interface linecard 'number' connection-mode

•show interface linecard 'number' physically-connected-links

Topology-Related Parameters for Redundant Topologies

All four of the topology-related parameters are required when configuring a redundant topology.

•Connection mode — Redundancy is achieved by cascading two SCE platforms. Therefore the connection mode for both SCE platforms may be either:

–Inline-cascade

–Receive-only-cascade

•Physically-connected-links — For each of the cascaded SCE platforms, this parameter defines the number of the link (Link 0 or Link 1) connected to this SCE platform.

•Priority — For each of the cascaded SCE platforms, this parameter defines whether it is the primary or secondary device.

•On-failure — For each of the cascaded SCE platforms, this parameter determines whether the system cuts the traffic or bypasses it when the SCE platform either has failed or is booting.

Configuring the Connection Mode

Use the following command to configure the connection mode, including the following parameters.

•inline/receive only

•physically connected links

•behavior upon failure of the SCE platform

•primary/secondary

To configure the connection mode, use the following command.

Step 1 From the SCE 2000(config if)# prompt, type connection-mode inline-cascade|receive-only-cascade [physically-connected-links {link-0|link-1}] [priority {primary|secondary}] [on-failure {bypass|cutoff}] and press Enter.
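When the command is generated by a provisioning script, the syntax above can be assembled and checked before it is sent to the device. The following Python sketch is a hypothetical helper, not a Cisco tool. It only builds the command string from the keywords listed in the syntax above and rejects values that the syntax does not allow.

```python
def build_connection_mode(mode, link=None, priority=None, on_failure=None):
    """Build a connection-mode command for a cascaded topology."""
    if mode not in ("inline-cascade", "receive-only-cascade"):
        raise ValueError("mode must be inline-cascade or receive-only-cascade")
    parts = ["connection-mode", mode]
    # Each optional keyword is validated against the values in the syntax.
    options = (
        ("physically-connected-links", link, ("link-0", "link-1")),
        ("priority", priority, ("primary", "secondary")),
        ("on-failure", on_failure, ("bypass", "cutoff")),
    )
    for keyword, value, allowed in options:
        if value is None:
            continue
        if value not in allowed:
            raise ValueError(f"{keyword} must be one of {allowed}")
        parts += [keyword, value]
    return " ".join(parts)


print(build_connection_mode("inline-cascade", "link-1", "primary", "bypass"))
# connection-mode inline-cascade physically-connected-links link-1 priority primary on-failure bypass
```

The printed command is the same one used for the primary SCE platform in EXAMPLE 1 of this section.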

Examples

• EXAMPLE 1

• EXAMPLE 2

EXAMPLE 1

Use the following command to configure the primary SCE platform in a two-SCE platform inline topology. Link 1 is connected to this SCE platform and the behavior of the SCE platform if a failure occurs is bypass.

SCE 2000(config if)#connection-mode inline-cascade physically-connected-links link-1 priority primary on-failure bypass

EXAMPLE 2

Use the following command to configure the SCE platform that might be cascaded with the SCE platform in Example 1. This SCE platform would have to be the secondary SCE platform, and Link 0 would be connected to this SCE platform, since Link 1 was connected to the primary. The connection mode would be the same as the first, and the behavior of the SCE platform if a failure occurs is also bypass.

SCE 2000(config if)#connection-mode inline-cascade physically-connected-links link-0 priority secondary on-failure bypass

How to Monitor the System

Use the following commands to view the current connection mode and link mode parameters.

• How to View the Current Connection Mode

• How to View the Current Link Mode

• How to View Current Link Mappings

How to View the Current Connection Mode

Step 1 From the SCE 2000# prompt, type show interface linecard 0 connection-mode and press Enter.

How to View the Current Link Mode

Step 1 From the SCE 2000# prompt, type show interface linecard 0 link mode and press Enter.

How to View Current Link Mappings

Step 1 From the SCE 2000# prompt, type show interface linecard 0 physically-connected-links and press Enter.

System Upgrades

• Firmware Upgrade (package installation)

• Application Upgrade

• Simultaneous Upgrade of Firmware and Application

In a redundant solution, it is important that firmware and/or application upgrades be performed in such a way that line and service are preserved.

Refer to the following sections for instructions on how to perform these procedures on two cascaded SCE platforms:

•Upgrade the firmware only

•Upgrade the application only

•Upgrade both the firmware and the application at the same time

Note When upgrading only one component (either firmware only or application only), always verify that the upgraded component is compatible with the component that was not upgraded.

Firmware Upgrade (package installation)

DETAILED STEPS

Step 1 Install package on both SCE platforms (open the package and copy configuration).

Step 2 Reload the standby SCE platform.

Step 3 Wait until the standby finishes synchronizing and is ready to work.

Step 4 Make sure that the connection mode configurations are correct.

Step 5 Reload the active SCE platform.

Step 6 After the former active SCE platform reboots and is ready to work, it may be left as standby, or the SCE platforms can be manually switched back to their original roles.

Application Upgrade

DETAILED STEPS

Step 1 Unload the application in the standby SCE platform.

Step 2 Load new application to the standby SCE platform.

Step 3 Make sure that the connection mode configurations are correct.

Step 4 Wait until the standby SCE platform finishes synchronizing and is ready to work.

Step 5 Force failure condition in the active SCE platform.

Step 6 Upgrade the application in the former active SCE platform.

Step 7 Remove the force failure condition in that platform.

Step 8 After the former active SCE platform recovers and is ready to work, it may remain the standby or can be manually switched back to active.
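The order of the steps above keeps one SCE platform active at every point of the upgrade. The following Python sketch walks through that order for two units and records which unit is active after each stage. The function, the unit names and the version labels are hypothetical, and the model ignores synchronization time.

```python
def rolling_application_upgrade(active, standby, old_version, new_version):
    """Return a history of (stage, active unit, versions) tuples."""
    versions = {active: old_version, standby: old_version}
    history = [("start", active, dict(versions))]

    # Steps 1-4: replace the application on the standby and let it synchronize.
    versions[standby] = new_version
    history.append(("standby upgraded", active, dict(versions)))

    # Step 5: forced failure on the active unit, so the upgraded standby takes over.
    active, standby = standby, active
    history.append(("forced failure", active, dict(versions)))

    # Steps 6-7: upgrade the former active unit and remove the forced failure.
    versions[standby] = new_version
    history.append(("former active upgraded", active, dict(versions)))
    return history


for stage, active_unit, versions in rolling_application_upgrade(
        "sce-1", "sce-2", "old-app", "new-app"):
    print(stage, "| active:", active_unit, "|", versions)
```

In this model the former active unit ends as the standby, which matches Step 8: switching back to the original roles is a separate, manual action.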

Simultaneous Upgrade of Firmware and Application

DETAILED STEPS

Step 1 In the standby SCE platform:

a. Uninstall the application.

b. Upgrade the firmware (this includes a reboot).

c. Install the new application.

Step 2 Force-failure in the active SCE platform.

This makes the updated SCE platform the active one, and it begins to provide the new service.

Step 3 Repeat step 1 for the (now) standby SCE platform.

Since this includes a reboot, it is not necessary to undo the force failure command.

Posted: Wed May 30 08:53:51 PDT 2007
All contents are Copyright © 1992--2007 Cisco Systems, Inc. All rights reserved.
