Understanding and Designing Serviceguard Disaster Tolerant Architectures:

Glossary


A

application restart 

Starting an application, usually on another node, after a failure. Applications can be restarted manually, which may be necessary if data must be restored before the application can run (Business Recovery Services work like this). Applications can be restarted by an operator using a script, which can reduce human error. Or applications can be started at the local or remote site automatically after the failure of the primary site is detected.


arbitrator 

Nodes in a disaster tolerant architecture that act as tie-breakers in case all of the nodes in a data center go down at the same time. These nodes are full members of the Serviceguard cluster and must conform to the minimum requirements. The arbitrator must be located in a third data center to ensure that the failure of an entire data center does not bring the entire cluster down. See also quorum server.


asymmetrical cluster 

A cluster that has more nodes at one site than at another. For example, an asymmetrical metropolitan cluster may have two nodes in one building, and three nodes in another building. Asymmetrical clusters are not supported in all disaster tolerant architectures.


asynchronous data replication 

Local I/O completes without waiting for the replicated I/O to complete; however, asynchronous data replication is expected to apply the I/Os in their original order.
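The two properties in this definition can be sketched in a toy model (illustrative only; the class and method names below are hypothetical, not any vendor's replication code): local writes complete immediately, while a FIFO queue preserves the original write order for the replica.

```python
from collections import deque

class AsyncReplicator:
    """Toy model of asynchronous replication: local writes return at once;
    replicated writes are queued and applied to the replica strictly in
    their original order."""
    def __init__(self):
        self.local = []
        self.replica = []
        self.pending = deque()

    def write(self, block):
        self.local.append(block)      # local I/O completes immediately
        self.pending.append(block)    # replication happens later

    def drain(self):
        # FIFO order preserves the original write sequence on the replica
        while self.pending:
            self.replica.append(self.pending.popleft())
```

Note that between writes and `drain`, the replica lags the local copy; this lag is exactly the data currency gap described elsewhere in this glossary.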


automatic failover 

Failover directed by automation scripts or software (such as Serviceguard) and requiring no human intervention. In a ContinentalClusters environment, the start-up of package recovery groups on the Recovery Cluster without intervention. See also application restart.


B

BC  

(Business Copy) A PVOL or SVOL in an HP StorageWorks XP series disk array that can be split from or merged into a normal PVOL or SVOL. It is often used to create a snapshot of the data taken at a known point in time. Although this copy, when split, is often consistent, it is not usually current.


BCV 

(Business Continuity Volume) An EMC Symmetrix term that refers to a logical device on the EMC Symmetrix that may be merged into or split from a regular R1 or R2 logical device. It is often used to create a snapshot of the data taken at a known point in time. Although this copy, when split, is often consistent, it is not usually current.


bi-directional configuration 

A continental cluster configuration in which each cluster serves the roles of primary and recovery cluster for different recovery groups. Also known as a mutual recovery configuration.


Business Recovery Service 

Service provided by a vendor to host the backup systems needed to run mission critical applications following a disaster.


C

campus cluster 

A single cluster that is geographically dispersed within the confines of an area owned or leased by the organization such that it has the right to run cables above or below ground between buildings in the campus. Campus clusters are usually spread out in different rooms in a single building, or in different adjacent or nearby buildings. See also Extended Distance Cluster.


cascading failover  

Cascading failover is the ability of an application to fail over from a primary to a secondary location, and then to fail over to a recovery location on a different site. The primary location contains a metropolitan cluster built with Metrocluster EMC SRDF, and the recovery location has a standard Serviceguard cluster.


client reconnect 

User access to the backup site after failover. Client reconnect can be transparent, where the user is automatically connected to the application running on the remote site, or manual, where the user selects a site to connect to.


cluster 

A Serviceguard cluster is a networked grouping of HP 9000 and/or HP Integrity series 800 servers (host systems known as nodes) with sufficient redundancy of software and hardware that a single failure will not significantly disrupt service. Serviceguard software monitors the health of nodes, networks, application services, and EMS resources, and makes failover decisions based on where the application is able to run successfully.


cluster alarm 

Time at which a message is sent indicating that the Primary Cluster is probably in need of recovery. The cmrecovercl command is enabled at this time.


cluster alert 

Time at which a message is sent indicating a problem with the cluster.


cluster event 

A cluster condition that occurs when the cluster goes down or enters an UNKNOWN state, or when the monitor software returns an error. This event may cause an alert message to be sent out, or it may cause an alarm condition to be set, which allows the administrator on the Recovery Cluster to issue the cmrecovercl command. The return of the cluster to the UP state results in a cancellation of the event, which may be accompanied by a cancel event notice. In addition, the cancellation disables the use of the cmrecovercl command.


cluster quorum 

A dynamically calculated majority used to determine whether any grouping of nodes is sufficient to start or run the cluster. Cluster quorums prevent split-brain syndrome which can lead to data corruption or inconsistency. Currently at least 50% of the nodes plus a tie-breaker are required for a quorum. If no tie-breaker is configured, then greater than 50% of the nodes is required to start and run a cluster.
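The quorum rule described above can be expressed as a short sketch (illustrative only, not Serviceguard's actual implementation; the function name is hypothetical): a group may form the cluster with a strict majority of nodes, or with exactly half the nodes plus a tie-breaker such as an arbitrator node or quorum server.

```python
def has_quorum(nodes_up, nodes_total, has_tie_breaker=False):
    """Return True if the surviving group of nodes may form the cluster."""
    if 2 * nodes_up > nodes_total:    # strict majority always wins
        return True
    if 2 * nodes_up == nodes_total:   # exact 50/50 split
        return has_tie_breaker        # needs an arbitrator or quorum server
    return False
```

For example, after a four-node cluster splits evenly between two data centers, neither two-node half has quorum on its own; whichever half reaches the tie-breaker may reform the cluster, which is what prevents split-brain syndrome.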


command device 

A disk area in the HP StorageWorks XP series disk array used for internal system communication. You create two command devices on each array, each with alternate links (PV links).


consistency group 

A set of Symmetrix RDF devices that are configured to act in unison to maintain the integrity of a database. Consistency groups allow you to configure R1/R2 devices on multiple Symmetrix frames in Metrocluster with EMC SRDF.


continental cluster 

A group of clusters that use routed networks and/or common carrier networks for data replication and cluster communication to support package failover between separate clusters in different data centers. Continental clusters are often located in different cities or different countries and can span hundreds or thousands of kilometers.


Continuous Access  

A facility provided by the Continuous Access software option available with the HP StorageWorks E Disk Array XP series. This facility enables physical data replication between XP series disk arrays.


D

data center 

A physically proximate collection of nodes and disks, usually all in one room.


data consistency 

Whether data are logically correct and immediately usable; the validity of the data after the last write. Inconsistent data, if not recoverable to a consistent state, is corrupt.


data currency 

Whether the data contain the most recent transactions, and/or whether the replica database has all of the committed transactions that the primary database contains; speed of data replication may cause the replica to lag behind the primary copy, and compromise data currency.


data loss 

The inability to take action to recover data. Data loss can result from: transactions that were lost while being copied when a failure occurred; non-committed transactions that were rolled back as part of a recovery process; data in the process of being replicated that never made it to the replica because of a failure; or transactions committed after the last tape backup when a failure occurred that required a reload from that backup. Transaction processing monitors (TPMs), message queuing software, and synchronous data replication are measures that can protect against data loss.


data mirroring 

See mirroring.


data recoverability 

The ability to take action that results in data consistency, for example database rollback/roll forward recovery.


data replication 

The scheme by which data is copied from one site to another for disaster tolerance. Data replication can be either physical (see physical data replication) or logical (see logical data replication). In a ContinentalClusters environment, the process by which data that is used by the Primary Cluster packages is transferred to the Recovery Cluster and made available for use on the Recovery Cluster in the event of a recovery.


database replication 

A software-based logical data replication scheme that is offered by most database vendors.


disaster 

An event causing the failure of multiple components or entire data centers that renders all services at a single location unavailable; these include natural disasters such as earthquake, fire, or flood, acts of terrorism or sabotage, and large-scale power outages.


disaster protection 

Processes, tools, hardware, and software that provide protection in the event of an extreme occurrence that causes application downtime, such that the application can be restarted at a different location within a fixed period of time.


disaster recovery 

The process of restoring access to applications and data after a disaster. Disaster recovery can be manual, meaning human intervention is required, or it can be automated, requiring little or no human intervention.


disaster recovery services 

Services and products offered by companies that provide the hardware, software, processes, and people necessary to recover from a disaster.


disaster tolerant 

The characteristic of being able to recover quickly from a disaster. Components of disaster tolerance include redundant hardware, data replication, geographic dispersion, partial or complete recovery automation, and well-defined recovery procedures.


disaster tolerant architecture 

A cluster architecture that protects against multiple points of failure or a single catastrophic failure that affects many components by locating parts of the cluster at a remote site and by providing data replication to the remote site. Other components of disaster tolerant architecture include redundant links, either for networking or data replication, that are installed along different routes, and automation of most or all of the recovery process.


E, F

ESCON  

Enterprise Systems Connection. A type of fiber-optic channel used for inter-frame communication between EMC Symmetrix frames using EMC SRDF or between HP StorageWorks E XP series disk array units using Continuous Access XP.


event log  

The default location (/var/adm/cmconcl/eventlog) where events are logged on the monitoring ContinentalClusters system. All events are written to this log, as well as all notifications that are sent elsewhere.


Extended Distance Cluster  

A cluster with alternate nodes located in different data centers separated by some distance. Formerly known as campus cluster.


failback 

Failing back from a backup node, which may or may not be remote, to the primary node that the application normally runs on.


failover 

The transfer of control of an application or service from one node to another node after a failure. Failover can be manual, requiring human intervention, or automated, requiring little or no human intervention.


filesystem replication 

The process of replicating filesystem changes from one node to another.


G

gatekeeper 

A small EMC Symmetrix device configured to function as a lock during certain state change operations.


H, I

heartbeat network 

A network that provides reliable communication among nodes in a cluster, including the transmission of heartbeat messages. Heartbeat messages are signals sent by each functioning node; they are central to the operation of the cluster and are used to determine the health of the nodes in the cluster.
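As a hedged illustration of how missing heartbeats translate into failure detection (the function and parameter names below are hypothetical, not Serviceguard code), a monitor can flag any node whose heartbeat has been silent longer than a timeout:

```python
def nodes_presumed_down(last_heartbeat, now, timeout):
    """Given the time each node's heartbeat was last seen, return the
    nodes whose silence exceeds the timeout and are therefore presumed
    to have failed."""
    return sorted(n for n, t in last_heartbeat.items() if now - t > timeout)
```

A real cluster manager layers much more on top of this (redundant heartbeat networks, quorum arbitration before acting), but the timeout test is the core of the mechanism.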


high availability  

A combination of technology, processes, and support partnerships that provide greater application or system availability.


J, K, L

local cluster 

A cluster located in a single data center. This type of cluster is not disaster tolerant.


local failover 

Failover on the same node; this is most often applied to hardware failover. For example, local LAN failover is switching to the secondary LAN card on the same node after the primary LAN card has failed.


logical data replication 

A type of on-line data replication that replicates logical transactions that change either the filesystem or the database. Complex transactions may result in the modification of many diverse physical blocks on the disk.


LUN 

(Logical Unit Number) A SCSI term that refers to a logical disk device composed of one or more physical disk mechanisms, typically configured into a RAID level.


M

M by N  

A type of Symmetrix grouping in which up to two Symmetrix frames may be configured on either side of a data replication link in a Metrocluster with EMC SRDF configuration. M by N configurations include 1 by 2, 2 by 1, and 2 by 2.


manual failover 

Failover requiring human intervention to start an application or service on another node.


Metrocluster 

A Hewlett-Packard product that allows a customer to configure a Serviceguard cluster as a disaster tolerant metropolitan cluster.


metropolitan cluster 

A cluster that is geographically dispersed within the confines of a metropolitan area requiring right-of-way to lay cable for redundant network and data replication components.


mirrored data 

Data that is copied using mirroring.


mirroring 

Disk mirroring hardware or software, such as MirrorDisk/UX. Some mirroring methods may allow splitting and merging.


mission critical application 

Hardware, software, processes, and support services that must meet the uptime requirements of an organization. Examples of mission critical applications that must be able to survive regional disasters include financial trading services, e-business operations, 911 phone service, and patient record databases.


mission critical solution 

The architecture and processes that provide the required uptime for mission critical applications.


multiple points of failure (MPOF) 

More than one point of failure that can bring down a Serviceguard cluster.


multiple system high availability 

Cluster technology and architecture that increases the level of availability by grouping systems into a cooperative failover design.


mutual recovery configuration 

A continental cluster configuration in which each cluster serves the roles of primary and recovery cluster for different recovery groups. Also known as a bi-directional configuration.


N

network failover 

The ability to restore a network connection after a failure in network hardware when there are redundant network links to the same IP subnet.


notification 

A message that is sent following a cluster or package event.


O

off-line data replication 

Data replication by storing data off-line, usually on a backup tape or disk stored in a safe location; this method is best for applications that can accept a 24-hour recovery time.


on-line data replication 

Data replication by copying to another location that is immediately accessible. On-line data replication is usually done by transmitting data over a link in real time or with a slight delay to a remote site; this method is best for applications requiring quick recovery (within a few hours or minutes).


P

package alert 

Time at which a message is sent indicating a problem with a package.


package event 

A package condition such as a failure that causes a notification message to be sent. Package events can be accompanied by alerts, but not alarms. Messages are for information only; the cmrecovercl command is not enabled for a package event.


package recovery group 

A set of one or more packages with a mapping between their instances on the Primary Cluster and their instances on the Recovery Cluster.


physical data replication 

An on-line data replication method that duplicates I/O writes to another disk on a physical block basis. Physical replication can be hardware-based where data is replicated between disks over a dedicated link (for example EMC’s Symmetrix Remote Data Facility or the HP StorageWorks E Disk Array XP Series Continuous Access), or software-based where data is replicated on multiple disks using dedicated software on the primary node (for example, MirrorDisk/UX).


planned downtime 

An anticipated period of time when nodes are taken down for hardware maintenance, software maintenance (OS and application), backup, reorganization, upgrades (software or hardware), etc.


PowerPath 

A host-based software product from EMC that delivers intelligent I/O path management. PowerPath is required for M by N Symmetrix configurations using Metrocluster with EMC SRDF.


Primary Cluster 

A cluster in production that has packages protected by the HP ContinentalClusters product.


primary package 

The package that normally runs on the Primary Cluster in a production environment.


pushbutton failover 

Use of the cmrecovercl command to allow all package recovery groups to start up on the Recovery Cluster following a significant cluster event on the Primary Cluster.


PV links 

A method of LVM configuration that allows you to provide redundant disk interfaces and buses to disk arrays, thereby protecting against single points of failure in disk cards and cables.


PVOL 

A primary volume configured in an XP series disk array that uses Continuous Access. PVOLs are the primary copies in physical data replication with Continuous Access on the XP.


Q

quorum 

See cluster quorum.


quorum server 

A cluster node that acts as a tie-breaker in a disaster tolerant architecture in case all of the nodes in a data center go down at the same time. See also arbitrator.


R

R1 

The Symmetrix term indicating the data copy that is the primary copy.


R2 

The Symmetrix term indicating the remote data copy that is the secondary copy. It is normally read-only by the nodes at the remote site.


Recovery Cluster 

A cluster on which recovery of a package takes place following a failure on the Primary Cluster.


recovery group failover 

A failover of a package recovery group from one cluster to another.


recovery package 

The package that takes over on the Recovery Cluster in the event of a failure on the Primary Cluster.


regional disaster 

A disaster, such as an earthquake or hurricane, that affects a large region. Local, campus, and proximate metropolitan clusters are less likely to protect from regional disasters.


remote failover 

Failover to a node at another data center or remote location.


resynchronization 

The process of making the data between two sites consistent and current once systems are restored following a failure. Also called data resynchronization.


rolling disaster 

A second disaster that occurs before recovery from a previous disaster is complete. For example, while data is being resynchronized between two data centers after a disaster, one of the data centers fails, interrupting the resynchronization process. Rolling disasters may result in data corruption that requires a reload from tape backups.


S

single point of failure (SPOF) 

A component of a cluster or node that, if it fails, affects access to applications or services. See also multiple points of failure.


single system high availability 

Hardware design that results in a single system that has availability higher than normal. Hardware design examples are:

  • n+1 fans

  • n+1 power supplies

  • multiple power cords

  • on-line addition or replacement of I/O cards, memory, etc.


special device file 

The device file name that the HP-UX operating system gives to a single connection to a node, in the format /dev/devtype/filename.


split-brain syndrome 

A condition in which a cluster reforms with equal numbers of nodes at each site, and each half of the cluster believes it is the authority, starts up the same set of applications, and tries to modify the same data, resulting in data corruption. The Serviceguard architecture prevents split-brain syndrome in all cases unless dual cluster locks are used.


SRDF 

(Symmetrix Remote Data Facility) A level 1-3 protocol used for physical data replication between EMC Symmetrix disk arrays.


SVOL 

A secondary volume configured in an XP series disk array that uses Continuous Access. SVOLs are the secondary copies in physical data replication with Continuous Access on the XP.


SymCLI 

The Symmetrix command line interface used to configure and manage EMC Symmetrix disk arrays.


Symmetrix device number 

The unique device number that identifies an EMC logical volume.


synchronous data replication 

A data replication scheme in which each replication I/O waits for the preceding I/O to complete before another begins. Synchronous replication minimizes the chance of inconsistent or corrupt data in the event of a rolling disaster.
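One common reading of this property can be sketched as a toy model (illustrative only; the class name is hypothetical, not any vendor's implementation): a write does not return until the replica has also applied it, so the two copies can never diverge, at the cost of every write waiting on the replication link.

```python
class SyncReplicator:
    """Toy model of synchronous replication: a write does not complete
    until the replica has also acknowledged it, so the local and replica
    copies never diverge."""
    def __init__(self):
        self.local = []
        self.replica = []

    def write(self, block):
        self.local.append(block)
        self.replica.append(block)   # simulated remote ack before returning
        # consistency invariant holds on every return from write()
        return self.local == self.replica
```

Contrast this with asynchronous data replication (defined earlier in this glossary), where local I/O completes before the replica is updated and the replica may lag.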


T

transaction processing monitor (TPM) 

Software that allows you to modify an application to store in-flight transactions in an external location until that transaction has been committed to all possible copies of the database or filesystem, thus ensuring completion of all copied transactions. A TPM protects against data loss at the expense of the CPU overhead involved in applying the transaction in each database replica.

Software that provides a reliable mechanism to ensure that all transactions are successfully committed. A TPM may also provide load balancing among nodes.
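The in-flight-store mechanism described above can be sketched as follows (a minimal illustration; the class and method names are hypothetical, and a real TPM would also handle replica failure and rollback): a transaction is held in an external store until it has been applied to every replica, and only then discarded.

```python
class ToyTPM:
    """Toy transaction processing monitor: an in-flight transaction is
    held in an external store until it has been committed to every
    replica, ensuring no committed transaction is lost."""
    def __init__(self, replicas):
        self.replicas = replicas   # list of dicts standing in for databases
        self.in_flight = []        # external store of pending transactions

    def submit(self, key, value):
        self.in_flight.append((key, value))   # persist before applying
        for db in self.replicas:              # apply to all copies
            db[key] = value
        self.in_flight.pop()                  # safe to discard once all commit
```

If a replica fails mid-apply, the transaction remains in the in-flight store and can be replayed during recovery; this is the protection against data loss, paid for with the CPU overhead of applying the transaction to each replica.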


transparent failover 

A failover in which the client application automatically reconnects to a new server without the user taking any action.


transparent IP failover 

Moving the IP address from one network interface card (NIC), in the same node or another node, to another NIC that is attached to the same IP subnet so that users or applications may always specify the same IP name/address whenever they connect, even after a failure.


U-Z

volume group 

In LVM, a set of physical volumes such that logical volumes can be defined within the volume group for user access. A volume group can be activated by only one node at a time unless you are using Serviceguard OPS Edition. Serviceguard can activate a volume group when it starts a package. A given disk can belong to only one volume group. A logical volume can belong to only one volume group.


WAN data replication solutions 

Data replication that functions over leased or switched lines. See also continental cluster.


© Hewlett-Packard Development Company, L.P.