High Availability Monitoring and Halcyon

May 1, 2017


In IT, a system, service, or application is generally considered highly available when it is accessible 99.999 percent of the time (the so-called "five nines" rule), which equates to around five minutes of downtime a year.
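As a quick check on the arithmetic, the downtime budget is simply the unavailable fraction multiplied by the minutes in a year. The sketch below assumes an average year of 365.25 days to account for leap years.

```python
# Convert an availability target into an annual downtime budget.
# Assumes an average year of 365.25 days to account for leap years.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum minutes per year a system may be down at this availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {annual_downtime_minutes(target):.1f} minutes of downtime per year")
```

For five nines this works out to roughly 5.3 minutes a year; each additional nine cuts the budget by a factor of ten.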

For IBM i users, the options for achieving this level of high availability are based on three main areas: 1) logical replication, 2) storage-based replication, and 3) PowerHA geographic mirroring. Which option is best for you, your IBM i estate, and your business depends on a number of factors, including:

  • Where your data resides (*SYSBAS or IASP)
  • Whether you need both nodes to be active
  • The physical distance between the nodes
  • Whether you have internal or external disk
  • And, of course, your budget

The results of the 2017 IBM i Marketplace Survey showed that more organizations have concerns over high availability, up 20 percent from our 2016 results. Interestingly, half of IBM i shops have yet to implement an HA solution.

Logical Replication for High Availability

Logical replication is where high availability for IBM i got its start. It builds on journaling, which is included with the base IBM i operating system, combined with third-party replication software that is installed and configured to keep the source and target partitions in sync.
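The journal-and-apply idea behind logical replication can be modeled in a few lines. This is a simplified, hypothetical sketch: the class and function names are invented for illustration, and real replication products work with IBM i journals and are far more involved. Changes are recorded on the source as sequenced journal entries, the target applies them strictly in order, and the gap between the two is the replication backlog.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JournalEntry:
    sequence: int          # position in the source journal, increasing by one per change
    key: str               # hypothetical record identifier
    value: Optional[str]   # new record value, or None for a delete


@dataclass
class SourceJournal:
    entries: list[JournalEntry] = field(default_factory=list)

    def record(self, key: str, value: Optional[str]) -> JournalEntry:
        """Journal a change made on the source partition."""
        entry = JournalEntry(len(self.entries) + 1, key, value)
        self.entries.append(entry)
        return entry


@dataclass
class TargetReplica:
    data: dict[str, str] = field(default_factory=dict)
    last_applied: int = 0  # sequence number of the last entry applied on the target

    def apply(self, entries: list[JournalEntry]) -> None:
        """Apply entries in sequence order, skipping any that were already applied."""
        for entry in sorted(entries, key=lambda e: e.sequence):
            if entry.sequence <= self.last_applied:
                continue  # already applied, so replaying is harmless
            if entry.sequence > self.last_applied + 1:
                raise ValueError("gap in journal sequence; entries must arrive in order")
            if entry.value is None:
                self.data.pop(entry.key, None)
            else:
                self.data[entry.key] = entry.value
            self.last_applied = entry.sequence


def backlog(source: SourceJournal, target: TargetReplica) -> int:
    """Number of journaled changes not yet applied on the target."""
    return len(source.entries) - target.last_applied


journal = SourceJournal()
journal.record("CUST001", "Smith")
journal.record("CUST002", "Jones")
journal.record("CUST001", None)  # delete CUST001

replica = TargetReplica()
replica.apply(journal.entries[:2])
print(backlog(journal, replica))  # 1
replica.apply(journal.entries)
print(replica.data)               # {'CUST002': 'Jones'}
```

Because both sides stay live in this model, the target's data could be read for reporting while changes continue to arrive.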

Logical replication now supports various configurations involving multiple source and target nodes, which provide added resiliency where specifically required.

Logical replication has been popular with the IBM i customer base in part because both partitions remain active, so the target partition can be used for read-only reporting.

Storage-Based Replication

With the advent of external storage solutions, IBM i disk space administration is slowly but surely converging with disk administration on other mainstream platforms such as Windows and Linux. This shift makes storage-based replication possible, and it is increasingly popular because it supports both *SYSBAS and IASP and, unlike logical replication, places no replication overhead on the IBM i operating system.

A downside of storage-based replication is that, because the whole system is replicated, the target partition cannot be active at the same time as the source.

Geographic Replication with PowerHA

The survey results also indicated that PowerHA SystemMirror accounted for around 7 percent of the high availability solutions in use in 2017.

PowerHA is built on the concept of self-contained, switchable IASPs plus a cluster administrative domain, which synchronizes objects that must remain in *SYSBAS because they cannot reside in an IASP. It can synchronize some, but not all, of these objects, so some vendors now offer complementary solutions that replicate the missing ones. Only time will tell whether PowerHA will eventually overtake third-party logical replication solutions on IBM i.

Achieving HA/DR Readiness

Some third-party logical replication vendors indicate that their solutions require regular checks. Rather than perform these checks manually, best practice is to automate the monitoring of logical replication solutions to ensure they are ready to deliver when disaster strikes.

A high availability monitoring solution such as Halcyon allows multiple monitoring environments to be installed side by side and switched easily by ending one subsystem and starting another, which suits the popular practice of keeping both partitions active.

Halcyon can ensure that connectivity between the nodes is possible at all times, that required resources are active, that any replication backlogs are not at excessive levels, and that exceptions are flagged to support personnel or fixed automatically.
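The checks described above can be pictured as a simple evaluation over a node's status. This is a generic, hypothetical sketch, not Halcyon's implementation or configuration: the `NodeStatus` fields, the resource name `REPLSBS`, and the 1,000-entry backlog threshold are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeStatus:
    reachable: bool  # could the monitor contact the other node?
    inactive_resources: list[str] = field(default_factory=list)  # required jobs/subsystems not running
    backlog_entries: int = 0  # journal entries waiting to be sent or applied


def evaluate(status: NodeStatus, max_backlog: int = 1000) -> list[str]:
    """Return exception messages for support staff; an empty list means no problems found."""
    problems: list[str] = []
    if not status.reachable:
        problems.append("Other node is unreachable")
    for name in status.inactive_resources:
        problems.append(f"Required resource is not active: {name}")
    if status.backlog_entries > max_backlog:
        problems.append(
            f"Replication backlog of {status.backlog_entries} entries exceeds {max_backlog}"
        )
    return problems


print(evaluate(NodeStatus(reachable=True, backlog_entries=50)))  # []
for message in evaluate(NodeStatus(False, ["REPLSBS"], 5000)):  # hypothetical resource name
    print(message)
```

In a real deployment each message would be routed to support personnel or trigger an automated fix rather than printed.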

Depending on the volume of data being replicated, logical replication is likely to add some overhead to the operating system. Halcyon proactively checks for high CPU usage, both partition-wide and at the job level; ASP usage, including unplanned spikes and organic growth; other hardware problems; and anything else that might affect availability.
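To illustrate the difference between a sudden spike and steady organic growth, here is a minimal sketch of threshold checks for CPU and ASP usage. It is not how Halcyon works internally; the thresholds (90 percent partition CPU, 50 percent per job, a 5-point daily ASP jump, a 30-day horizon) and the naive linear growth projection are all illustrative assumptions.

```python
from __future__ import annotations


def cpu_alerts(partition_cpu: float, job_cpu: dict[str, float],
               partition_limit: float = 90.0, job_limit: float = 50.0) -> list[str]:
    """Flag high CPU partition-wide and for individual jobs (values are percentages)."""
    alerts = []
    if partition_cpu > partition_limit:
        alerts.append(f"Partition CPU {partition_cpu:.0f}% exceeds {partition_limit:.0f}%")
    for job, pct in job_cpu.items():
        if pct > job_limit:
            alerts.append(f"Job {job} CPU {pct:.0f}% exceeds {job_limit:.0f}%")
    return alerts


def asp_alerts(daily_usage: list[float], spike_points: float = 5.0,
               limit: float = 90.0, horizon_days: float = 30.0) -> list[str]:
    """Flag ASP spikes and projected growth from daily usage percentages, oldest first."""
    alerts: list[str] = []
    if not daily_usage:
        return alerts
    current = daily_usage[-1]
    # Unplanned spike: a jump of more than spike_points since the previous sample.
    if len(daily_usage) >= 2 and current - daily_usage[-2] > spike_points:
        alerts.append(f"Unplanned ASP spike to {current:.0f}%")
    if current >= limit:
        alerts.append(f"ASP usage {current:.0f}% is at or above {limit:.0f}%")
    elif len(daily_usage) >= 2:
        # Organic growth: naive linear trend across the whole sample window.
        growth_per_day = (current - daily_usage[0]) / (len(daily_usage) - 1)
        if growth_per_day > 0:
            days_left = (limit - current) / growth_per_day
            if days_left <= horizon_days:
                alerts.append(f"ASP projected to reach {limit:.0f}% in about {days_left:.0f} days")
    return alerts


print(cpu_alerts(95, {"JOBA": 10}))
print(asp_alerts([70.0, 71.0, 72.0, 73.0]))
```

Separating the two ASP checks matters because a spike usually calls for immediate investigation, while slow growth calls for capacity planning.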

When replication issues are detected and reported, you can also configure a notification that confirms when all previously reported issues have been resolved, for that extra peace of mind.
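One way to picture an "all clear" notification is to track the set of open issues and send a final message when that set becomes empty. This is a hypothetical sketch, not Halcyon's mechanism; the `notify` callback stands in for whatever channel (email, text message) you might use.

```python
from __future__ import annotations

from typing import Callable


class IssueTracker:
    """Track open replication issues and send an all-clear once every one is resolved."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self.notify = notify  # hypothetical delivery channel, e.g. email or text message
        self.open_issues: set[str] = set()

    def report(self, issue: str) -> None:
        if issue not in self.open_issues:  # avoid repeat notifications for the same issue
            self.open_issues.add(issue)
            self.notify(f"OPEN: {issue}")

    def resolve(self, issue: str) -> None:
        if issue in self.open_issues:
            self.open_issues.remove(issue)
            self.notify(f"RESOLVED: {issue}")
            if not self.open_issues:
                self.notify("ALL CLEAR: all previously reported issues are resolved")


messages: list[str] = []
tracker = IssueTracker(messages.append)
tracker.report("Backlog above threshold")
tracker.report("Target node unreachable")
tracker.resolve("Target node unreachable")
tracker.resolve("Backlog above threshold")
print(messages[-1])
```

The all-clear fires only once the last open issue closes, so resolving one of several problems does not give a false sense of recovery.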

Some logical replication issues are short-lived; for example, a minor network outage can cause a temporary backlog of transactions waiting to be sent from source to target. To reduce time wasted responding to these false alarms, Halcyon provides a built-in grace period that acts as an intelligent sense-check, separating potential issues from real ones.
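The grace-period idea can be sketched as a small state machine: remember when a problem was first seen, alert only if it is still present once the grace period has elapsed, and reset if it clears on its own. This is an illustrative sketch rather than Halcyon's implementation; the five-minute value and the `GracePeriodCheck` name are assumptions.

```python
from __future__ import annotations

from typing import Optional


class GracePeriodCheck:
    """Raise an alert only if a problem persists for longer than a grace period."""

    def __init__(self, grace_seconds: float) -> None:
        self.grace_seconds = grace_seconds
        self.first_seen: Optional[float] = None  # when the current problem was first observed
        self.alerted = False                     # whether this occurrence has already alerted

    def observe(self, problem_present: bool, now: float) -> bool:
        """Record one check result; return True exactly when an alert should be sent."""
        if not problem_present:
            # Problem cleared: reset so the next occurrence starts a fresh grace period.
            self.first_seen = None
            self.alerted = False
            return False
        if self.first_seen is None:
            self.first_seen = now
        if not self.alerted and now - self.first_seen >= self.grace_seconds:
            self.alerted = True
            return True
        return False


check = GracePeriodCheck(grace_seconds=300)  # five-minute grace period (illustrative)
results = [
    check.observe(True, 0),     # backlog appears
    check.observe(True, 120),   # still within the grace period
    check.observe(False, 180),  # cleared on its own, so no alert was sent
    check.observe(True, 200),   # a new occurrence starts
    check.observe(True, 500),   # persisted for 300 seconds: alert
    check.observe(True, 600),   # already alerted for this occurrence
]
print(results)
```

The trade-off is detection delay: a longer grace period filters more transient backlogs but postpones alerts on genuine problems by the same amount.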

High Availability Monitoring Templates

High availability monitoring templates for Robot HA, MIMIX, Maxava, iTERA, Quick-EDD/HA, iCluster, PowerHA, and others are shipped with many Halcyon solutions, allowing users to quickly and easily set up proactive monitoring for these critical applications and establish a consistent and improved level of HA/DR readiness.

Plus, whichever high availability solution you have in place, Halcyon can be preloaded with license keys for inactive and standby nodes, simplifying the switchover process and removing the need to contact a third-party vendor when a switch takes place.

Get Started

Setting up an automated monitoring strategy for your high availability solution will free up valuable time and ensure you are switch-ready whenever disaster strikes. To learn more, watch our “Fail-Safe High Availability Monitoring on IBM i” webinar on demand.
