Today’s customers and end users expect their computer systems, applications, and networks to be available at all times. Nobody wants to hear that the system will be out of service for a little while in the evening or over the weekend due to backups or hardware maintenance. We all want access all the time.
IBM Power Systems hardware running the IBM i operating system has great uptime statistics; it enjoys a reputation for delivering nearly 100 percent uptime. In fact, some IBM i users out there—and you might be one—never IPL their production servers, never bring the system down for backups, and never have problems with application failures. But this kind of reliability can have a dangerous outcome: complacency.
Sometimes IBM i makes it too easy to forget that you aren’t always in control. Sure, you can choose to skip IPLs and backups, but hardware failures and natural disasters, while rare, present serious risks that you can’t predict or control. The prudent IT infrastructure manager must be prepared for these incidents so that business can continue without interruption no matter what happens.
How Available Do I Need to Be?
The answer is a high availability (HA) solution that replicates data between two different systems, preferably in two different geographic regions, to increase uptime in the event of a hardware failure or natural disaster.
While many IT pros and end users may wish for their data and applications to be available 100 percent of the time, even the rule of five nines, or 99.999 percent availability, is a lofty goal that still allows only about five minutes of downtime per year.
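The arithmetic behind "nines" is simple to sketch. This short example (the percentages are the common tiers, not figures from the survey below) converts an availability target into the downtime it permits per year:

```python
# Convert an availability percentage into allowed downtime per year.
# 365 days x 24 hours x 60 minutes = 525,600 minutes in a non-leap year.

MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Maximum minutes of downtime per year at a given availability level."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% availability -> {allowed_downtime_minutes(pct):.1f} min/year")
```

At five nines the answer comes out to roughly 5.3 minutes per year, which is where the "only five minutes of downtime" figure comes from.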
In a recent survey conducted by Information Technology Intelligence Corp (ITIC), 72 percent of respondents consider 99.99 percent to be the minimum acceptable level of reliability for their main line of business servers, up from 49 percent in 2014.
Your availability percentage indicates the maximum amount of time your business finds it acceptable for applications to be unavailable in a given time period. This includes downtime caused by planned activities such as nightly backups and maintenance, plus unplanned activities such as application, system, or site failures.
Several factors can impact your availability percentage, including your recovery time objective (RTO), which is the maximum period of time recovery is allowed to take, and your recovery point objective (RPO), which is the maximum amount of data your business can stand to lose. Anything that exceeds these thresholds indicates an unacceptable—perhaps unrecoverable—impact to the business.
Let’s say your company can withstand a 30-minute business outage shortly after a declared disaster. Your IT team will rely on your high availability solution to switch from your production (source) server to another (target) server, which becomes the new production server, within the 30-minute RTO window to minimize business impact.
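To make the two objectives concrete, here is a minimal sketch that checks a failover event against both thresholds. The 30-minute RTO comes from the scenario above; the 5-minute RPO and the timestamps are hypothetical values chosen for illustration:

```python
from datetime import datetime, timedelta

# Thresholds: RTO from the 30-minute scenario above; RPO is an assumed figure.
RTO = timedelta(minutes=30)  # maximum allowed recovery time
RPO = timedelta(minutes=5)   # maximum tolerable data-loss window


def outage_within_objectives(disaster_declared: datetime,
                             target_live: datetime,
                             last_replicated: datetime) -> bool:
    """Return True if a failover met both the RTO and the RPO."""
    recovery_time = target_live - disaster_declared
    data_loss_window = disaster_declared - last_replicated
    return recovery_time <= RTO and data_loss_window <= RPO


# Disaster declared at 09:00, target server live at 09:25,
# last transaction replicated to the target at 08:58.
ok = outage_within_objectives(
    datetime(2016, 8, 8, 9, 0),
    datetime(2016, 8, 8, 9, 25),
    datetime(2016, 8, 8, 8, 58),
)
print(ok)  # True: a 25-minute recovery with a 2-minute data-loss window
```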
When it comes to RPO, depending on your IBM i application, the software itself may be able to roll back to a clean set of transactions. This feature is called commitment control, but it is not standard in most applications developed on IBM i.
If commitment control is not part of your application, you must review the business applications and decide where to pick up or what business might need to be redone. Additional considerations include the time of day and the different types of business happening during the various business cycles of any given day. For example, month-end might be treated differently than a normal business day. Here again, a high availability solution would ensure that the data was safely moved from the source to the target server, but you still need to make sure your application is healthy enough to continue.
How Much Does High Availability Cost?
Defining RTO and RPO for your business helps you determine the cost of an outage. In the grand scheme of things, a few missed transactions won’t put you out of business, but several days of business downtime is a different story. Just look at what happened during the summer of 2016 when Delta and Southwest Airlines had IT outages that cost them both millions of dollars and customer satisfaction.
In the recent ITIC survey, 81 percent of respondents reported that one hour of downtime costs in excess of $300,000. Clearly, a tested and proven high availability solution can be worth its weight in gold, but that doesn’t mean you have to pay that much for it. Your high availability budget should take your estimated downtime costs into consideration.
To estimate the average cost of planned and unplanned downtime to your business per hour, we recommend starting with your Finance department; they usually have the most reliable figures and formulas. Downtime calculators available online can also be used to get a rough figure quickly. They generally take annual business revenue, the number of employees affected, and average employee cost per hour into consideration. They should also let you specify, as a percentage, how much sales and productivity will be impacted in the event of downtime.
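The kind of formula those online calculators apply can be sketched as follows. Every input figure here is hypothetical, including the assumption of 2,080 business hours per year; real numbers should come from Finance:

```python
# A rough downtime-cost estimate along the lines of online calculators.
# All input figures below are hypothetical placeholders.

BUSINESS_HOURS_PER_YEAR = 2080  # 52 weeks x 40 hours, an assumption


def downtime_cost_per_hour(annual_revenue: float,
                           employees_affected: int,
                           avg_employee_cost_per_hour: float,
                           revenue_impact_pct: float,
                           productivity_impact_pct: float) -> float:
    """Estimate the hourly cost of downtime from revenue and labor impact."""
    lost_revenue = (annual_revenue / BUSINESS_HOURS_PER_YEAR) * revenue_impact_pct / 100
    lost_productivity = (employees_affected * avg_employee_cost_per_hour
                         * productivity_impact_pct / 100)
    return lost_revenue + lost_productivity


# Example: $50M annual revenue, 200 affected employees at $45/hour,
# 80% of sales and 75% of productivity lost while systems are down.
cost = downtime_cost_per_hour(50_000_000, 200, 45.0, 80, 75)
print(f"${cost:,.0f} per hour of downtime")
```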
Once your cost of downtime is determined, estimate how many hours of downtime a high availability solution would eliminate over a set period of time following its implementation. Multiply those hours by your hourly downtime cost, extend the result over an acceptable ROI time period (3 years is standard), and set that figure as your maximum budget. That doesn’t mean you should pay this much for your high availability solution; it just means that it doesn’t make sense financially to pay more than this.
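Putting the budget step into a worked example, again with hypothetical figures: an hourly downtime cost of $25,000, eight hours of downtime avoided per year, and the standard 3-year ROI window.

```python
# Turn an hourly downtime cost into a maximum HA budget over an ROI period.
# The example figures are illustrative, not survey data.

def max_ha_budget(cost_per_hour: float,
                  downtime_hours_saved_per_year: float,
                  roi_years: int = 3) -> float:
    """Upper bound on what an HA solution is worth over the ROI period."""
    return cost_per_hour * downtime_hours_saved_per_year * roi_years


budget = max_ha_budget(cost_per_hour=25_000, downtime_hours_saved_per_year=8)
print(f"${budget:,.0f} maximum budget over 3 years")  # $600,000
```

Spending more than the result means the solution costs more than the downtime it prevents; spending less leaves room for a positive return.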
In short, if your business can’t afford an outage, it can afford a high availability solution.