Following a system failure or disaster, do you know with absolute certainty that you can restore your IBM i systems, applications, and databases to where they were when the outage occurred? What’s your plan for getting things back online? How fast can you do it? How do you know? Have you tested your plan?
Disaster recovery (DR) depends not only on having current backups of your data, but also on having a pre-planned set of procedures for reestablishing your IT functions, possibly at another facility if and when disaster prohibits access to your primary facility.
The magnitude of your DR plan depends on the size of your organization and how heavily it relies on IT in order to do business. For most organizations, IT is involved in every customer touch-point, every widget produced, and every service provided, and your customers and staff expect systems to be available around the clock.
That’s where high availability comes in. High availability (HA) is the component of your DR plan that allows you to build a fault-tolerant failover or duplicate IT infrastructure so that you can ensure little to no downtime when disaster strikes. Depending on your business needs, your IBM i may not be as recoverable as you think.
Basic Recoverability: Backups
Recoverability starts with backups. Even with all of the great hardware technology for high availability, replication, and fault-tolerant systems, it is prudent—and sometimes required by law—to have a solid physical backup that can be restored at a hot site, another data center, or locally.
Whether you use tape or virtual tape libraries (VTLs) to back up your data, you must run your backups in such a way that you can be certain you can recover from them. Believe it or not, many teams try to cut corners here, reducing backups to the point of jeopardizing the whole purpose of backing up in the first place—it’s the restore that matters.
Restoration can be a positive experience if you have the proper tools and the right mindset, and if you have audited or tested your backups. Keep your backups as simple as possible so that you can restore, and rely on backup software that tells you how to restore, just in case.
Test your backups once a year (minimum) and review or audit what is being backed up every quarter. Appoint someone to verify that your backups are running without failure on a daily basis. It doesn’t matter if you still use tape or if you now use VTLs: test, test, and test again. It’s the best way to know you have a good backup.
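The schedule above (a restore test at least yearly, a content audit every quarter, a daily check that backups ran) can be tracked with a small script. What follows is only a minimal sketch: the function name, the interval constants, and the idea of recording the "last done" dates by hand are all assumptions for illustration, not part of any IBM i tool or backup product.

```python
from datetime import date, timedelta

# Hypothetical intervals derived from the guidance in the text.
RESTORE_TEST_INTERVAL = timedelta(days=365)  # test a restore at least once a year
AUDIT_INTERVAL = timedelta(days=91)          # audit what is backed up each quarter
DAILY_CHECK_INTERVAL = timedelta(days=1)     # confirm a good backup every day


def overdue_tasks(today, last_restore_test, last_audit, last_successful_backup):
    """Return the names of the backup checks that are past due as of `today`."""
    checks = [
        ("restore test", last_restore_test, RESTORE_TEST_INTERVAL),
        ("backup audit", last_audit, AUDIT_INTERVAL),
        ("daily backup check", last_successful_backup, DAILY_CHECK_INTERVAL),
    ]
    # A check is overdue when more time has passed than its interval allows.
    return [name for name, last_done, interval in checks
            if today - last_done > interval]


if __name__ == "__main__":
    # Example dates are made up for illustration.
    print(overdue_tasks(
        today=date(2024, 6, 1),
        last_restore_test=date(2023, 5, 1),
        last_audit=date(2024, 4, 1),
        last_successful_backup=date(2024, 5, 31),
    ))
```

In practice the dates would come from your backup software's logs or reports rather than being typed in, and whoever is appointed to verify the backups would own the output.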
The downside of running daily, weekly, or even monthly backups can be that you introduce some downtime into your production environment. You might be under pressure from business users, executives, or your management team to reduce or eliminate that downtime. It’s up to you to comply without putting critical data at risk.
Depending on the server and operating system, some applications can be backed up while active. Success here depends less on the operating system than on how the developers wrote the application and, more importantly, on whether they thought about how the application would be backed up as they designed it.
If save-while-active doesn’t work, it’s time to incorporate a high availability solution into your DR plan.
Advanced Recoverability: High Availability
Recoverability continues with high availability, which requires both a production server (i.e., source) and an HA server (i.e., target) that holds a duplicate of your data through software- or hardware-based replication. IBM offers a POWER CBU server at a reduced cost to help Power Systems users save money on their target server.
While HA allows you to establish a one-to-one replication of your applications and databases, it does not translate into a complete DR plan by itself. Some organizations replicate their data locally, in the same data center, and sometimes even on the same server as they replicate from one logical partition to another. The latter is primarily driven by budget restrictions, which is understandable but not recommended.
The ideal HA scenario would be a source system with a target system located far enough away to avoid any natural disaster that is common in the area. For example, you wouldn’t want to have a source in Tampa Bay and a target in Jacksonville—a hurricane could sweep across all of Florida.
The distance in miles doesn’t really matter so much today as both hardware- and software-based replication can accommodate global sources and targets. The defining factor has more to do with the cost of communicating across global data centers, which encompasses routers, switches, end user access points, and security. As far as the end user is concerned, it’s not HA if they can’t access the data that has been replicated.
Total Recoverability: HA + DR
When it comes to recoverability, think about your business in terms of downtime. How much does it cost your business to be down for 15 minutes? 30 minutes? 45 minutes? An hour? Two hours? The cost of downtime for each increment will help you decide which technology investments need to be part of your DR plan.
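To make those increments concrete, the arithmetic can be sketched in a few lines. This is an illustration only: the hourly figure is a made-up placeholder, the function names are hypothetical, and the flat hourly rate is a simplifying assumption, since real outage costs often grow faster the longer the outage lasts.

```python
def downtime_cost(cost_per_hour, minutes):
    """Estimated cost of an outage lasting `minutes`, assuming a flat hourly rate."""
    return cost_per_hour * minutes / 60


def cost_table(cost_per_hour, increments=(15, 30, 45, 60, 120)):
    """Map each outage length in minutes to its estimated cost."""
    return {m: downtime_cost(cost_per_hour, m) for m in increments}


if __name__ == "__main__":
    # 10,000 per hour is a placeholder, not a benchmark for any real business.
    for minutes, cost in cost_table(10_000).items():
        print(f"{minutes:>3} minutes down: {cost:,.0f}")
```

Comparing these figures with the price of a target server, replication software, and network links gives a rough first test of whether an HA investment pays for itself.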
High availability is a very important aspect of your DR plan; it is essential for almost all critical applications today and most management teams expect HA. You still need to do backups even if you have an HA solution, but with HA you can do the backups on the target server without impacting your production environment.
HA gives you the option to replicate and run backups locally, and then replicate a second copy of the data to a distant data center, where the true role swap would occur during a disaster. Little to no downtime, plus backups you can rely on to recover the server, is an excellent situation for IT staff.
Your Disaster Recovery Plan
A DR plan is like an IT insurance policy. Some executive teams understand the value of assured recoverability and give it more importance, while others are willing to take the risk. Unfortunately, the risk takers don’t invest in the proper technology until it’s too late, and then all fingers are pointed at IT.
Disaster recovery planning is a bigger topic than the scope of this article, and one well-covered by Richard Dolewski in his book, System i Disaster Recovery Planning, which is available online.
For now, as you go to build and continuously update your DR plan, remember that HA protects business continuity and the demands of doing business around the clock while backups ensure that, if all else fails, you still have the business data you need to bring back your organization after a failure without going out of business.
Your DR plan should encompass the entire process of switching data centers where you do business and should account for all disaster scenarios, be they man-made or natural, full or partial, planned or unplanned.
Depending on the disaster, you could rely on your backups, your HA solution, or both, but think about the organizations that were crippled by 9/11 or Hurricane Katrina and whether they were able to recover at all despite losing staff and facilities. Ask yourself: can my organization sustain a disaster like that, or would we have to rebuild from scratch…and can we afford to?