When disaster strikes, the future of your company—and your career—hangs in the balance. The deciding factor for both is the ability to quickly switch to a failover system so that business can continue with little to no interruption. So, you have two choices: establish successful role swaps or brush up the old résumé.
You might be dreading both activities equally, and with good reason if you’ve never experienced a successful role swap or even tried one. Most high availability (HA) software or hardware solutions have a facility for doing role swaps. While some are easier to use than others, the key is to become comfortable with performing a role swap.
Are Role Swaps Really That Easy?
If it were that easy, everyone would be doing a role swap once a quarter or at least once a year. Yes, the complexity of your environment can complicate things, but what’s really holding you back might just be the fear of messing up, of not being able to role swap after assuring everyone you could do it.
Still, it’s better to fail during a test than during an actual disaster. In the first case, you simply swallow some pride and schedule another role swap test.
You might be familiar with the position that a backup is not a backup if you’ve never tested the restore. The same holds true for high availability. It’s not HA if you’ve never completed a full role swap successfully. Until you do a role swap, your system is only set up for disaster recovery (DR). You could probably get the server running again after hours of high-pressure IT anxiety, but that falls painfully short of the business continuity scenario you invested in when you purchased an HA solution.
Testing your role swaps is imperative. Hire an expert to help if you can’t seem to get it done with your in-house resources, or follow these steps and start testing your role swaps today.
Start with a Test-While-Active Role Swap
- Before you plan your role swap test, define your recovery time objective (RTO) to get a better sense of how much time recovery is allowed to take. It could be minutes or hours, which will make a big difference in how you test and how often you perform the role swaps.
- Next, consult your in-house replication expert to ensure there is no replication latency and that no recurring hourly or daily replication issues remain unresolved.
- After that, verify that object changes, data changes, and IFS changes are making it to the target server in a reasonable amount of time. A database or query tool can help you ensure that data on the source server is in sync with data on the target server.
- Finally, simulate a failure. Simply leave the production server alone and let your end users do their normal work without any interruptions. Your HA software should have a test-while-active role swap feature to help verify that you could bring up applications and do business on the target server. In most cases, you simply need to end the server jobs on the target server that are reading the journals and handling object changes while you test.
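The latency check in the steps above can be sketched as a simple comparison of two timestamps. This is a minimal illustration, not a feature of any particular HA product: the function names, the sample timestamps, and the one-minute threshold are all assumptions, and in practice the timestamps would come from whatever reporting your HA software provides.

```python
from datetime import datetime, timedelta


def replication_lag(source_latest: datetime, target_applied: datetime) -> timedelta:
    """Return how far the target trails the source; never negative."""
    return max(source_latest - target_applied, timedelta(0))


def lag_is_acceptable(lag: timedelta, max_lag: timedelta) -> bool:
    """True when the observed lag does not exceed the agreed threshold."""
    return lag <= max_lag


if __name__ == "__main__":
    # Hypothetical values: the newest change on the source and the newest
    # change already applied on the target.
    source_latest = datetime(2024, 1, 15, 10, 0, 30)
    target_applied = datetime(2024, 1, 15, 10, 0, 5)

    lag = replication_lag(source_latest, target_applied)
    ok = lag_is_acceptable(lag, max_lag=timedelta(minutes=1))
    print(f"lag={lag.total_seconds():.0f}s acceptable={ok}")
```

Running a check like this on a schedule, rather than only on test day, is one way to notice a target that is persistently behind. The threshold should be derived from your RTO rather than from the value used here.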
If you are comfortable with doing role swaps while active, the next step is a full role swap. Before you can do a full role swap, work with your network team to ensure that the IBM i server and access to it are virtual. This allows you to switch your users over within five minutes during testing, without a major headache.
Perform a Full Role Swap
- Verify that your apply jobs and remote journaling are up to date. You may choose to keep receiver jobs running and not have the apply jobs active on the target system. Depending on the volume, it could take hours to process these transactions if you are behind, so it’s important to research caching for journaling on the target system and create the most time-conscious strategy for processing transactions. You don’t want to wait until the day of the role swap to find out that you are always behind on transactions.
- Verify that the data is in sync between source and target. Create a script with processes, queries, and reports to run after getting all users off of the production system and before letting users on the target system. These will be the data sources that you compare and verify. If replication is working properly, there should be no discrepancies.
- Next, perform an orderly shutdown of your applications on the source server. Many companies develop a CL program that can be launched manually or automated using a scheduling tool. Some even use monitoring tools that visually display the applications as they are taken offline in the proper order. Depending on the size of the system, this process alone could take five to 15 minutes, so ensure you have a process to get users off and lock them out until you have fully verified that the target server is ready for them to use.
- Once your applications on the source server have been shut down and the libraries on the target server are a full replica of the source, you can perform the role swap. Most HA solutions have a feature that you execute to switch the source and target servers. Do this now.
- The next step is to allow a few end users to verify that the system, their data, and their applications are available. Provided they encounter no problems, let all of your users onto the system. Your server that was the target should now function as the source. Your old system that was the source can now become the target.
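The data verification step in the list above comes down to running the same queries and reports on both systems and comparing the results. The sketch below assumes you have already collected those results into two dictionaries keyed by check name. The function name, the check names, and the sample values are hypothetical, and how you gather the results depends on your own query tooling.

```python
def compare_snapshots(source: dict, target: dict) -> dict:
    """Return {check_name: reason} for every check that does not match."""
    problems = {}
    for name in sorted(source.keys() | target.keys()):
        if name not in target:
            problems[name] = "missing on target"
        elif name not in source:
            problems[name] = "missing on source"
        elif source[name] != target[name]:
            problems[name] = f"source={source[name]!r} target={target[name]!r}"
    return problems


if __name__ == "__main__":
    # Hypothetical results captured after users are off the source system.
    source_results = {"ORDERS row count": 1200, "ORDERS amount total": "45310.75"}
    target_results = {"ORDERS row count": 1199, "ORDERS amount total": "45310.75"}

    for check, reason in compare_snapshots(source_results, target_results).items():
        print(f"MISMATCH {check}: {reason}")
```

An empty result means every check matched. Anything else is a reason to keep users off the target until the discrepancy is explained.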
Evaluate Role Swap Success
With any HA solution, you need to ensure that every transaction generated was processed and applied on the target system. This can mean using application menus and manually looking at the audit journal and all transactions in the journal(s). For example, you might be running IBM MQSeries and you may need to look at its journal. You have to know your application in order to verify.
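One way to picture this check: if each transaction carries a sequence number, every number generated on the source should also appear among those applied on the target. The sketch below assumes you can export both lists from your journals or HA reports. The function name and the sample numbers are illustrative only.

```python
def missing_sequences(generated, applied):
    """Return the generated sequence numbers that were never applied, in order."""
    applied_set = set(applied)  # set lookup keeps the check fast for large journals
    return [seq for seq in generated if seq not in applied_set]


if __name__ == "__main__":
    # Hypothetical sequence numbers exported from source and target.
    generated_on_source = [101, 102, 103, 104, 105]
    applied_on_target = [101, 102, 104, 105]

    gaps = missing_sequences(generated_on_source, applied_on_target)
    if gaps:
        print(f"{len(gaps)} transaction(s) not applied: {gaps}")
    else:
        print("all transactions applied")
```

A check like this only confirms that entries arrived. Confirming that the application data is correct still requires knowledge of the application itself.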
You also need to ensure that there are no objects on hold. Each HA solution is different, but verifying that both target and source are in sync is the most important aspect of any role swap. This helps you find errors or performance issues that you may be able to resolve by reconfiguring with help from your HA solution provider.
In a test-while-active role swap, if your test includes updating data in an application on the target server, you will want to refresh and synchronize the data on the target after completing the test. Your HA software should have a process that verifies and self-heals the data to reflect an in-sync relationship between source and target.
If you performed a full swap and plan to stay on the target server as the new production server, you will need to make sure the old source, now acting as a target, is in sync with the new source system.