Ask An Engineer - PowerHA FAQ with Brian Nordland - HelpSystems

Ask An Engineer: PowerHA FAQ


As principal software engineer for PowerHA at HelpSystems, it’s my pleasure to help educate PowerHA users and empower them to take full advantage of this cool technology. I realize not everyone is able to attend my educational sessions at COMMON, and my day job developing this product keeps me pretty busy, so I wanted to share a few great questions that I’ve come across recently. Maybe they are your questions, too. I hope you find the answers helpful!

1. How do PowerHA clusters work? If I have 10 LPARs—five at one site and five at another—each a mix of production, development, and test partitions, how should I design the cluster?

There are really two schools of thought on this one. The first is to put everything in a single cluster, which gives you a single point of control: you can easily control every environment from any node in the cluster. This can make it easier to manage all 10 LPARs, but it comes with drawbacks.

For example, what if someone working in the dev environment thinks they are switching the dev environment but accidentally switches the production environment instead? Likewise, are credentials the same for all users across all the LPARs, or do production LPARs use different credentials from the development and test environments?

Because of this, people often keep a separate cluster for each environment: one cluster for production, one for development, and one for test. This is typically the approach IBM Lab Services recommends. That said, a number of customers do combine environments in a single cluster.

If you have a single cluster, it’s easy to have a single partition for FlashCopy. However, if you have separate clusters, the typical route is to have a FlashCopy LPAR per environment, partly for the same data separation reasons mentioned above. If you want separate clusters but still want a single FlashCopy partition, you can use something called the independent ASP (IASP) assigner. However, it isn’t as well integrated as a dedicated FlashCopy partition per environment.

PowerHA provides many options for how to set up the environment. I recommend working with your business partner to help decide the architecture that is best for your unique environment.

2. What are the RTO and RPO for each PowerHA option in hours/minutes? I’ve read that varying on an IASP can take a long time during failover.

We don’t publish specific recovery time objective (RTO) and recovery point objective (RPO) figures for PowerHA options, for a number of reasons. But here’s what I can tell you.

Your RTO and RPO depend on the specific configuration and implementation. This is true for any high availability solution. For example, a synchronous solution (such as Metro Mirror) provides a near-zero recovery point during normal operation, because each write reaches both copies before it completes. Even if you have a system crash, anything that is journaled locally will recover as part of the vary on of the IASP.

As you move the systems further apart, a synchronous solution starts to slow your system down because of the time it takes each write to reach both copies. At that point you typically move to an asynchronous technology such as Global Mirror. With Global Mirror, your recovery point depends on how much data you are changing and on the bandwidth and latency between the sites. The PowerHA Redbooks have high-level information on measuring disk write rates to help determine bandwidth needs. In addition, the IBM Lab Services group offers a bandwidth analysis that estimates how much bandwidth the selected solution needs to achieve an acceptable RPO.
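To make the distance and bandwidth trade-offs more concrete, here is a rough back-of-the-envelope sketch. It is a simplified illustration, not IBM’s sizing method or the Lab Services bandwidth analysis. The function names are hypothetical, and the 5 microseconds per kilometer figure is the approximate one-way speed of light in optical fiber. The model ignores protocol overhead, compression, and the way Global Mirror actually forms consistency groups.

```python
# Simplified model for illustration only: ignores protocol overhead,
# compression, and Global Mirror's consistency-group cycle.

FIBER_US_PER_KM = 5.0  # approximate one-way delay in optical fiber (~200,000 km/s)


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Minimum delay that distance alone adds to each synchronous write, in ms.

    Assumes one round trip per write by default; real replication protocols
    may need more, and switches, storage, and protocol overhead add to this.
    """
    one_way_us = distance_km * FIBER_US_PER_KM
    return 2 * one_way_us * round_trips / 1000.0


def async_lag_seconds(writes_mb: list[float], link_mb_per_s: float) -> list[float]:
    """Rough replication lag, in seconds, after each one-second interval.

    writes_mb[i] is the amount of data (MB) written during second i. Whatever
    the link cannot send that second stays in a backlog; the lag is how long
    the link would need to drain that backlog. This is only a stand-in for RPO.
    """
    backlog_mb = 0.0
    lags = []
    for written_mb in writes_mb:
        backlog_mb = max(0.0, backlog_mb + written_mb - link_mb_per_s)
        lags.append(backlog_mb / link_mb_per_s)
    return lags


print(sync_write_penalty_ms(100))
print(async_lag_seconds([50, 150, 150, 50, 50], link_mb_per_s=100))
```

Under these assumptions, 100 km of separation adds at least 1 ms to every synchronous write. A two-second burst of writes above the link speed produces a lag that peaks at about one second, then drains once the write rate drops back below what the link can carry. Link speeds are usually quoted in megabits per second, so divide by 8 before plugging them in as MB/s.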

As for recovery time objective, a synchronous solution typically provides a lower RTO simply because there is no waiting for data to synchronize as part of the switch. A planned switch typically consists of the following steps:

  1. End applications on the production system
  2. Perform the switch of the replication product
  3. Start applications on the new production system

In many environments, steps 1 and 3 take the longest. I have seen application startup on the new production system take anywhere from 30 seconds to nearly two hours. Those steps are needed regardless of the underlying replication product.
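Because a planned switch runs those three steps one after another, the switch time is simply their sum. A quick estimate shows which step to work on first. The sketch below is hypothetical: the class name and durations are placeholders you would replace with timings from your own switch tests. Step 2 is assumed to include varying the IASP off and on.

```python
from dataclasses import dataclass


@dataclass
class PlannedSwitch:
    """Hypothetical timings, in seconds, for the three planned-switch steps."""
    end_apps_s: float            # step 1: end applications on the production system
    replication_switch_s: float  # step 2: switch replication (assumed to include IASP vary off/on)
    start_apps_s: float          # step 3: start applications on the new production system

    def steps(self) -> dict[str, float]:
        return {
            "end applications": self.end_apps_s,
            "switch replication": self.replication_switch_s,
            "start applications": self.start_apps_s,
        }

    def total_s(self) -> float:
        # The steps run sequentially, so the switch time is their sum.
        return sum(self.steps().values())

    def longest_step(self) -> str:
        steps = self.steps()
        return max(steps, key=steps.get)


# Placeholder numbers: 5 minutes to end apps, 2 to switch, 30 to start apps.
switch = PlannedSwitch(end_apps_s=300, replication_switch_s=120, start_apps_s=1800)
print(switch.total_s() / 60, "minutes; longest step:", switch.longest_step())
```

With these placeholder numbers, the estimate comes to 37 minutes, most of it application startup. That matches the pattern described above, where the replication switch itself is rarely the bottleneck.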

You mentioned hearing that vary on takes a long time. Varying on an IASP is like a small piece of a system IPL: you aren’t bringing up the operating system, just the user data. IASP vary on times have been reduced significantly over the last few years, and IBM i 7.4 improved them even further. In some instances, a vary on can take just a few seconds; in others, it can take longer. It really depends on the system and on the actual data within the IASP.

3. Can we do PowerHA without an IASP? What does a PowerHA solution look like without an IASP?

PowerHA is designed around the IASP, which is the foundation of PowerHA technologies. An IASP is really just a way to separate your application data from the operating system. In a way, it is like a USB drive plugged into a computer: you can unplug it from one computer, plug it into another, and access all your data. The really nice thing about an IASP is that it is far more integrated into the system than a USB drive is into a PC. Once an IASP is implemented, most users have no idea their data is in it, because they access their data in the libraries just as they did before.

4. I’m getting a little confused by the Global Mirror/Metro Mirror options with PowerHA versus the storage replication Global Mirror/Metro Mirror options without PowerHA. Can you help clarify?

The option without PowerHA is something we might call full system replication, because you are replicating the entire system. The picture might look something like this:

PowerHA Full System Replication Illustration

You have a secondary system ready to boot up if you need to switch. The disadvantage of this approach is that you are also replicating the operating system and temporary data. Because IBM i uses single-level storage, you could be replicating anything in memory, including temporary storage. That can require significantly more bandwidth than PowerHA with an IASP would need. With an IASP, only the objects in the IASP are replicated.

Also, since you are replicating the operating system, you aren’t protected against software outages such as applying fixes or upgrading the operating system. You must take an outage while upgrading or updating it. With an IASP-based approach, you can apply fixes or upgrades to the target system while everyone is using the production system. You then switch to the target system and upgrade the original production system. This effectively reduces the outage for an OS upgrade or update to just the time it takes to do the switch.
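A small sketch can show the difference in user-facing outage. It adds up only the steps during which users can’t reach their data. The step names and durations are hypothetical placeholders, not measurements. What matters is the structure: in the IASP-based plan, both upgrades happen while users are on the other system, so only the switch counts as outage.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    minutes: float
    users_down: bool  # True if users cannot reach their data during this step


def outage_minutes(plan: list[Step]) -> float:
    """User-facing outage: only the steps that take users off their data count."""
    return sum(step.minutes for step in plan if step.users_down)


UPGRADE_MIN = 90  # hypothetical time to upgrade one system's OS
SWITCH_MIN = 10   # hypothetical time for one planned switch

# Full system replication: the OS is replicated too, so production is
# upgraded in place while users are off the system.
full_system_plan = [
    Step("Upgrade the OS on the production system", UPGRADE_MIN, users_down=True),
]

# IASP-based rolling upgrade, following the order described above.
iasp_plan = [
    Step("Upgrade the target system while users stay on production", UPGRADE_MIN, users_down=False),
    Step("Switch the IASP to the upgraded target system", SWITCH_MIN, users_down=True),
    Step("Upgrade the original production system", UPGRADE_MIN, users_down=False),
]

print(outage_minutes(full_system_plan), outage_minutes(iasp_plan))
```

The total work is larger in the IASP plan, because two systems get upgraded. Even so, the outage users see shrinks from the full upgrade time to a single switch.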

Replicating an IASP with PowerHA and external storage has another advantage over logical replication and full system replication solutions. When you are upgrading the target system (say, upgrading the OS to the latest version), all of your data in the IASP is still being replicated. This means your recovery point is protected even though the target system is down.

It really boils down to what types of outages you want to protect yourself against.

5. Why do we need PowerHA when the V7000 already offers Metro Mirror/Global Mirror storage replication?

PowerHA is the piece that integrates the operating system with the external storage replication. This allows you to manage everything from IBM i and helps protect your data. When you do a switch, PowerHA takes care of the steps on both the IBM i side and the storage side, making it a single-button switch. PowerHA also provides the interfaces for switching an IASP between systems.

6. With PowerHA, when both the source (SRC) and target (TGT) systems are up, only the IASP is varied off on the target. Why don’t we call it an active-active solution, since both LPARs are up?

Even though both the source and target systems are up, the actual user data is in the IASP. Since the user data is not accessible on both systems at once, we typically call this an active-passive solution rather than active-active. The target system is ready to take over if the primary system goes down, but you can only access the data in the IASP on the primary system.

