Capacity planning is not an exact science, and it can be a nightmare if you don’t know what you’re doing. While there are many ways to approach capacity planning for IBM i, there is a clear difference between a successful project and one that doesn’t stand a chance. Here are the top five mistakes that organizations often make in capacity planning projects.
#1 – No Historical Data
IBM’s performance management files have become an integral part of Collection Services, and these daily performance files also provide a window into the performance of your machines. In fact, you can use these performance management files (a.k.a. the QAPM files) in your own queries to understand how the box is performing. But on their own, they offer a limited view, particularly of long-term trends.
Third-party performance analysis tools like Performance Navigator allow you to see historical performance trends for all core metrics (CPU, disk, and memory). Figure 1 shows an example of historical CPW trending. This type of data is a key component of capacity planning.
Figure 1: Trending CPW graph in Performance Navigator
Between these two tools, there’s no excuse for doing a capacity plan with little or no performance data. Can you imagine making a $2 million decision based solely on a single day of performance data? Unfortunately, this is a common mistake.
So many things can go wrong in this scenario. The day you select (a.k.a. your baseline data) is almost certainly not representative of a peak period. As a result, you could see performance degradation when a true peak period arrives, or the upgrade could simply fail on day one.
While you may not have a ton of historical data for every situation, doing a capacity plan based on one day is just setting yourself up for trouble down the road.
Best Practice Recommendation: If you only have a few weeks of data, make sure you select a peak day when your systems are at their busiest. Month-end processing is a great example.
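As a minimal sketch of picking a peak day, assume you have already exported interval CPU utilization percentages for each day, for example from your own queries over the QAPM files. The dictionary layout, the dates, and the numbers below are hypothetical sample data, not output from Performance Navigator or Collection Services. Days are ranked by their average utilization rather than by their single busiest interval, which avoids choosing a day that had one short spike but was otherwise quiet.

```python
from statistics import mean

# Hypothetical sample data: date -> interval CPU utilization percentages.
# In practice these values would come from your own queries or a reporting tool.
daily_cpu = {
    "2023-03-29": [35.0, 42.5, 51.0, 48.0],
    "2023-03-30": [40.0, 55.5, 63.0, 58.5],
    "2023-03-31": [62.0, 78.5, 91.0, 84.5],  # hypothetical month-end processing
    "2023-04-03": [30.0, 38.0, 44.5, 41.0],
}


def pick_peak_day(metrics: dict[str, list[float]]) -> str:
    """Return the date whose intervals have the highest average utilization."""
    # Averaging across the whole day, rather than taking the single busiest
    # interval, favors days with sustained load over days with brief spikes.
    return max(metrics, key=lambda day: mean(metrics[day]))


peak = pick_peak_day(daily_cpu)
print(peak, round(mean(daily_cpu[peak]), 1))  # 2023-03-31 79.0
```

The same ranking would need to be repeated for disk and memory metrics, and for every LPAR on the frame, before settling on a single baseline day.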
#2 – No Structured Methodology
Everyone has their own philosophy regarding capacity planning, but don’t confuse a philosophy with a structured capacity planning methodology.
Here are a few examples of what not to do:
- Focus only on CPW math and forget the impact on the disk and memory components
- Ignore real-life historical trends even when the data is available
- Focus only on the busiest intervals throughout the day
- Fail to select a peak day for every LPAR on the frame
- Leave experienced personnel out of the project
Many capacity planners will say that getting your baseline data right is the most important step in a structured capacity planning methodology. It’s hard to argue with that logic. Sizing your LPARs accurately does require baseline data that reflects the busiest time of your year.
However, the most successful capacity planning projects go beyond the data. They are a total team effort. Systems administrators and business partners don’t have a crystal ball to make predictions, so a sound methodology is the key.
A proven methodology requires:
- The ability to choose the correct baseline data
- A business partner to explain new hardware architectures to the customer
- A system administrator to work with internal stakeholders (e.g., Marketing, Accounting, Warehouse, HR) to understand upcoming changes for existing or new applications
- A system administrator to work with senior management to understand upcoming strategic company changes (e.g., buying a new company, opening new stores, downsizing the company)
- Strong communication between the customer and the business partner
Best Practice Recommendation: Use your existing monthly performance reporting and application performance studies as supporting documentation for upcoming capacity planning projects. Historical trend data and comprehensive job data can often be a real indication of things to come.
#3 – Not Understanding the Impact of Application Changes
Here’s a real-life example to help illustrate this mistake.
Company XYZ was working with their business partner on a new Power Systems upgrade. They followed a structured methodology, and both agreed on the peak day selected (baseline data), but when Company XYZ went live on the new server, the response time for one of their core applications was worse after the upgrade.
Luckily, Company XYZ had installed Performance Navigator. Using years of historical data, they were able to figure out why the upgrade had failed: the baseline data did not reflect the workload at the time the new server went live.
Using the What’s Different Analysis feature in Performance Navigator, which shows a before-and-after analysis, they were able to verify that there were 76 percent more jobs on the new LPAR following the upgrade, as shown in Figure 2.
Figure 2: What’s Different Analysis feature in Performance Navigator
The analysis also indicated that the baseline data was selected 15 months prior to the go-live date. Any company can undergo major changes in 15 months, and further analysis revealed that Company XYZ had in fact re-architected one of their core applications after the baseline data had been selected.
Before-and-after monthly comparisons showed that the number of runs for a key job had increased a whopping 78 percent, as shown in Figure 3.
Figure 3: Before-and-after analysis in Performance Navigator shows the impact of change
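The before-and-after comparisons in this example come down to a simple percent-change calculation, which is also a quick sanity check on whether an old baseline still reflects today’s workload. The run counts below are hypothetical, chosen only to illustrate the arithmetic; they are not Company XYZ’s actual figures.

```python
def percent_change(before: float, after: float) -> float:
    """Percent increase (positive) or decrease (negative) from before to after."""
    if before == 0:
        raise ValueError("baseline value must be non-zero")
    return (after - before) / before * 100


# Hypothetical monthly run counts for one key job: baseline month vs. go-live month.
baseline_runs = 5_000
current_runs = 8_900

print(f"{percent_change(baseline_runs, current_runs):.0f}%")  # 78%
```

If a comparison like this shows a large jump between the baseline period and the present, the baseline is probably too old to size the new server on its own.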
Best Practice Recommendation: Years of historical data can be a capacity planner’s best friend. Even if you don’t have an upgrade planned for this year, lay the groundwork by installing a tool to start collecting this valuable historical information.
#4 – Not Understanding the Impact of New Architecture
Let’s say you’re involved in a disk upgrade that provides you with more disk capacity. Good news, right? Now you have more disk space! But wait. Each LPAR now has 56 percent fewer arms, so the same I/O workload is spread across fewer arms. Arm utilization increases and performance suffers, as shown in Figure 4.
Figure 4: Disk arm utilization in Performance Navigator
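A rough sketch using the utilization law (utilization = I/O rate per arm × service time per I/O) shows why fewer arms can hurt even when total capacity grows. It assumes the total I/O rate and the per-I/O service time stay the same after the upgrade and that I/O is spread evenly across arms; the I/O rate, service time, and arm counts are hypothetical. In practice, faster drives may lower the service time, so treat this as an illustration of the trend rather than a prediction.

```python
def arm_utilization(total_iops: float, arms: int, service_time_ms: float) -> float:
    """Approximate per-arm utilization (0.0 to 1.0) via the utilization law U = X * S.

    Assumes I/O is spread evenly across all arms.
    """
    per_arm_iops = total_iops / arms
    return per_arm_iops * (service_time_ms / 1000.0)


# Hypothetical workload: same total I/O rate and service time before and after.
total_iops = 3_000
service_time_ms = 4.0

before = arm_utilization(total_iops, arms=50, service_time_ms=service_time_ms)
after = arm_utilization(total_iops, arms=22, service_time_ms=service_time_ms)  # 56% fewer arms

print(f"before: {before:.0%}, after: {after:.0%}")  # before: 24%, after: 55%
```

Because queuing delay grows sharply as utilization rises, response time tends to degrade faster than the utilization numbers alone suggest.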
This common storage architecture mistake is not exclusive to internal disk. Capacity planners often make this mistake with external disk as well, where performance suffers after an upgrade due to not creating enough logical arms (LUNs).
Best Practice Recommendation: When doing a disk capacity plan, don’t simply focus on capacity. Take a step back and focus instead on getting A+ disk performance service levels.
#5 – Not Understanding the Impact of Moving to the Cloud
If you are planning a move to the cloud or to a managed service provider (MSP), it is critical to take capacity planning into consideration. Many capacity planning questions will arise, and a common mistake is leaving them unanswered.
Say you have a POWER7 server on-premises and you’re looking to move to an MSP cloud provider. Here’s the first answer you need to find: what are the actual specifications of the hardware that we will be renting?
It’s important to remember that MSPs rent out capacity (CPW) to their clients. They have a bunch of Power servers, with clients configured into individual LPARs. You want to be sure that your rented partition has the same (or better) features as your on-premises system and that your production workload can fit comfortably on the LPAR, with plenty of room for growth.
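One way to test whether a workload will “fit comfortably with room for growth” is to project your observed peak CPW demand forward and compare it with the LPAR’s guaranteed CPW. This is a minimal sketch; the growth rate, planning horizon, 70 percent utilization ceiling, and CPW figures are all hypothetical planning assumptions, not MSP or IBM guidance.

```python
def projected_demand(peak_cpw: float, annual_growth: float, years: int) -> float:
    """Compound the observed peak CPW demand forward by an assumed annual growth rate."""
    return peak_cpw * (1 + annual_growth) ** years


def fits(peak_cpw: float, guaranteed_cpw: float, annual_growth: float,
         years: int, max_utilization: float = 0.70) -> bool:
    """True if projected demand stays under the chosen utilization ceiling.

    The default 70% ceiling is a hypothetical headroom target, not a standard.
    """
    ceiling = guaranteed_cpw * max_utilization
    return projected_demand(peak_cpw, annual_growth, years) <= ceiling


# Hypothetical: 20,000 CPW used at peak on-premises, 15% growth per year, 3-year horizon.
print(fits(20_000, 40_000, annual_growth=0.15, years=3))  # False
print(fits(20_000, 50_000, annual_growth=0.15, years=3))  # True
```

A similar check applies to memory and disk; CPW headroom alone does not guarantee acceptable performance.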
Figure 5 shows a common report that MSPs use to manage their enterprise:
Figure 5: Enterprise Hardware Summary in Performance Navigator
As you can see, you are not the only client on the frame. So don’t be afraid to ask for specifics about the hardware the MSP will have you running on.
When you are evaluating MSPs, make sure you get satisfactory answers to questions like these:
- Is my system a POWER7? POWER8? POWER9?
- What is the model and feature code?
- How many cores will I get?
- What is my guaranteed total CPW?
- What is the configuration of each of my LPARs?
- Are my LPARs capped or uncapped?
- What is my guaranteed total memory?
- What happens when we need more capacity?
- What is the actual storage solution?
- How many LUNs (logical arms) will I have on each LPAR?
- Will I have spinning drives, SSDs, or a hybrid configuration?
- What is my guaranteed disk response time service-level agreement (SLA)?
Best Practice Recommendation: Create a capacity planning task force at your organization and generate a structured checklist of questions to ask the MSP upfront. That way, there will be no surprises.
Capacity Planning Best Practices Recap
There are many challenges when doing capacity planning on IBM i, but it doesn’t have to be difficult. HelpSystems is on your side. With our capacity planning professional services and software like Performance Navigator, we aim to empower you with the expertise and tools to accurately determine what hardware is needed, so you aren’t at the mercy of vendor recommendations.
Here’s a quick recap of the best practice recommendations we shared today:
- If you have historical data for your core metrics (CPU, disk, and memory), use it to understand current trends and to select your baseline data. Remember that baseline data is your starting point.
- Create and follow a structured methodology for every capacity plan.
- Understand what’s happening within your company for all functional areas. Changes in core applications and company strategies are key inputs into capacity planning projects.
- Do your due diligence to understand new hardware solutions. The golden rule is simple: don’t do capacity planning based on capacity only. Instead, approach all projects with performance service levels in mind.
- When considering a move to the cloud, interview your MSP on all hardware configurations. Be very specific and ask a lot of questions about the environment where they plan to move you. Most importantly, agree on performance service levels before you seal the deal.
Following the best practices outlined in this article will help you avoid the common capacity planning mistakes. Along the way, you’ll end up with confidence in your capacity plan, impeccable performance service levels, and a very happy CIO. That’s a win for everyone!
The historical performance data on your servers is a treasure trove of information regarding actual usage over time. But you must access and interpret this data to get to the bottom of performance issues or inform future hardware investments. Performance Navigator can help! Request a live demonstration and we’ll show you how.