5 Reasons Monitoring Thresholds Doesn’t Work
There are two common ways that companies approach capacity planning.
One way is through linear trending. This means using a trend line to determine how much capacity you’ll need over time.
The other is through threshold monitoring. This means creating thresholds for utilization of things like CPU and disk space. When that threshold is hit, you’ll know you need to add capacity.
Frequently, these two approaches are combined to predict when you'll run into capacity problems.
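To make the combination concrete, here is a minimal sketch, assuming one utilization sample per day and a single static threshold. The function names `fit_line` and `days_until_threshold` are hypothetical. It fits a least-squares trend line to the history and solves for the day that line crosses the threshold.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(daily_util, threshold):
    """Days after the last sample until the trend line reaches `threshold`.

    Returns None if the trend is flat or falling, and 0.0 if the trend line
    crosses the threshold at or before the last sample.
    """
    if len(daily_util) < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = list(range(len(daily_util)))  # day index of each sample
    slope, intercept = fit_line(xs, daily_util)
    if slope <= 0:
        return None
    # Solve slope * x + intercept = threshold for the crossing day x.
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - xs[-1])


# Hypothetical disk utilization (percent), one sample per day.
history = [50, 51, 52, 53, 54, 55, 56]
print(days_until_threshold(history, 80))  # 24.0
```

Note that the answer depends entirely on the threshold passed in, which someone still has to pick by hand.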
But there’s a problem with threshold monitoring…
Thresholds don’t work.
1. Thresholds Are Calculated Manually
Threshold monitoring can be appealing. It seems like all you need to do is set thresholds and forget them (until you get an alert).
So you decide on your organization's thresholds for CPU and disk. These might be 80 percent utilization or 50 percent utilization; it's up to you to decide what your business needs. If CPU or disk utilization exceeds that threshold, then you know you have a problem.
But there’s a big problem with setting thresholds. Even when you’re using historical data to inform your calculations, your thresholds are still arbitrary. They can’t take everything into account.
2. Thresholds Vary
One of the issues with setting thresholds is that they can vary depending on the time of day, week, or month.
Some days of the week might be busier for your business than others. If you process twice as many transactions on Wednesdays as you do on Fridays, your thresholds need to reflect that. Or if your business experiences seasonality, you'll need to set different thresholds based on the time of year.
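One way to see what "thresholds need to reflect that" means in practice is to derive a separate threshold per weekday from history. The sketch below assumes a rule of "that weekday's mean plus k standard deviations" with a hypothetical k of 3; the rule, the function name `weekday_thresholds`, and the sample data are all illustrative assumptions, not a recommended policy.

```python
from collections import defaultdict
from statistics import mean, pstdev


def weekday_thresholds(samples, k=3.0):
    """Build one threshold per weekday from (weekday, utilization) history.

    Each threshold is that weekday's mean utilization plus k population
    standard deviations. Weekdays use Python's convention: 0 = Monday ... 6 = Sunday.
    """
    by_day = defaultdict(list)
    for weekday, util in samples:
        by_day[weekday].append(util)
    return {day: mean(vals) + k * pstdev(vals) for day, vals in by_day.items()}


# Hypothetical CPU utilization history (percent).
history = [
    (2, 60.0), (2, 64.0), (2, 62.0),  # Wednesdays: busy
    (4, 30.0), (4, 34.0), (4, 32.0),  # Fridays: roughly half the load
]
limits = weekday_thresholds(history)
print(limits)
```

With these numbers, a 50 percent reading breaches the Friday threshold (about 36.9) but not the Wednesday one (about 66.9). Handling seasonality would mean splitting the history further, and every split has to be recomputed whenever the workload changes.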
3. Different Things Require Different Thresholds
Each of your applications uses CPU and disk space differently. And thresholds don't factor in what each application is doing from one minute to the next.
For instance, batch applications typically drive the CPU to 100 percent while they run. This can lead to a lot of false positives. You'll think you've reached your utilization threshold and need to increase capacity when, in reality, you don't.
4. Thresholds Don’t Factor in Change
You’ve set your thresholds. But those thresholds are set based on your environment at that time.
If you make changes in hardware or software, you’ll have a problem. Your thresholds will no longer reflect your environment. And that means you’ll need to adjust your thresholds.
Monitoring thresholds can quickly put you into an endless loop. Set thresholds. Change your environment. Repeat.
5. Thresholds Are Based on Utilization
Thresholds are based on utilization. But they can’t measure the impact of utilization.
And when it comes to IT health and risk, the impact is important. You need to know exactly how utilization—over, under, or exactly right—will impact your uptime and bottom line.
Using thresholds for resource utilization can also lead to overspending. In cases like this, hitting a utilization benchmark triggers more hardware purchases whether or not the applications need the extra capacity. At the same time, thresholds ignore what the applications actually require, which can put your response times at risk.
If you choose to monitor thresholds, your IT environment can get mixed up in risky business. And your thresholds can’t help you strike the balance between IT health and risk.
Instead of Monitoring Thresholds…
Threshold monitoring isn’t the best way to do capacity planning. There’s too much room for error—and too many factors that aren’t accounted for.
Queuing network theory is a better way to do capacity planning. Also known as waiting line theory, queuing theory takes queue length and service time into account.
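The key thing queuing theory adds is that response time does not grow in step with utilization: it climbs steeply as utilization approaches 100 percent. A minimal sketch using the textbook single-queue M/M/1 model (one server, random Poisson arrivals, exponentially distributed service times) shows this. The model choice and the 10 ms service time are illustrative assumptions; queuing network models for capacity planning combine many such queues.

```python
def mm1_response_time(service_time, utilization):
    """Mean response time (queueing wait + service) of an M/M/1 queue.

    Uses the standard result R = S / (1 - U), where S is the mean service
    time and U is utilization. The formula only holds for 0 <= U < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


# Hypothetical 10 ms mean service time at increasing utilization levels.
for u in (0.5, 0.8, 0.9, 0.95):
    r_ms = mm1_response_time(0.010, u) * 1000
    print(f"U={u:.0%}  R={r_ms:.0f} ms")
```

Under these assumptions, response time is 20 ms at 50 percent utilization, 50 ms at 80 percent, 100 ms at 90 percent, and 200 ms at 95 percent. A single utilization threshold can't express that the last few points of utilization cost far more than the first.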
Say you go to a bank and there are two queues: one with ten people and one with three. Your first instinct might be to join the queue of three.
But what if there’s a guy who has a bag full of pennies in the queue of three—and all of those pennies need to be counted? And what if the people in the queue of ten are just making simple deposits? Then the queue of ten will move a lot faster.
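The arithmetic behind the bank example is simple: with one teller per queue, your wait is the sum of the service times of everyone ahead of you. The service times below are hypothetical, chosen only to illustrate the point.

```python
def wait_before_service(service_times_ahead):
    """Minutes spent in line with one teller: the sum of service times ahead of you."""
    return sum(service_times_ahead)


# Hypothetical service times, in minutes.
short_queue = [2, 2, 30]  # three customers, one of them counting a bag of pennies
long_queue = [2] * 10     # ten customers making simple deposits

print(wait_before_service(short_queue))  # 34
print(wait_before_service(long_queue))   # 20
```

Queue length alone points to the wrong line; queue length combined with service time points to the right one.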
Let's take that analogy a little further. In real life, there normally isn't a separate queue for penny counters and another for deposits. Instead, the mix of customers in each queue constantly changes, and any prediction of how long you'll spend in line needs to take that change into account.
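Queuing theory can quantify why the mix matters even when utilization is identical. One standard tool is the Pollaczek-Khinchine formula for a single-server queue with random (Poisson) arrivals and an arbitrary service-time distribution (the M/G/1 model). Using it here is an illustrative choice, not a method the article prescribes; the arrival rate, service times, and the function name `mg1_mean_wait` are hypothetical.

```python
def mg1_mean_wait(arrival_rate, service_mix):
    """Mean time spent waiting in line (excluding service) in an M/G/1 queue.

    Pollaczek-Khinchine formula: Wq = lambda * E[S^2] / (2 * (1 - rho)),
    where rho = lambda * E[S] is utilization. `service_mix` is a list of
    (probability, service_time) pairs describing the customer mix.
    """
    mean_s = sum(p * s for p, s in service_mix)
    mean_s2 = sum(p * s * s for p, s in service_mix)
    rho = arrival_rate * mean_s  # utilization
    if rho >= 1:
        raise ValueError("queue is unstable: utilization must be below 1")
    return arrival_rate * mean_s2 / (2 * (1 - rho))


arrivals = 0.25  # hypothetical: one customer every 4 minutes on average

uniform = [(1.0, 3.0)]               # everyone takes 3 minutes
mixed = [(0.9, 1.0), (0.1, 21.0)]    # mostly quick deposits, some penny bags

# Both mixes average 3 minutes of service, so both run at 75% utilization.
print(f"{mg1_mean_wait(arrivals, uniform):.1f}")  # 4.5
print(f"{mg1_mean_wait(arrivals, mixed):.1f}")    # 22.5
```

Same utilization, yet the average wait is five times longer when one customer in ten brings a bag of pennies. A utilization threshold would treat these two queues as identical.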
With threshold monitoring, you don't get that insight. You only know utilization. With queuing theory, you get the full picture. You'll know both utilization and its impact, so you can make the right decisions for your organization.