
A Guide to IT Operations Maturity

Learn more about how to do capacity planning

Businesses constantly put pressure on the price, performance, and reliability of IT services. The introduction of public cloud services, new technologies, and updated management paradigms has raised expectations even further. In response to these increased requirements, most modern management frameworks rely on advanced analytics, machine learning, and automation.

But to truly optimize your business services, it’s not enough to focus on software and technology alone. To get the full effect of your efforts, you need to combine them with gradually refined processes and skills within your organization. Our IT Operations Maturity Model offers a step-by-step path that keeps these elements in balance, with each maturity level building on the capabilities introduced at the levels below.

Figure: IT Operations Maturity Model (HelpSystems)

Level 1 – Chaotic

This level of maturity is defined by the lack of a coherent strategy for managing and improving the quality of your IT services. Efforts to improve service quality are usually triggered by incidents that complaining users or customers bring to your attention. Resolving the issue often becomes a personal task assigned to individuals with the right skills. However, relying on heroic contributions from individuals is unlikely to produce the consistent, sustained improvements the business is looking for.

A prerequisite for improving service quality is access to empirical monitoring data about how the different components and layers of the infrastructure are being used, both now and in the past. With recent trends such as cloud-native applications and containerization, the number of components and their interdependencies has increased dramatically. If you only have data for a subset of those components and a limited understanding of how they fit together, any optimization effort becomes a hit-and-miss exercise.

Improving services requires a conscious effort. Operating at this level, you will remain a victim of unknown circumstances and unexpected events until you make an effort to increase your maturity.

Level 2 – Reactive

Moving to the Reactive level is primarily about improving your ability to react to events in your environment in a consistent and timely fashion, but also about ensuring that you have enough information to allow for proper handling of those events.

How do I get to this level?
The first step is to make sure you have access to detailed information about the behavior of the components underpinning your services. This usually means federating several monitoring data sources, each covering a specific platform, cloud service, or abstraction layer in the system software stack.

By sending alerts or notifications based on predefined thresholds, you can identify incidents in time and mitigate them quickly. A systematic analysis of incoming data, expanded to cover more components and data sources, will also highlight the need to automate most of these activities. Automating the analysis helps you manage the larger scope and leads to more consistent, repeatable procedures. It is also a requirement for keeping up with the volume and pace at which data is produced.
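As a minimal sketch of threshold-based alerting, the snippet below checks samples merged from several monitoring sources against static per-metric limits. The metric names, the limits in `THRESHOLDS`, and the `(component, metric, value)` sample shape are hypothetical and chosen only for illustration; a real monitoring product would define its own data model.

```python
from dataclasses import dataclass

# Hypothetical static limits (percent utilization) per metric name.
THRESHOLDS = {"cpu_util": 85.0, "mem_util": 90.0, "disk_util": 80.0}


@dataclass
class Alert:
    component: str
    metric: str
    value: float
    threshold: float


def check_thresholds(samples, thresholds=THRESHOLDS):
    """Return an Alert for every sample whose value exceeds its metric's limit.

    `samples` is an iterable of (component, metric, value) tuples, for example
    merged from several monitoring sources. Metrics without a limit are skipped.
    """
    alerts = []
    for component, metric, value in samples:
        limit = thresholds.get(metric)
        if limit is not None and value > limit:
            alerts.append(Alert(component, metric, value, limit))
    return alerts


if __name__ == "__main__":
    samples = [
        ("web-01", "cpu_util", 92.5),
        ("web-01", "mem_util", 61.0),
        ("db-01", "disk_util", 83.0),
        ("db-01", "iops", 12000.0),  # no limit defined, so never alerts
    ]
    for alert in check_thresholds(samples):
        print(f"ALERT {alert.component} {alert.metric}={alert.value} > {alert.threshold}")
```

Static limits like these are easy to automate, which is their main appeal at this level; their weakness, a single limit applied to very different components, is discussed under Level 3.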

What does this mean to the business?
Ongoing systematic analysis of incoming data will allow you to react faster to performance and capacity related incidents in your operation. This increased focus on faster problem detection will have a positive impact on the quality of your services and will contribute to increased availability.

You will also get a better understanding of how the existing resources are being utilized at an individual level, allowing you to identify obvious cases of under- or over-provisioning.
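One simple way to spot obvious over- or under-provisioning, sketched here under assumed cut-offs, is to look at a high percentile of each component’s utilization rather than its average, so that short peaks are not hidden. The 20% and 90% limits, the 95th percentile, the sample data, and the function names are illustrative assumptions, not values recommended by this guide.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def classify_provisioning(history, low=20.0, high=90.0, pct=95):
    """Label each component by the pct-th percentile of its utilization history.

    `low` and `high` (percent) are illustrative cut-offs, not recommendations.
    """
    labels = {}
    for component, values in history.items():
        peak = percentile(values, pct)
        if peak < low:
            labels[component] = "over-provisioned"
        elif peak > high:
            labels[component] = "under-provisioned"
        else:
            labels[component] = "ok"
    return labels


# Hypothetical utilization samples (%) per virtual machine.
history = {
    "vm-idle": [5, 8, 10, 12],
    "vm-busy": [88, 93, 95, 97],
    "vm-fine": [40, 50, 60, 70],
}
print(classify_provisioning(history))
```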

What can still be improved?
Having access to real-time, component-level data will only take you so far. Without historical data, you cannot reliably predict recurring incidents, so you spend most of your time reacting to events rather than identifying and eliminating their root causes. You will still be dealing with the same number of incidents, only handling them faster.

Another limitation is the strict focus on technical components. Without context telling you which services currently use a component, it’s hard to prioritize incidents by their importance to the business. Overall service quality is what you should really focus on; limiting yourself to a component view will lead to misdirected efforts.

Level 3 – Proactive

You have already developed the ability to react quickly to events. Taking the next step involves discovering patterns among those events and learning from the empirical data you have available. Once you master that, you will be able to mitigate some risks proactively, before they have a negative impact on your services. It can also raise overall efficiency, but more about that later.

How do I get to this level?
By aggregating and keeping selected portions of the data you are collecting, you can gradually engage in more advanced analysis of it. You can establish the root cause of incidents, find recurring patterns, or spot historic trends. All this allows you to become more proactive and hopefully identify and prevent some incidents before they happen.
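Once history is retained, even a simple trend analysis becomes possible. The sketch below fits an ordinary least-squares line to daily utilization samples and extrapolates it to estimate when a threshold will be reached. Linear extrapolation is an assumption made for simplicity; as discussed later in this guide, it does not capture nonlinear growth. The function names and sample data are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (needs 2+ distinct xs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(daily_values, threshold):
    """Extrapolate the linear trend of `daily_values` (one sample per day) and
    return the number of days after the last sample until the trend line
    reaches `threshold`, or None if the trend is flat or falling."""
    xs = list(range(len(daily_values)))
    slope, intercept = linear_fit(xs, daily_values)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - xs[-1])


# Hypothetical disk utilization (%) over five days, growing 2 points per day.
print(days_until_threshold([50, 52, 54, 56, 58], threshold=80))
```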

As you start using more advanced analytical methods, you will also discover the need to better integrate different data sources. The ability to automatically analyze and correlate data across technical or organizational silos will allow you to find new and unexpected dependencies.

In addition to data about services and technical infrastructure components, you should expand the effort to include other sources that can help your optimization efforts. Configuration data, service level compliance, business transaction volumes, facilities data (power, cooling, floor space, etc.), asset and costing data, and incident data all have the potential to improve your analysis. Integration does not necessarily mean physically consolidating the data, which leads to duplication and delays. Using data federation, you can make multiple sources appear as a single source without the extra consolidation step.
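To illustrate correlating data across silos, the sketch below computes the Pearson correlation between every pair of time series (for example, business transaction volumes from one source and server utilization from another) and reports strongly correlated pairs as candidate dependencies. It assumes the series are already aligned on the same timestamps, which a federation layer would have to provide; the 0.8 cut-off and the series names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long series; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def strong_correlations(series, min_abs_r=0.8):
    """Return (name_a, name_b, r) for every pair of series with |r| >= min_abs_r.

    `series` maps a name to values sampled at the same timestamps, even if the
    values come from different monitoring or business data sources.
    """
    names = sorted(series)
    hits = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(series[a], series[b])
            if abs(r) >= min_abs_r:
                hits.append((a, b, r))
    return hits


series = {
    "billing.transactions": [100, 200, 300, 400],  # from a business system
    "db-01.cpu_util": [20, 40, 60, 80],            # from infrastructure monitoring
    "room-3.temperature": [21, 19, 22, 20],        # from facilities data
}
for a, b, r in strong_correlations(series):
    print(f"{a} ~ {b}: r = {r:.2f}")
```

A strong correlation only suggests a dependency worth investigating; it does not prove one.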

Once you have integrated the different data sources, it may be tempting to unleash your analytical capabilities across the whole environment. But in reality, there is probably a set of comparisons that are more relevant than others. A Configuration Management Database or Service Catalog can provide useful insight here. Focusing on things like “Which components are used by service X?” or “What are the likely downstream effects of component Y getting saturated?” will allow you to limit the scope, focus your efforts, and get quicker results. It will also allow you to prioritize actions based on the importance of the service impacted and to communicate the results of your effort in terms that make sense to your customers.
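Questions like “Which components are used by service X?” and “What is affected if component Y saturates?” are graph traversals over the dependency data a CMDB holds. The sketch below answers both with a breadth-first search over a small, hypothetical dependency map; the `DEPENDS_ON` contents and the `service:`/`app:`/`db:`/`vm:` naming are invented for the example.

```python
from collections import deque

# Hypothetical CMDB extract: each item maps to the items it depends on.
DEPENDS_ON = {
    "service:webshop": ["app:checkout", "app:catalog"],
    "app:checkout": ["db:orders", "vm:app-01"],
    "app:catalog": ["db:products", "vm:app-01"],
    "db:orders": ["vm:db-01"],
    "db:products": ["vm:db-01"],
    "service:reporting": ["db:orders"],
}


def components_of(item, graph=DEPENDS_ON):
    """All items that `item` depends on, directly or transitively (breadth-first)."""
    seen, queue = set(), deque([item])
    while queue:
        for dep in graph.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def impacted_by(component, graph=DEPENDS_ON):
    """All items that depend on `component`, i.e. what its saturation could affect."""
    reverse = {}
    for item, deps in graph.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(item)
    return components_of(component, reverse)


print(sorted(components_of("service:webshop")))
print(sorted(x for x in impacted_by("vm:db-01") if x.startswith("service:")))
```

Restricting the result to `service:` items is what lets you prioritize and report in terms of the services your customers actually use.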

What does this mean to the business?

Adopting a more proactive optimization strategy will lead to improved efficiency. A better understanding of priorities, component dependencies, and forecasted behavior based on historical trends will build your confidence and allow you to run individual components at higher utilization. By sweating your assets a bit more, you can defer or avoid investments in new capacity.
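The effect of running components at higher utilization can be shown with simple arithmetic. Assuming, hypothetically, a peak demand of 4,000 transactions per second and servers that each handle 500 transactions per second, the sketch computes how many servers are needed to stay at or below a target utilization.

```python
import math


def servers_needed(peak_demand, capacity_per_server, target_util):
    """Smallest number of servers that keeps peak utilization at or below target_util.

    peak_demand and capacity_per_server must use the same unit (e.g. transactions/s);
    target_util is a fraction between 0 and 1.
    """
    return math.ceil(peak_demand / (capacity_per_server * target_util))


# Hypothetical figures: 4,000 transactions/s at peak, 500 transactions/s per server.
for target in (0.5, 0.75):
    print(f"target {target:.0%}: {servers_needed(4000, 500, target)} servers")
```

At a 50% target this gives 16 servers; at 75% it gives 11. The trade-off is less headroom and, because of queueing effects, longer response times, which is why the confidence that comes from understanding dependencies and trends matters before raising the target.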

More advanced analytics intelligently applied to your collective data will also increase your ability to proactively avoid incidents. This will improve the quality of your services even further.

What can still be improved?
Operating at the Proactive level means you have gained some ability to forecast and react to events in the near future. But since these predictions extrapolate past behavior, they may not generalize well. Scenarios with a longer planning horizon, or scenarios involving nonlinear growth caused by outside factors not represented in the historical data, remain a major challenge for accurate prediction.

Another weakness is that most of the analysis relies on comparisons against predefined thresholds. Unless each threshold is tailored to the individual component or service (which is virtually impossible in an environment of any reasonable size), the conclusions may be wrong and generate noise that distracts you from the real problems.
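One common technique for tailoring thresholds to each component without setting them by hand is to learn a baseline from the component’s own history, for example its mean plus k standard deviations. This is a swapped-in illustration, not a method this guide prescribes; `k = 3`, the function names, and the sample data are assumptions.

```python
import statistics


def baseline_threshold(history, k=3.0):
    """Per-component limit: mean plus k sample standard deviations of its history."""
    return statistics.mean(history) + k * statistics.stdev(history)


def anomalies(histories, latest, k=3.0):
    """Names of components whose latest value exceeds their own learned baseline."""
    return sorted(
        name for name, hist in histories.items()
        if latest[name] > baseline_threshold(hist, k)
    )


# Hypothetical utilization history (%): a batch server that always runs hot
# and a web server that normally idles around 20%.
histories = {"batch-01": [80, 82, 78, 81, 79], "web-01": [20, 22, 18, 21, 19]}
latest = {"batch-01": 83, "web-01": 45}
print(anomalies(histories, latest))
```

With this sample data, a single static 75% threshold would flag the batch server, which always runs hot, and miss the web server’s jump from about 20% to 45%; the learned baselines do the opposite.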

Level 4 – Service

So far, we have discussed planning and proactive measures based on extrapolation of trends in historic data. But there are more powerful and accurate ways of making predictions, especially if the scenarios involve nonlinear changes or simultaneous changes to multiple components. By employing more sophisticated analytics, you can accurately predict the outcome of such scenarios and offer powerful service quality indicators.

How do I get to this level?
By combining thorough analysis of empirical data with sophisticated mathematical algorithms, you can make accurate predictions that further increase your efficiency and help you better manage the quality of your services. By building capacity models, you can assess different “what if” scenarios, accurately predict their impact, and answer questions like “What will happen to the response time of this service if the number of business transactions increases by 15%?”
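The simplest analytical example of such a capacity model is a single M/M/1 queue, where mean response time is R = S / (1 - U) and utilization is U = λ·S for arrival rate λ and service time S. The sketch below uses that textbook formula to evaluate a 15% increase in transactions. It is a deliberately simplified stand-in, not the modeling method behind any particular product, and the 40 transactions per second and 20 ms service time are assumed figures.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time of an M/M/1 queue: R = S / (1 - U), where U = arrival_rate * S."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        return float("inf")  # saturated: the queue grows without bound
    return service_time / (1.0 - utilization)


def what_if(arrival_rate, service_time, growth):
    """Mean response time now and after scaling the arrival rate by (1 + growth)."""
    before = mm1_response_time(arrival_rate, service_time)
    after = mm1_response_time(arrival_rate * (1.0 + growth), service_time)
    return before, after


# Assumed figures: 40 transactions/s, 20 ms of service time each, +15% transactions.
before, after = what_if(arrival_rate=40.0, service_time=0.020, growth=0.15)
print(f"before: {before:.3f} s, after: {after:.3f} s")
```

With these assumed figures, utilization rises from 80% to 92% and mean response time from 0.10 s to 0.25 s: a 2.5x increase for 15% more load. That nonlinearity is exactly what simple trend extrapolation misses.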

In building those models, you need to engage with stakeholders in various parts of the business to define relevant and realistic scenarios. They should also provide the demand forecasts on which the plan is based. These relationships outside of IT are crucial and, if managed well, will greatly improve the efficiency and quality of the models.

In some cases, however, this approach may not be possible. You may not have access to the people who can define the relevant scenarios. Your services may be built on a highly dynamic infrastructure, with constantly shifting configurations that make assessments based on a single snapshot of data irrelevant. Or the sheer size of your operation relative to your staff may make it impossible.

In those cases, you can still use the predictive analytics described earlier. From the available data about your services, you can identify average historical growth rates. Extrapolating that growth profile into the future gives you a best-guess scenario to evaluate for each service. Applying this across all your services gives you a thorough assessment of service health and lets you predict which services could end up in a state of “non-compliance” under the evaluated scenario. This macro-level approach, where you focus on the exceptions, lets you extend the reach of your optimization efforts and cover a fairly large scope with reasonable effort. Start with a wide, automated standard assessment, then drill down into the services that stand out.
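The macro-level assessment can be sketched as follows: derive each service’s average historical growth rate, extrapolate it over a planning horizon, and flag the services whose projected demand would exceed a limit, worst first. Compound growth, the monthly data, the per-service limits, and the function names are assumptions for the example; a real definition of “non-compliance” would come from your service level targets.

```python
def average_growth_rate(history):
    """Compound average growth per period between the first and last observation."""
    periods = len(history) - 1
    return (history[-1] / history[0]) ** (1.0 / periods) - 1.0


def assess(services, limits, horizon):
    """Flag services whose demand, extrapolated `horizon` periods ahead at their
    historical growth rate, would exceed their limit. Returns (name, projected /
    limit) pairs, worst first."""
    flagged = []
    for name, history in services.items():
        growth = average_growth_rate(history)
        projected = history[-1] * (1.0 + growth) ** horizon
        ratio = projected / limits[name]
        if ratio > 1.0:
            flagged.append((name, round(ratio, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical monthly peak demand per service and the demand each can sustain
# while still meeting its service level.
services = {
    "payments": [100, 110, 121],
    "search": [200, 200, 200],
    "hr-portal": [50, 55, 60.5],
}
limits = {"payments": 180, "search": 250, "hr-portal": 100}
print(assess(services, limits, horizon=6))
```

The output is the short exception list to drill into; services that stay within their limits never need individual attention in this pass.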

What does this mean to the business?
A focus on making more accurate predictions, either based on manually crafted scenarios covering individual services or in the shape of a wide assessment aimed at identifying outliers, will greatly improve your ability to plan ahead. A good understanding of when a service will need additional resources will allow you to make just-in-time investments and stay closer to the true capacity need.
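To make investments just in time, you need to know when a service will run out of capacity and how long it takes to add more. Assuming demand grows at a constant compound rate g per month, the time to go from the current level to a limit is t = ln(limit / current) / ln(1 + g); subtracting the procurement lead time gives the latest point at which to order. The function names and figures below are hypothetical.

```python
import math


def periods_until_limit(current, limit, growth_rate):
    """Periods until demand growing at compound `growth_rate` per period reaches
    `limit`: t = ln(limit / current) / ln(1 + growth_rate)."""
    if current >= limit:
        return 0.0
    if growth_rate <= 0:
        return math.inf  # no growth: the limit is never reached
    return math.log(limit / current) / math.log(1.0 + growth_rate)


def order_deadline(current, limit, growth_rate, lead_time):
    """Periods from now until new capacity must be ordered so that it arrives
    before the limit is reached; a negative value means the order is already late."""
    return periods_until_limit(current, limit, growth_rate) - lead_time


# Hypothetical service: 60% utilized, 90% limit, 3% growth per month, 4-month lead time.
print(f"{periods_until_limit(60, 90, 0.03):.1f} months to the limit")
print(f"order within {order_deadline(60, 90, 0.03, 4):.1f} months")
```

For these figures the model gives roughly 13.7 months until the limit and about 9.7 months before new capacity must be ordered.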

What can still be improved?
With predictive analytics, you can now answer questions like “How will this scenario impact my ability to continue to support the business?” If the answer is that your ability is impacted, the analysis will likely give some hints about what actions are needed to maintain the desired service level. The next level of maturity is about developing the capability to automatically rank and prescribe actions in response to predicted events.

Level 5 – Value

To reach this maturity level, you should turn the decision support produced at the lower levels into an ability to suggest decision options for taking advantage of future opportunities or mitigating future risks. The suggested actions should, of course, be aligned with your overall business objectives.

How do I get to this level?

This level is characterized by an increased focus on business alignment. As your maturity improves, you will gradually move away from the role of the expert who solves intricate technical problems and be ready to take the final steps toward offering comprehensive decision support aligned with your organization’s business objectives.

When multiple options are identified, they should all be presented back to the business for them to make the decision. The technical recommendations should be supplemented with the expected financial impact. This way, the changed cost of a service can be weighed against the business benefits it brings. Using that relationship as input to the optimization efforts will help business stakeholders to prioritize actions better.
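The sketch below shows one way to attach financial impact to technical options and order them by net monthly value (estimated benefit minus change in cost), after dropping options that would miss the service level. The `Option` fields, the figures, and the net-value rule are assumptions for illustration; the ranking is meant to support the business’s decision, not to make it.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    monthly_cost_change: float  # added (+) or saved (-) cost per month
    monthly_benefit: float      # estimated business benefit per month
    meets_service_level: bool   # does the option keep the service within its target?

    @property
    def net_value(self):
        return self.monthly_benefit - self.monthly_cost_change


def rank_options(options):
    """Options that meet the service level, ordered by net monthly value, best first."""
    viable = [o for o in options if o.meets_service_level]
    return sorted(viable, key=lambda o: o.net_value, reverse=True)


# Hypothetical options for a service forecast to run short of capacity.
options = [
    Option("Add two VMs", 3000, 5000, True),
    Option("Move to a larger instance type", 1800, 5000, True),
    Option("Reschedule batch jobs off-peak", -200, 1000, True),
    Option("Do nothing", 0, 0, False),
]
for option in rank_options(options):
    print(f"{option.name}: net {option.net_value:+,.0f} per month")
```

Presenting the full ranked list, rather than only the top entry, keeps the final choice with the business stakeholders.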

These new insights will also give you a better indication of areas where you should try to impact the demand for a service, as opposed to blindly satisfying it. Rather than strictly focusing on the supply of capacity, you can look for ways to optimize and influence how resources are being consumed.

A key to achieving tighter alignment between the capacity management process and the business services it supports is to provide self-service tools and procedures. Getting stakeholders from the business services involved hands-on in planning and in validating results will yield the best results. This implies that processes and interfaces must be simplified to suit less experienced and less frequent users, and that results are presented in a business context.

What does this mean to the business?
You can now offer the business answers to open questions like “How can we do this better?” By analyzing a set of scenarios, you can propose actions that lead to further optimization. Getting business stakeholders to participate actively will make them more accountable and more aware of the importance of planning. Services will be optimized to offer maximum value to the business.

What can still be improved?
Even though the highest level of IT operations maturity has been reached, there is a continuous need to analyze and realign your processes to changes in the business environment. You should also make it a habit to continuously validate actual versus forecasted behavior of your services to identify systematic errors or bias.
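Comparing actual with forecasted behavior can start with two simple measures: the mean error, whose sign reveals systematic over- or under-forecasting (bias), and the mean absolute percentage error (MAPE), which measures overall accuracy. The sketch assumes paired observations and non-zero actual values; the function name and sample numbers are invented.

```python
def forecast_errors(actual, forecast):
    """Return (bias, mape) for paired actual and forecast values.

    bias: mean of (forecast - actual); a positive value means systematic over-forecasting.
    mape: mean absolute percentage error in percent; actual values must be non-zero.
    """
    pairs = list(zip(actual, forecast))
    bias = sum(f - a for a, f in pairs) / len(pairs)
    mape = 100.0 * sum(abs(f - a) / abs(a) for a, f in pairs) / len(pairs)
    return bias, mape


# Hypothetical monthly peak demand: what happened versus what was forecast.
actual = [100, 120, 90, 110]
forecast = [110, 126, 99, 121]
bias, mape = forecast_errors(actual, forecast)
print(f"bias: {bias:+.1f}, MAPE: {mape:.1f}%")
```

Here every forecast was too high, by 9 units on average: a consistent bias worth feeding back into the planning models, even though the MAPE of 8.75% might look acceptable on its own.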

Summary

Capacity management in IT operations is mostly associated with increased efficiency and reduced risk, two forces that will drive valuable improvements. But without a full understanding of the business objectives and how IT needs to be aligned with those, full optimization is not possible. This IT Operations Maturity Model will allow you to gradually improve without losing track of any important aspect.

Ready to see how Vityl Capacity Management can help improve your IT organization's maturity level?
Get your free personalized demo today. We'll walk you through the software's capabilities, showing you what Vityl can do for your organization.
