A Forrester Consulting Thought Leadership Paper Commissioned by TeamQuest (now part of HelpSystems)
IT is an integral part of the business fabric and is fast reaching utility status in the enterprise. Ever-lower hardware costs have opened the door to a multitude of applications and have mistakenly made some IT processes, such as the capacity management of infrastructure resources, seem obsolete. This study will show that capacity management and predictive analysis are not just about hardware costs, but about service quality and the optimization of personnel resources, a budget line that is actually increasing in all IT organizations.
The key IT management drivers now revolve around quality of service and cost control. Improving the business workforce productivity through quality of service and improving IT efficiency by maintaining service quality while reducing costs are the main directions that are mentioned by the IT decision-makers surveyed in this report.
A majority see consolidation of infrastructures using virtualization as a way to reduce IT infrastructure and energy costs. Better reactive and predictive approaches to service performance issues are seen as the best ways to maintain service quality.
However, these two worlds collide: virtualization often affects service performance in ways that are difficult to resolve in production, as many monitoring tools do not provide enough visibility into the application container. A majority of the IT decision-makers surveyed consider predictive analysis of the application workloads that are candidates for virtualization, as well as predictive sizing of the physical infrastructure that will support the virtual elements, to be the most effective ways to avoid problems in production.
In fact, the results of this study show that, given the difficulties of resolving issues in production and the potential impact on the business workforce, capacity management processes and tools are the best ways to control both service levels and costs at the IT level and reduce the productivity and financial impact of problems at the business level.
The role of IT services in the enterprise has considerably changed over the years. The ever-decreasing cost of hardware and the increasing productivity in software development have made possible a myriad of applications and business services that have quickly become part of the business fabric of the enterprise.
This omnipresence of IT in the business activities has a number of consequences, one of which is the evolution of the role of IT from a business support organization to a business service provider.
This role change has several implications, among which the more significant are: 1) the accent on the quality of services provided to the business groups, and 2) the cost of these services, which are now an integral part of the business bottom line.
To illustrate these changes and to better understand the new constraints that they represent, TeamQuest Corporation commissioned Forrester Consulting to conduct a study of 226 IT decision-makers involved in IT operations, IT service support, and IT infrastructure. Forty-four percent of these decision-makers were in the US, while the remaining 56% were equally divided across the UK, Scandinavia, Germany, and France. (More on the methodology can be found in Appendix A.) The business expectations of IT services are clearly stated by a majority of these decision-makers as improving the productivity of the business workforce, lowering the company operational costs, improving the quality of business processes, and acquiring or retaining customers (see Figure 1).
But this cannot happen at any price: a majority of the IT decision-makers surveyed clearly state that their goals in 2009 will be to improve IT efficiency and improve their development and IT management processes (see Figure 2).
Defining IT Efficiency
IT efficiency is defined differently from one IT organization to the next. In a majority of the cases, IT organizations consider that it fundamentally means “doing more with less”: maintaining or improving the quality of IT services while constantly reducing the cost of these services (see Figure 3).
These definitions of efficiency have to be translated into pure IT management terms. In the Forrester report “The State Of Enterprise IT Budgets: 2008,” we find that while hardware and software investments are roughly equivalent at about 20% each, personnel and external services expenses reach a staggering 47% of the typical budget. The major reasons for these ever-increasing expenses are found in the diversity and complexity of current IT infrastructures and applications. Diversity of the infrastructure means that IT needs to acquire the skills to administer and manage heterogeneous platforms. Complexity of applications means that the resolution of incidents and performance lapses requires more and more involvement of these specialists to reach a satisfactory solution. The standardization of infrastructures through consolidation and the management processes used to prevent or correct service performance problems are the main areas on which IT organizations will concentrate to improve efficiency (see Figure 4).
Forrester’s study uncovered two complementary directions for improving IT efficiency. One consists of standardizing the infrastructure to reduce costs such as administration, power, and cooling; another parallel one is to improve the overall quality of IT services.
Improving Efficiency Through Infrastructure Consolidation
The proliferation of servers, fueled by the ever-decreasing price of hardware, is seen now as an additional burden on IT personnel. This server multiplication has often resulted in a selection of what was perceived as the best choice for the type of application that these servers had to support, resulting in a diversity of operating systems, interfaces, and administration requirements. As a consequence, many IT executives regard the standardization of IT infrastructures and the consolidation of servers into fewer, larger machines as a means to improve IT efficiency (see Figure 5).
In this context, virtualization is seen as a means to achieve both standardization and consolidation at the same time: by using either proprietary partitions or a hypervisor on an Intel server, multiple applications and their specific operating system versions can be consolidated without changes on a single machine running a compatible instruction set. While virtualization initially affected file and print servers or noncritical infrastructures, we now find that mission-critical applications are increasingly consolidated on virtual servers, with only a few respondents (19%) not using virtualization for consolidation (see Figure 6).
Consolidation is also a key factor in power and cooling savings. The multiplication of servers (very often energy-wasteful) as well as the fluctuations of energy prices has brought power and cooling to the attention of many IT organizations. An added benefit of consolidation is the opportunity to replace the energy-wasteful servers of 2006 by a reduced number of more energy-efficient servers: such a consolidation will kill two birds with one stone, reducing personnel demands and reducing the data center energy bill and carbon footprint. Only 25% of the respondents said power and cooling were not critical factors in their data centers (see Figure 7).
Overall, 77% of the IT organizations surveyed agreed that server consolidation was a significant or very significant factor in improving IT efficiency (see Figure 8). The clear elements of this improvement were the reduction of overall hardware, software, and personnel costs for 68% of the IT organizations, and power and cooling savings for 58% of the organizations surveyed.
Improving Business Workforce Productivity
The first element of improving overall IT efficiency is centered on cost reduction through infrastructure improvements, leading to a better use of energy and personnel resources. The second element of efficiency improvement is in maintaining or improving the quality of IT services supporting the business users. This is mostly centered on application performance and availability, themselves a direct consequence of the right sizing of the infrastructure supporting the mission critical applications, especially when these applications operate in a consolidated environment (see Figure 9).
IT organizations, however, are often placed in a position to correct problems after rather than before the fact. In almost half of the cases, both strategies are used: 56% of respondents are improving their reactive processes, and 48% apply preventive processes throughout the application life cycle (see Figure 10).
Issues With Improvement Strategies
The two efficiency improvement strategies may eventually collide with each other. Consolidation, as a way to save money on administration and energy, often means that the quality of service suffers from a lack of predictive analysis. Virtualized applications are effectively more difficult to monitor for performance, and consolidating several applications on a single server may lead to problems that were associated with the centralized IT of the past, such as contention for physical server resources or collision between applications’ peak workloads.
Managing Performance In A Virtual Environment
Virtual environments are effectively more opaque to problem identification. Most of the issues come from the application containers losing direct visibility of the clock interrupt that governs many data collection agents. In this environment, predictive analysis is the key process that leads to success. In the Forrester study, only 16% of respondents said they never experienced performance issues with virtualization (see Figure 11).
Other respondents cited difficulties with problem resolution, initial server sizing (already identified as a key element of server consolidation), difficulties in identifying applications that could be virtualized, and understanding how these application workloads peaked and created physical resource contention. In short, the main difficulties encountered by a majority of respondents centered on preparation and predictive analysis of the virtualized environment: the predictive modeling of the virtualized infrastructure that could point out potential conflicts between virtualized applications, as well as a lack of understanding of application behavior created by the difficulty of assessing application performance and workloads before the fact (see Figure 12).
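The predictive-sizing point can be made concrete with a minimal sketch. The workload samples and application names below are hypothetical, purely for illustration; the idea is that sizing a consolidation target from per-application peaks alone ignores how workloads overlap in time, which is exactly the kind of analysis a capacity management tool performs.

```python
# Hypothetical sketch: check whether candidate workloads' combined peaks
# fit on a target host before consolidating them.
# Utilization samples (hourly CPU %) are illustrative, not from the study.

candidates = {
    "billing":   [20, 35, 80, 40],  # peaks in hour 3
    "reporting": [70, 15, 10, 25],  # peaks in hour 1
    "mail":      [30, 30, 30, 30],  # flat workload
}

HOST_CAPACITY = 100  # one physical host, 100% CPU

# Naive sizing adds each application's individual peak; workload-aware
# sizing sums the time-aligned samples and takes the peak of the result.
naive_peak = sum(max(samples) for samples in candidates.values())
combined = [sum(vals) for vals in zip(*candidates.values())]
aligned_peak = max(combined)

print(naive_peak)    # 180: naive sizing would demand two hosts
print(aligned_peak)  # 120: still over capacity, so one host is not enough
```

Here the time-aligned peak (120%) is far below the sum of individual peaks (180%), yet still exceeds the host's capacity, so this particular consolidation would create exactly the resource contention the respondents describe. Real capacity modeling would of course use far longer sample histories and account for memory, I/O, and hypervisor overhead.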
The Difficulties Of Resolving Issues In Production
These difficulties of resolving performance issues in production have an adverse impact on cost and create a ripple effect by which consolidation, as a means to reduce IT costs, in fact simply displaces personnel costs from server administration to service quality resolution. While 64% of respondents can resolve many performance issues within 24 hours, some issues take longer: 36% of IT organizations cannot resolve up to three-quarters of their production performance issues in less than 24 hours (see Figure 13).
Most IT organizations, faced with a performance issue in production, resort to a team of engineers to resolve the issue. This adds a negative effect on IT efficiency on top of the negative effect on IT service quality: 50% of IT organizations will use a team of two to five people to resolve issues, but 36% will allocate six to more than 10 team members to these problems (see Figure 14). In many cases, these will be the issues that take more than 24 hours to correct, leading to a significant loss of IT productivity and increased costs.
IT Costs Are Not The Only Consequence
Of course, a performance problem in production has a direct impact on the end users and on the business workforce productivity. While the pure IT costs can be easily determined by looking at the number of persons working on the problem and the time they spend resolving it, the impact on the business is almost never considered and is, in many cases, far more important than the impact on IT costs. In the following cost model, we contrast the IT costs of finding and resolving an issue before production with the total costs of the same issue in production when the problem does impact the business productivity and the overall revenue. This is clearly a case for preventing problems rather than correcting them; the debate between reducing the time to repair issues and reducing the number of problems is clearly in favor of the latter (see Figure 15).
This cost model, first introduced in the Forrester report “Performance Management And The Application Life Cycle,” presents a strong case for using predictive analysis rather than a reactive process for performance problem resolution. Many examples have been used to compare both approaches. It is generally accepted that the cost difference can be as high as one or two orders of magnitude: that is, $1 spent in prevention avoids spending from $10 to $100 in correction. One thousand employees having to rework because of IT failures would cost about $60,000 in overtime, while 10 IT staff working on the problem for one hour would cost the company about $600; and this does not even factor in the nonrecoverable revenue for the business.
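The arithmetic behind those figures can be laid out explicitly. The hourly rates below are assumptions chosen to reproduce the paper's example numbers, not data from the study or from the Forrester cost model itself.

```python
# Illustrative back-of-the-envelope version of the prevention-vs-correction
# comparison described above; the $60/hour rates are assumptions picked to
# match the example figures, not study data.

IT_HOURLY_RATE = 60        # assumed fully loaded cost per IT engineer-hour ($)
EMPLOYEE_HOURLY_RATE = 60  # assumed overtime cost per business employee-hour ($)

def it_cost(engineers, hours):
    """Direct IT cost of a firefighting effort."""
    return engineers * hours * IT_HOURLY_RATE

def business_cost(employees, rework_hours):
    """Business-side cost of rework caused by the same incident."""
    return employees * rework_hours * EMPLOYEE_HOURLY_RATE

print(it_cost(10, 1))          # 600: 10 IT staff for one hour
print(business_cost(1000, 1))  # 60000: 1,000 employees reworking for one hour
```

The point of the comparison is the ratio: the business-side cost dwarfs the visible IT cost by two orders of magnitude even before any lost revenue is counted.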
Predictive Analysis Is The Strategic Key
Predictive analysis, in the form of a well-thought-out capacity management process inspired by the IT Infrastructure Library (ITIL), for example, and supported by adequate tools, is the answer to the apparent conundrum posed by consolidation, virtualization, and IT service quality. When asked about the benefits of such a process, IT organizations that implemented it actually hit on all the key strategic points raised in this study (see Figure 16).
Effectively, this shows that a capacity management process will: 1) increase user satisfaction and consequently improve business workforce productivity; 2) improve IT efficiency and reduce the cost of fighting performance issues; and 3) reduce the business impact and cost of outages by increasing the time between failures.
Respondents also believe that this process can be effectively applied to several steps in the application life cycle, and that it does not, as capacity planning was seen to do in the past, essentially address only infrastructure silos in production. The capacity management process is the recipient of all holistic predictive analysis and is widely regarded as the best strategy to reach the strategic objective of doing more with less (see Figure 17).
When specifically asked about the benefits of capacity management, a majority of respondents believe that this is the best way to reach the IT efficiency objective (74%), improve workforce productivity (53%), succeed in using virtualization and consolidation (64%), and improve power and cooling issues (48%) (see Figure 18). This clearly provides a solution to the issues resulting from consolidation, virtualization, and performance management.
Even IT organizations that have not implemented a capacity management process see that it would provide the strategic benefits that IT organizations are seeking: improved IT efficiency, increased user satisfaction, cost reduction through fewer firefights and a reduction of problem frequency, and a reduced business impact of performance problems due to outage reduction. Only 14% of these “skeptics” did not see any benefits in implementing a capacity management process (see Figure 19).
This study shows that a capacity management process supported by the right tool set is the key to IT efficiency and service quality.
As IT technology evolves, we see that the main drivers are economic ones: service quality, because it impacts the end user productivity and consequently the revenue and costs of doing business, and IT efficiency, because doing more with less is going to reflect the condition of IT budgets in the years to come.
Overwhelmingly, the IT executives who took part in this study, over several countries and continents, agree that the best way to reach these objectives of quality and efficiency is to use prevention rather than correction. Even those participants who have not considered implementing a predictive analysis process such as capacity management agree that such a process would in fact be the key to reaching these key objectives.
We also find that predictive analysis effectively applies to the complete IT life cycle, from development to production through quality assurance and testing.
Virtualization — seen by a majority of participants as the best way to control infrastructure, personnel, and energy costs — sometimes collides with the quest for service quality. Again, a majority of IT executives agree that capacity management is the best way to prevent these issues from happening.
The consequences of entering server consolidation and virtualization without prior knowledge of application workloads and resource usage will translate into downtime. This will cost IT time to resolve the problem, but more importantly, it will also affect the business workforce. The cost of lost productivity will have to be compensated and recovered somehow, potentially adding to the total impact. While reducing the time to correct a problem is a necessary endeavor, preventing the problem from happening is a far better economic proposition. An ounce of prevention is worth a pound of cure.
Capacity management should no longer be considered merely a way to optimize hardware resources, but a way to improve IT efficiency: maintaining the level of service while optimizing the costs of administration, energy, and problem resolution. Capacity management should be implemented as a complete predictive analysis process that includes:
- The predictive analysis of applications in production to determine performance drift and predict the need for more resources.
- The analysis of application workloads, the resource consumption of these applications, and the way resources are consumed over time, to prepare for the virtualization and consolidation of these applications.
- The analysis of applications in preproduction as they go through testing and quality assurance.
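The first of these steps, detecting performance drift, can be sketched very simply: fit a trend to historical utilization and project when a capacity threshold will be crossed. The weekly CPU samples and the 80% threshold below are hypothetical assumptions for illustration; production tools would work from much richer telemetry.

```python
# Minimal sketch of "performance drift" detection: fit a linear trend to
# historical utilization and project when a threshold will be crossed.
# Sample data and the 80% threshold are hypothetical.

def fit_trend(samples):
    """Ordinary least-squares slope and intercept for evenly spaced samples."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) / \
            sum((x - mean_x) ** 2 for x in xs)
    return slope, mean_y - slope * mean_x

cpu_pct = [52, 54, 57, 59, 62, 64]  # weekly average CPU%, illustrative
slope, intercept = fit_trend(cpu_pct)

THRESHOLD = 80.0  # assumed capacity ceiling before service quality degrades
weeks_left = (THRESHOLD - cpu_pct[-1]) / slope if slope > 0 else float("inf")

print(round(slope, 2))       # 2.46 CPU points of drift per week
print(round(weeks_left, 1))  # 6.5 weeks until the threshold at this rate
```

A positive slope with a short projected runway is exactly the early warning that lets an IT organization add resources before users ever see a performance problem, rather than assembling a firefighting team after the fact.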
Appendix A: Methodology And Respondents
In November 2008, Forrester Consulting conducted an online survey of 226 global IT decision-makers to understand how companies currently view their IT organization, including IT initiatives and processes. Specifically, we looked to gain insight into how companies are working to increase their IT efficiency, and to understand companies’ current use of and pain points around server consolidation and virtualization. We also looked at how organizations currently resolve performance issues, and sought to highlight their opinions about the benefits of and barriers to capacity planning.
In this survey:
- Forty-four percent of respondents were headquartered in the US. The other 56% were equally distributed across Germany, France, the UK, and Scandinavia.
- With respect to company size, 37% came from small companies with fewer than 100 employees; 29% came from medium-size companies with 100 to 999 employees; 23% came from large companies with 1,000 to 4,999 employees; 7% came from very large companies of 5,000 to 19,999 employees; and 5% came from global companies of more than 20,000 employees.
- Respondents came from a variety of industries. Special attention was paid to business/professional services, financial services, and insurance.
All respondents came from IT positions; 42% were the senior-most IT decision-makers in the company, 40% were executives within IT, and 18% were managers or directors within IT reporting to an executive in IT. Respondents held a variety of titles, such as CIO, CTO, VP/director/manager of operations, director/VP of application development, or IT architect.