Jeff Fox is the Manager of Network Infrastructure Services at the Law School Admission Council (LSAC). He implemented Vityl Capacity Management to address performance constraints that caused bottlenecks during peak traffic periods. Modeling revealed the exact cause of the bottlenecks, as well as the exact hardware required to maintain informal Service Level Agreements (SLAs). Vityl Capacity Management’s capacity planning component is also being relied upon as LSAC rolls out a new system to more than 200 member law schools.
The LSAC network consists of two T3 connections and about 100 Windows servers. There are some UNIX servers as well. Load balancers are used to build redundancy into the network. If one server is busy or goes down, the traffic is directed elsewhere, without any impact on service.
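The failover behavior described above can be sketched as a health-aware round-robin: traffic rotates across the pool, and any server that is down is simply skipped. This is a minimal illustration only; the server names and the `down` set are hypothetical stand-ins, not LSAC's actual load-balancer configuration.

```python
from itertools import cycle

# Hypothetical two-server pool standing in for a load-balanced pair.
POOL = ["web1", "web2"]

def next_server(rotation, down=frozenset()):
    """Round-robin selection that skips servers marked as down."""
    for _ in range(len(POOL)):
        server = next(rotation)
        if server not in down:
            return server
    raise RuntimeError("no healthy servers available")

rotation = cycle(POOL)
assert next_server(rotation) == "web1"   # traffic alternates while both are up
assert next_server(rotation) == "web2"
# If web1 fails, requests flow to web2 with no impact on service.
assert next_server(rotation, down={"web1"}) == "web2"
```

A production load balancer also weighs current load and probes server health, but the skip-on-failure rotation above is the core of the redundancy the text describes.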
LSAC’s customers include law school candidates, law school admission offices, pre-law advisors at undergraduate schools, and internal departments at LSAC.
This network is under the most pressure right before law school application deadline dates, with February 1 and 15 as the peak days each year. During the peak in 2007, LSAC was unable to maintain expected levels of service. The huge peaks in volume caused systems to slow down. The organization determined that it would reorganize its infrastructure to ensure that it never again failed to deliver during peak periods.
New Application Roll-Out
For the purposes of this paper, we will focus on ACES2, one of many applications developed in-house at LSAC to serve our external customers. This software supports the law school application review and admission process. It runs on a Windows Server 2003/SQL Server platform and was built on the Microsoft .NET framework.
Our goals in using Vityl Capacity Management to analyze ACES2 were to predict the resources required to meet service levels as demand increases, identify which components would negatively impact response time, and find the least expensive way to accommodate workload growth. We wanted to remain firmly in control of how performance would be affected as we rolled out the new application to law schools. In order to meet our SLAs, it was vital that we maintain stipulated performance levels as we added more schools over time.
This phased implementation of the new application would have most schools accessing our systems across the Internet. Those schools with bandwidth issues, however, would be provided with an appliance to facilitate a desirable level of performance. The initiative began with a beta program at 32 schools.
ACES2 performs multiple functions. It is used to upload scanned documents that support the law school application process, such as transcripts and letters of recommendation, and transmits electronic applications. Data contained in the law school applications is processed by the law schools themselves.
The current ACES2 architecture consists of:
- Two front-end Windows-based Web servers. They have one load balancer operating between them. If one server fails or is fully utilized, traffic is transferred to the other machine. The load balancer also optimizes traffic distribution in order to heighten performance.
- Two back-end Windows-based Web servers that also share a load balancer. These systems access our SQL server and Oracle databases.
- Two SQL servers that are set up as a cluster. One operates as the active member of the cluster. Note: we are able to add further clusters as needed to satisfy performance demands.
- All servers use Intel Xeon dual-processor single-core chips with hyper-threading, running at either 3.0 GHz or 3.8 GHz.

Before ACES2 was launched, we performed baseline tests using Vityl Capacity Management’s capacity planning component. We found that the four Web servers would have plenty of processor headroom; i.e., CPU usage stayed under 10 percent. We then used Vityl Capacity Management to run tests on memory performance. Average memory pages per second on the four Web servers was satisfactory except for a series of spikes. By drilling down in Vityl Capacity Management, we found that the spikes all occurred while backups were being run on weekends. This was not a concern because, despite the spikes, all systems had plenty of memory available.
Vityl Capacity Management was also used to measure total CPU time on our SQL servers. Here again, usage remained under 10 percent. We noted one significant peak; investigation revealed it occurred on a day when a new law school’s data was uploaded. Overall, average request time for all servers stayed well under one second, a level of response time that satisfied our user community.

Our modeling strategy was to use a timeframe representative of our peak periods, capture as much data as possible early in the roll-out process, and model various growth scenarios to ensure we were prepared for all eventualities. We also set roll-out checkpoints to detect any deviations from our plan and, if necessary, refine the plan based on what we discovered at those checkpoints.
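At its simplest, the growth-scenario modeling described above amounts to attributing a share of the measured baseline load to each school and extrapolating as schools are added. The sketch below illustrates the idea; the utilization figures are placeholders loosely based on the sub-10-percent baseline, not LSAC's measured values.

```python
# Illustrative linear extrapolation of CPU demand as schools are added.
# baseline_util and beta_schools are hypothetical placeholders, not
# LSAC's actual measurements.
baseline_util = 0.10        # ~10% CPU observed during the beta baseline
beta_schools = 32           # schools live when the baseline was captured
per_school_util = baseline_util / beta_schools

def projected_util(schools):
    """Project CPU utilization at a given school count, assuming linear growth."""
    return per_school_util * schools

for schools in (32, 100, 136, 200):
    print(f"{schools:>3} schools -> {projected_util(schools):5.1%} CPU")
```

A real what-if model accounts for far more than CPU (memory, disk, network, workload mix), but even a linear sketch like this flags roughly when a resource will leave its comfort zone and a checkpoint review is warranted.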
Modeling during our beta period revealed that our anticipated load growth levels were quite accurate. With 32 schools launched and all systems performing well, we had enough confidence to begin the full roll-out at a rate of 12 schools per month. When we added the first 12 schools, Vityl confirmed that there was no noticeable difference in response time. As we continued rolling out schools at the 12-per-month rate, we used Vityl to verify response times. In the event of an anomaly, Vityl could be used to drill down into the data to locate the cause, enabling IT to take timely action to maintain our SLAs.
We debated whether to take into account the size of the schools and other factors that might affect the amount of bandwidth and other systems required. In the end, we decided that this would balance out over time and that we didn’t require that level of complexity to develop an accurate model.
Projecting into the future at a rate of 12 schools added per month, we conducted detailed modeling. By the time we reached 136 schools, Vityl showed that our response time might deteriorate to as much as 2.5 seconds under certain conditions. Vityl Capacity Management drill-downs isolated a queue delay problem in our SQL server, which traced back to a disk I/O bottleneck that we addressed satisfactorily.
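The sharp jump in response time at higher school counts is characteristic of queueing delay: as a device such as a disk approaches full utilization, time spent waiting in its queue grows much faster than the load itself. A minimal M/M/1-style sketch makes the shape of that curve visible; the 50 ms service time here is purely illustrative, not a figure from LSAC's systems.

```python
def residence_time(service_time, utilization):
    """M/M/1 residence time: service plus queueing delay, R = S / (1 - U)."""
    if utilization >= 1.0:
        return float("inf")   # saturated device: the queue grows without bound
    return service_time / (1.0 - utilization)

# Hypothetical 50 ms disk service time per request.
S = 0.05
for u in (0.10, 0.50, 0.90, 0.98):
    print(f"utilization {u:4.0%} -> response {residence_time(S, u):.2f} s")
```

Under these illustrative numbers, a device that answers in well under a tenth of a second at low load stretches to seconds as utilization nears 100 percent, which is why a disk I/O bottleneck can surface as multi-second response times even when average utilization elsewhere looks healthy.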
Modeling demonstrated conclusively that a phased roll-out to our schools could be supported on our existing hardware platform. During this implementation, our response times would not vary substantially with expected growth. The conclusions of Vityl Capacity Management’s capacity planning component have been confirmed in the real world.
From here, we will continue to gather performance data and associated business metrics, monitor the progress of roll-outs, and take action on any deviations from our projections. For example, we expect to find an occasional anomaly due to the influence of one or two large schools coming onto the system.
Vityl software has been of immense help in understanding our existing environment and in generating what-if scenarios to highlight our performance capabilities under a steadily increasing load. We will continue to use Vityl going forward while developing and refining our capacity plans and adjusting our infrastructure accordingly.
Finally, Vityl has facilitated a much stronger bond between operations and development. Instead of arguing over whether we have an application or an infrastructure issue, we can now predict with confidence, isolate root causes, and proactively address potential problems.
See for yourself how Vityl Capacity Management can help your organization avoid downtime, mitigate risk, and keep costs down.