You’ve started migrating applications to the cloud. Now what?
Until now, you’ve properly planned and provisioned your on-premises environment. Applications were running smoothly and the environment was optimized. Now you’re trying to predict how applications will behave in the public cloud. And costs seem to be higher than expected.
How do you keep services up and running and costs contained in the public cloud?
Watch and learn:
- Why cloud deployments are more expensive than expected
- What is contributing to the high cost of cloud
- How to gain control and optimize your cloud deployment
Amanda Hendley: 00:01 Thank you for joining us today for our How to Control Your Cloud Deployment webinar, hosted by TeamQuest. Before we get started I want to make sure that you know that you can submit questions using the questions tab in the GoToWebinar side panel. We will be collecting those questions throughout today's session. Our question and answer session will be at the end of the webinar, and if we do not get to any of your questions, we will be posting all the questions and answers on CNG's blog.
Amanda Hendley: 00:33 Now I'm pleased to introduce you to Per. As Director of International Solution Services, Per is responsible for the delivery of TeamQuest services to customers worldwide. He conceived and is the author of the TeamQuest Capacity Management Maturity Model and has presented papers on the subject in Europe and in the United States. Per combines deep practical experience with an understanding of the business drivers for capacity management, contributing to his role as a thought leader in the industry. Per also sits on TeamQuest's Product Management Steering Committee, bringing his insight into real-world needs and translating that into deliverable products for the market.
Amanda Hendley: 01:20 Welcome Per.
Per Bauer: 01:21 Thank you very much. Today's session is going to be about how to deploy ... how you move existing applications or new applications into a public Cloud environment, and how you do that in the best way from a capacity perspective.
Per Bauer: 01:38 We had a session, so this is me by the way, for those of you who haven't met me. I'm going to skip through that one. We had a session some months ago that was a general primer on capacity management for the Cloud and what capacity management means in this new context. This presentation will focus on the first step towards Cloud adoption: the transition and migration of workloads and applications over to the Cloud. Throughout this presentation I will share some best practices and experience that I've drawn from customers we've helped and assisted with that transition. I'll also present some findings and recommendations around dos and don'ts in this space.
Per Bauer: 02:35 But those who have listened to me before know that I always want to start off by making sure that we have a common frame of reference when we talk about this. I'll do that first.
Per Bauer: 02:45 Some common terminology. In the Cloud space there are a number of different entities that we need to consider and take into our planning. The first one is traditional IT, the stuff that we've always been doing: physical or virtual infrastructure running in our data center, or in a co-location for that matter, but basically resources and infrastructure that we own, operate, and are responsible for.
Per Bauer: 03:16 The next one is the Private Cloud, which is somewhere in between: basically our own or outsourced infrastructure that is wrapped into a Cloud service. You get the features of self-provisioning and elasticity, et cetera, that come with Cloud, but underneath there's infrastructure that you still need to operate, optimize, and provision.
Per Bauer: 03:41 Then the third one is the Public Cloud, where you completely hand things over to a public provider to take care of the infrastructure for you. You subscribe to a service which gives you, in our case where we talk about Infrastructure as a Service, instances of compute or storage that you can use for your applications and websites, and it's the responsibility of the Cloud provider to optimize and take care of the infrastructure underneath.
Per Bauer: 04:12 There's been a lot of talk, over the last four or five years I would argue, about hybrid Cloud, where you lump together Private and Public Cloud and see them as two entities that could be used interchangeably to host workloads. I think more and more over time, especially over the last couple of years, the term hybrid Cloud has come to describe something else, because more and more companies are starting to realize that Private and Public Cloud are kind of hard to mix. It's hard to move things between them. I think the whole Private Cloud market has cooled off considerably; it hasn't been able to keep up with the level of innovation and the level of service that you get from the big Public Cloud providers. There is no feature parity between the two anymore, and the idea of hybrid Cloud, where you could potentially move things back and forth between a Private and a Public Cloud, is more and more going away.
Per Bauer: 05:17 This has been replaced by a term called Multi Cloud, where you, as a company or an organization, decide to use multiple different Public Cloud vendors for different purposes, or to spread your risk across them, or to use each for what it's best at. A certain Cloud provider may have a better offering when it comes to compute-heavy workloads, versus another that comes with a full stack for, say, database or data-driven applications. For different types of scenarios you could use different Cloud providers. It can also be a way of spreading your risk, as we said, so not placing all your eggs in the same basket: avoiding being locked in with one vendor keeps them honest and puts you in a better situation when you want to negotiate terms and conditions around your subscriptions, et cetera. Multi Cloud is getting more and more traction and attention, and for large organizations it is more or less the default today.
Per Bauer: 06:28 Then we have Hybrid IT, which is what we as capacity managers are faced with. We need to manage everything. We need to manage the legacy of the traditional IT services that we have, plan for those, optimize those, and make sure that they are available, perform, and scale as necessary. We also have the new stuff running in the Cloud, which may very well be part of or support the same business service, so we need to cover everything that is running in the Cloud or on prem in the same view and be able to optimize it in the same fashion. Hybrid IT is really what we should focus on, so we need solutions that can integrate both views and understand the differences between running things in the Cloud versus traditional IT, and not have silos, where a silo of information for the Cloud doesn't mix with the traditional IT, because the reality is that most workloads, or business services, will touch both sides. This is where we are in terms of the different entities that we're dealing with.
Per Bauer: 07:44 Over the last two or three years there's been more or less an exodus to Public Cloud. There's been a huge growth in the adoption of Public Cloud; it's growing at breakneck speed. Software as a Service is still the biggest segment in terms of revenue, so running applications on tap in the Public Cloud like Salesforce or Office 365, et cetera. That's the biggest chunk, but Infrastructure as a Service, which we typically associate with AWS or Azure and which is the part that requires some capacity planning and capacity management, has also been growing very quickly, and it's expected to triple over the coming four years according to this prediction by Gartner. We haven't seen the peak yet; it's continuing to grow at very high rates.
Per Bauer: 08:44 As far as organizations go, there's also been a shift in mindset. A lot of big enterprises started out with a No-Cloud strategy; they were concerned about security and the scenario of having applications leave their premises and run somewhere outside in a public environment. They first switched to a Cloud-First approach, where there was a policy saying that for each new workload or each new service that we bring on board we should always evaluate a Cloud alternative first, and if it's found not to be the best fit you could go on prem, but Cloud should always be the first alternative. In later years, or later months I would say, we've started to see the Cloud-Only strategy more and more. Companies do not want to bring on new services, new workloads, new technologies that require any type of on-prem infrastructure, so everything should go into the Cloud. This is obviously not the case for every company or organization, but we're starting to see it more and more, and it's actually something you can find in a lot of organizations.
Per Bauer: 10:06 Why are organizations adopting Infrastructure as a Service, or Cloud as we ... when we talk about Cloud in this presentation, we'll primarily refer to Infrastructure as a Service, because that's really where capacity, and planning for that capacity, makes sense. It's important to understand what the drivers are, because that dictates how we need to act and what we need to think about when we do our jobs.
Per Bauer: 10:30 The first one is cost reduction. Everyone knows this: under the right circumstances, using resources offered through a Cloud service can save a lot of money. The pay-as-you-go nature of it, without any upfront capital investment or any continued cost once you stop using it, combined with the [inaudible 00:10:55] of applications designed for the Cloud, makes it a very favorable alternative. You can save money doing that, but you can also save money by not having to pay for facilities, data centers, and so on. Cost reduction has been the primary driver for Cloud so far, but it's very closely followed by agility.
Per Bauer: 11:15 The ability to allocate new resources on very short notice is by far the second most compelling reason for adopting Infrastructure as a Service. Especially in large enterprise environments, over the years there have been very long provisioning lead times because of the complexity, all the people involved, and all the red tape. The business users, the consumers of IT services, have grown aware of this and would really welcome the ability to quickly provision new infrastructure, with a self-service element to it. That's a big opportunity, a big driver for Cloud.
Per Bauer: 12:01 The third one is strategic focus. Relying on external experts to maintain the infrastructure allows your internal IT resources to focus more on strategic initiatives, like building the right type of applications or being innovative when designing those business services. Not focusing on the running of the data center, but rather on what's running in the data center. Shifting the focus from brick and mortar over to more business-value-adding activities in the IT space.
Per Bauer: 12:37 The fourth one is assurance. Especially for smaller organizations, relying on someone like AWS or Azure, a giant company whose core business is operating IT, feels much better than having to do it on your own. Smaller or medium-sized organizations can't necessarily afford all the expertise required, so handing over some of that responsibility and making sure that you get low-cost operational discipline around your IT is a very compelling reason.
Per Bauer: 13:15 The fifth one is scalability, which is similar to the agility aspect: this perceived infinite scalability of Cloud, where you can scale up hand in hand with demand. It doesn't happen automatically, but it's possible with Cloud, and that's been compelling for a lot of organizations as well. Those are the drivers, and it's important to understand them because they set the boundaries for what we as capacity management experts can actually do, and what we should promote and push for in order not to be seen as a hindrance, but as someone who adds value to the business.
Per Bauer: 14:06 When we migrate to the Cloud, as this session is going to be about, there are two different options. One that is front loaded and another approach that is a bit more back loaded. What do I mean by that? Let's look at that.
Per Bauer: 14:23 The first one: before you migrate anything over to the Cloud, it's important to understand that Public Cloud services are primarily designed for "cloud native" workloads. What do we mean by that? Cloud native workloads are refactored into Micro Services, so monolithic or multi-tier applications should be broken up into Micro Services, where each specific function or capability of the application runs as an individual service, a Micro Service. Those Micro Services are typically hosted in containers, and those containers are orchestrated by orchestration software like Kubernetes or Mesos or Swarm, et cetera. That's the assumption of the Public Cloud: that you have an application designed in this fashion.
Per Bauer: 15:22 That means you can scale out, so as demand or requests for your application change, you can scale up by adding more nodes and scale down by retracting nodes. Having this dynamic allocation of resources that goes hand in hand with demand causes you to use fewer resources over time and pay less.
Per Bauer: 15:48 In order to do this, of course, your application needs to be stateless, with some sort of persistence service, so these Micro Services can act independently, because the whole idea of adding and retracting nodes requires statelessness. There can't be any state maintained in each of those Micro Services, because if a Micro Service goes away, so does the session for that user, and you would get all kinds of violations of business rules, et cetera.
Per Bauer: 16:21 You have to have a stateless application designed to work in that fashion, and you also need a certain level of automation to make this work, since you have many more components to orchestrate and since the scale-out and scale-in have to happen automatically. You need hooks in each of those components that allow automation, either through the orchestration software or through third-party software that takes care of it. The application needs to be designed for automation.
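To make the scale-out/scale-in idea concrete, here is a minimal sketch of the kind of decision the orchestration or automation layer makes. The capacity figure of 100 requests/sec per node and the minimum node floor are made-up assumptions for illustration, not recommendations.

```python
import math

# Toy scale-out/scale-in decision for a stateless service.
# capacity_per_node is a hypothetical figure: how much demand
# one node can serve.

def desired_nodes(request_rate, capacity_per_node=100, min_nodes=2):
    """Return how many nodes are needed for the current demand."""
    needed = math.ceil(request_rate / capacity_per_node)
    return max(needed, min_nodes)  # keep a floor for availability

def scaling_action(current_nodes, request_rate):
    """Decide whether the orchestrator should add or retract nodes."""
    target = desired_nodes(request_rate)
    if target > current_nodes:
        return ("scale-out", target - current_nodes)
    if target < current_nodes:
        return ("scale-in", current_nodes - target)
    return ("steady", 0)
```

Because the service is stateless, retracting a node is safe: no user session is lost when it goes away, which is exactly why statelessness is a precondition for this kind of automation.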
Per Bauer: 16:58 Ideally an application would go through all these phases before you move it over to the Cloud. The typical scenario would be to refactor it to "Cloud Native" on your legacy platform, on prem in your existing environment. You test and verify the refactoring you did and make sure that it scales, that it performs, that it behaves the way you expected, et cetera. Then you deploy it in the Cloud and operate it from there. Simple, straightforward; this is how you should do it.
Per Bauer: 17:31 But in reality this is kind of hard to accomplish, which we'll come back to. Even when you do this, there is still some work to do from a capacity planning perspective. When you move applications over, you still need to identify the optimal Capacity Unit for your application, because providers offer this horizontal scalability. The optimization effort focuses on identifying the optimal amount of capacity to add or retract every time there is a change in demand.
Per Bauer: 18:08 When defining your optimal Capacity Unit you need to consider different parameters, like what pre-defined instances are available from the Cloud provider and what the price/performance relationship is for those. Are any of them better suited for what I'm looking for than the rest of the offerings?
Per Bauer: 18:31 You also need to understand the characteristics of the application: what is the unit that describes the amount of activity going on, and how does that relate to resource usage? How many transactions of type X can you fit into an instance of type Y, and what does that number of transactions represent in business terms? You need to understand your application and what it does in order to do proper planning.
Per Bauer: 19:00 Then you also, of course, need to understand the seasonality and variability of demand for your application. Is there a relatively steady, sustained level of activity going on all the time, or is it subject to wild swings in demand? In the latter case you'll probably need to add larger chunks of capacity in order to scale up quickly, because there may be a very rapid increase in demand for your service. Whereas for a more well-behaved, static application you may scale in very small increments, because those increments are going to be cheaper. You still need to understand this, and you need to build this workload profile over time to understand how your application is behaving and what it does.
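The two questions above, how many transactions fit into an instance and how volatile demand is, can be sketched in a few lines. All the numbers here (CPU cost per transaction, the 70% target utilization, the volatility threshold) are illustrative assumptions, not figures from the talk.

```python
# Sketch: relating business transactions to instance capacity and
# picking a scaling increment from demand variability.

def transactions_per_instance(instance_vcpus, cpu_sec_per_txn,
                              target_utilization=0.7):
    """Transactions/sec one instance can carry at the target
    utilization (leaving headroom above it for spikes)."""
    return instance_vcpus * target_utilization / cpu_sec_per_txn

def scaling_increment(demand_samples):
    """Volatile demand -> scale in larger chunks so you can keep up;
    steady demand -> scale in small, cheaper increments."""
    mean = sum(demand_samples) / len(demand_samples)
    peak = max(demand_samples)
    volatility = (peak - mean) / mean if mean else 0.0
    return "large" if volatility > 0.5 else "small"
```

For example, a workload with a sudden 4x spike in its history would get the "large" increment, while one that barely moves around its mean would get the "small" one.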
Per Bauer: 19:55 So even though you've refactored, there's still work to be done to identify your ideal Capacity Unit. But there are also other challenges with this refactoring approach. If you refactor your application and at the same time replatform and move things over, you're in essence introducing multiple changes at once, two changes at the same time, which makes it really hard to determine root cause. If you refactor your application and then move it to a brand new hosting environment, and you have performance problems or it doesn't behave the way you expected, it's hard to determine why. Introducing multiple changes at the same time is never a good idea, we all know that, and that is exactly what you're doing here.
Per Bauer: 20:46 Another aspect is that it's hard to accurately mimic the Public Cloud environment before you make the move. Doing this refactoring in a live environment and the verification and full testing of everything is also very hard. It's a challenge to do that in itself.
Per Bauer: 21:05 The third one, which is many times just as important, is the fact that it takes longer before you see any visible results. If you have a restless CIO or restless business users who want to see results, want to see you moving into the Cloud and see Cloud initiatives come to fruition, it may not be the wisest thing to spend a year preparing before you actually move anything. That impatience will sometimes force you to move quicker and not allow you to do this refactoring.
Per Bauer: 21:45 The other alternative, which becomes reality for a lot of organizations, is the lift and shift approach, where you basically migrate your workloads the way they are. There may be certain workloads that are completely inappropriate for running in the Cloud; those are probably better left running on the legacy platform in your data center. But everything that can be fitted into a Public Cloud environment is moved as is. Then once it's there, you start to gradually refactor, test, and verify in that environment, and you go through this continuous refinement.
Per Bauer: 22:22 This lift and shift approach is actually more common than you would imagine. In theory everyone should refactor everything before they move it, but the reality is that a lot of organizations just lift and shift their workloads. I've seen numbers as high as 80% of workloads being moved over to the Cloud without being changed or refactored to fit the environment at all.
Per Bauer: 22:52 In order to make the lift and shift operation as good as possible, or at least avoid any major disasters, you still need some understanding of what it is you're moving. You need to identify the optimal target instance type for each of the workloads that you're moving. Typically those workloads would be running in a VM, so it's basically moving a VM into an instance in the Cloud.
Per Bauer: 23:23 Things you need to look into include, of course, business activity cycles. You need to make sure that you look at behavioral data, utilization data, or performance data of your application spanning a whole business activity cycle. If you have a peak at the end of each month, you need to look at least at one month of data; if you have quarterly peaks, a quarter of data is what you need, et cetera. Taking a snapshot of data that describes a less busy period of the system could end in catastrophic results. You need to provision for your peak business period, and you need to understand what the business activity cycle looks like.
Per Bauer: 24:11 You need to look at the resource utilization, of course, and you can look at it in different ways. You can look at the average, which is a bit more forgiving and so exposes you to higher risk: there may be one or two peaks in the period that get lost in an overall average. A lot of the customers we've worked with have rather used a percentile, typically the 95th percentile, to make sure that you capture most of the peaks and the specific events that could cause you problems.
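The difference between average and percentile sizing is easy to see with a small example. The utilization samples below are synthetic: a mostly idle workload with a busy spell, the kind of month-end peak the business activity cycle discussion warned about.

```python
import math

def average(samples):
    return sum(samples) / len(samples)

def percentile(samples, pct=95):
    """Nearest-rank percentile: the value at or below which
    pct% of the samples fall."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

# 90 quiet samples at 10% CPU, 10 busy samples at 80% CPU:
cpu = [10] * 90 + [80] * 10

avg_size = average(cpu)     # the average hides the peak
p95_size = percentile(cpu)  # the 95th percentile captures it
```

Here the average lands at 17% while the 95th percentile is 80%: sizing an instance on the average would under-provision the monthly peak badly, which is exactly why the percentile is the more defensive choice.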
Per Bauer: 24:53 You need to determine your threshold for using OnDemand versus Reserved instances. OnDemand resources are instances where you pay as you go; you typically pay a slightly higher rate, but on the other hand you don't pay anything when you don't use them. With Reserved instances you get a lower unit price, but you are committing to long-term use. There is typically a breakeven point between the two where it makes sense to use the Reserved instance, especially if you know that your workload is going to be around and won't change for a considerable time. That's another parameter you need to determine: what is the break point for switching from OnDemand to Reserved?
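The breakeven logic can be sketched directly. The hourly rates below are hypothetical, not any provider's actual price list; the point is only the shape of the comparison: OnDemand cost grows with hours used, Reserved cost is flat.

```python
# Breakeven between OnDemand and Reserved pricing (illustrative
# rates, not real prices).

HOURS_PER_MONTH = 730

def monthly_cost_on_demand(rate_per_hour, hours_used):
    return rate_per_hour * hours_used  # pay only for what you use

def monthly_cost_reserved(rate_per_hour):
    return rate_per_hour * HOURS_PER_MONTH  # committed, used or not

def cheaper_option(od_rate, res_rate, hours_used):
    """Which pricing model is cheaper for this usage level?"""
    od = monthly_cost_on_demand(od_rate, hours_used)
    res = monthly_cost_reserved(res_rate)
    return "reserved" if res < od else "ondemand"
```

With an assumed OnDemand rate of $0.10/h and a Reserved rate of $0.06/h, the breakeven sits at 0.06 x 730 / 0.10 = 438 hours per month: a workload running around the clock clearly favors Reserved, a workload used a few hours a day favors OnDemand.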
Per Bauer: 25:47 Then, as you make the decision, it makes sense to also account for some short-term growth. If you know there is short-term growth, a specific business event, or anything happening anytime soon that will increase the requirement for resources, you'd better bake that in already now, otherwise you may complete the migration and a few weeks or months later have to migrate again to another type of instance. Understanding whether there are any lurking events that you should take into consideration when you calculate your requirements is also a good idea.
Per Bauer: 26:32 What does this look like in real life? I've used an example from our capacity management solution that takes VMs running under ESX and VMware as input and helps you calculate the ideal target instances. You basically point it to a cluster of VMware hosts, it finds all the VMs, and based on utilization metrics it comes up with recommendations for where to put them. There are a number of different parameters you can set, but I've collected the four most important ones here. It's basically what we discussed in the previous slides: what interval, what amount of time, do I need to look at to capture a full business cycle; what usage characterization should I use, percentile or average; what is the threshold between Reserved and OnDemand resources; and is there any growth margin or additional headroom I should include, either as a safety margin or based on a forecasted increase in demand.
Per Bauer: 27:56 When you run this, you get this type of result, with recommendations based on the parameters we set. A straight translation of the resource configuration to a similar target alternative, hence the column called Configuration. Basically you're looking at the characteristics of the VM and finding the best match from the Cloud provider, in this case AWS, so it's between T and C instances of different sizes.
Per Bauer: 28:29 We also have recommendations based on percentile, the 95th percentile in this case, so not just looking at the configuration but at how it was actually used. As you can tell, most of those configurations, the VMs we were looking at, had some empty space; some were a bit over-provisioned. If you look at the actual utilization, the numbers are way lower, so most of them fit into a nano instance, which is the smallest instance you can find. This is a lab environment so it's not a very surprising result, but there will always be a smaller, leaner recommendation in the Percentile column than in the Configuration column.
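The matching step described here, taking a VM's observed demand and finding the smallest instance that fits, can be sketched as a simple catalog lookup. The catalog below is a simplified, hypothetical subset of AWS-style instance types; the names and sizes are illustrative, not a real published spec sheet.

```python
# Sketch of the instance-matching step: pick the smallest catalog
# instance that satisfies a VM's CPU and memory demand.

CATALOG = [               # (name, vCPUs, memory GiB), smallest first
    ("t3.nano",   2, 0.5),
    ("t3.small",  2, 2.0),
    ("c5.large",  2, 4.0),
    ("c5.xlarge", 4, 8.0),
]

def recommend(vcpus_needed, mem_needed_gib):
    """Smallest instance satisfying both CPU and memory demand."""
    for name, vcpus, mem in CATALOG:
        if vcpus >= vcpus_needed and mem >= mem_needed_gib:
            return name
    return None  # nothing fits; a larger family is needed
```

Feeding this the configured size of a VM versus its 95th-percentile demand reproduces the effect from the example: the percentile-based input is smaller, so the recommendation slides down toward the nano end of the catalog.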
Per Bauer: 29:22 Then the last column is based on average. In this case there's very little difference between average and percentile; in some cases there will be differences, of course.
Per Bauer: 29:31 Then we also have a column about whether we should go for a Reserved instance or not. We set a threshold of 20%, I think it was, and that resulted in at least one of them recommending a Reserved instance rather than an OnDemand instance.
Per Bauer: 29:53 The gains of doing this are multiple. One is that you reduce the impact of manual mistakes, because you have a tool that does this for you, finds the right data, the right window of activity, to base the recommendation on. It can also save you a lot of time: there could be hundreds of VMs in a large cluster, and it's a monotonous, tedious task to go through all those instances, find the peak period, make sure you use the right one, and then do the translation over. This type of help can save you a lot of time and also ensure you're not making mistakes that will hurt you later on. This is typically how you would do the transition in a lift and shift: compare the existing instances that you're using and find the best translation in the target environment.
Per Bauer: 31:00 What are the consequences of a lift and shift? The biggest is really that it will cost you. TCO calculations offered by Cloud providers are typically built on the assumption that you have refactored the applications to Cloud Native. Running them as is, just doing a lift and shift, will certainly increase the amount of resources being used, and the TCO calculation that was made is probably not going to hold. In essence, lift and shift is just the first step. You have to follow it up with some level of refactoring and optimization in order to get long-term efficiency. It's typically not sustainable to run unmodified workloads in the Cloud without doing any type of refactoring. Those are some of the consequences.
Per Bauer: 32:14 Those are the two migration strategies we talked about, so what are some of the general recommendations and considerations you should apply when you do this? The first one is, of course, that you should try to categorize and rate your workloads. You need to look at them from multiple different aspects. If you have a workload to migrate, it may be worthwhile to assess the feasibility of migration first. Rating workloads on parameters like the ones here will allow you to make a better decision.
Per Bauer: 32:47 The first one is architectural fit. Is the application a pure monolithic application that doesn't support any kind of scale-out, only scale-up, and uses a fair amount of resources even in low-activity situations? Then it's probably not a good fit for migration without refactoring. Whereas if it already has some level of service orientation, or is at least divided into multiple tiers, it may actually be a relatively good fit for moving over as is.
Per Bauer: 33:24 Another aspect to look at is the current cost of ownership. If the current cost of ownership is low, the benefit of moving over is probably not as great. Whereas if it's a very expensive application, using a lot of expensive resources, it may be more interesting to move it over.
Per Bauer: 33:47 Another one is the business process that it supports. What is the nature of the business process the application is supporting? Is it targeting a well-defined group of internal users, or is it facing external users of unknown size? If it's the latter, it's probably a better fit for Public Cloud, especially if it includes an element of direct user interaction, because then you can also get closer to your external users by running in the Cloud rather than on prem.
Per Bauer: 34:18 Then the last one, which is probably the most [inaudible 00:34:20], is the demand for the service or the application. If there are steep changes in demand for your application that you need to manage, it's probably a good idea to host it in the Cloud, because then you can pay for what you use. Whereas if it's a stable application running in a back-office environment with a sustained level of activity, the benefits of moving it over may be smaller.
Per Bauer: 34:53 By systematically assessing your workloads based on these parameters, you can make sure that you prioritize the right activities and start by moving the most relevant workloads over first.
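One simple way to make that assessment systematic is a weighted score over the four parameters just discussed. The weights, the 0-5 rating scale, and the example workload names below are all illustrative assumptions; a real assessment would calibrate them to the organization.

```python
# Hypothetical migration-priority score over the four rating
# parameters: architectural fit, current cost of ownership,
# external-facing business process, demand variability.

def migration_score(architectural_fit, current_cost,
                    external_facing, demand_variability):
    """Each input is rated 0 (poor fit for Cloud) to 5 (strong fit).
    Returns a single priority score; higher = migrate sooner."""
    weights = {"arch": 0.35, "cost": 0.25, "users": 0.2, "demand": 0.2}
    return round(
        architectural_fit * weights["arch"]
        + current_cost * weights["cost"]
        + external_facing * weights["users"]
        + demand_variability * weights["demand"], 2)

workloads = {
    "web-shop":   migration_score(4, 5, 5, 5),  # external, spiky
    "backoffice": migration_score(1, 2, 0, 1),  # internal, steady
}
ordered = sorted(workloads, key=workloads.get, reverse=True)
```

Sorting by the score gives the migration order: here the external, demand-variable web shop ranks well ahead of the cheap, steady back-office application, matching the reasoning in the four criteria above.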
Per Bauer: 35:11 Another important aspect when you make the migration over to the Cloud is cost management. The whole capacity management discipline is changing in the Cloud environment: from optimizing the utilization of a limited set of resources, which used to be the case, to delivering the required capacity at the lowest possible cost. Cost is at the center of what we do in the Cloud from a capacity management perspective. If you start off migrating applications without putting the right instrumentation in place for cost management, it's going to be really difficult moving forward.
Per Bauer: 35:59 Since allocation and consumption of resources in the Cloud is scattered across the organization, with self-service elements that allow business users, business units, or lines of business to allocate their own resources, you lose the overview completely.
Per Bauer: 36:22 There's really no incentive to curb Cloud spending if you don't have ... if you use the standard billing, you get one bill from the Cloud provider and it's paid for by central IT, or it's divided in some simplified fashion across the different business lines. There's no real incentive for your customers to be mindful about how they use resources. You need a factoring of your costs so that you can show each business unit what they're actually using, and have them pay for it.
Per Bauer: 37:02 Then of course, if you don't have an understanding of how much you're paying for your Cloud services, or what amount of resources a specific service is using, it's hard to do an overall valuation. If you don't have the fully burdened cost of each service, it's hard to compare that cost against the value of the service. The whole service valuation exercise becomes really, really difficult as well.
Per Bauer: 37:32 There's a bunch of cost management challenges involved in moving over to the Cloud. The best way to deal with them is tagging of resources. You should always tag Cloud allocations; the tags should contain at least the business unit or service that is using the resource, the application, and the owner. You need a global structure and naming conventions that allow you to keep track of this. Then once you have this, you track and report on consumption by payer for charge-back, or at least show-back. If you're not ready for a full charge-back solution, or culturally you're not in that place, you at least need to show back to the businesses what the different business units are using, what the different services are consuming, and what costs they are incurring in the Cloud. Otherwise you're never going to get full organizational awareness around cost in Cloud environments.
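A show-back report is essentially an aggregation of the bill over those tags. Here is a minimal sketch; the billing records, tag names, and amounts are invented for illustration, not the format of any particular provider's billing export.

```python
from collections import defaultdict

# Hypothetical billing lines: one per resource, carrying the tags
# recommended above (business unit, application, owner) and its cost.
bill = [
    {"unit": "sales",   "app": "crm",     "owner": "alice", "cost": 420.0},
    {"unit": "sales",   "app": "webshop", "owner": "bob",   "cost": 310.0},
    {"unit": "finance", "app": "ledger",  "owner": "carol", "cost": 150.0},
]

def show_back(records, tag="unit"):
    """Total monthly cost per value of the given tag."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[tag]] += rec["cost"]
    return dict(totals)
```

The same function rolled up by `"app"` or `"owner"` instead of `"unit"` gives the per-application or per-owner view, which is exactly why a consistent global tagging convention matters: without the tags, none of these breakdowns are possible.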
Per Bauer: 38:30 Another thing that is really tempting to do when you move things over to the Cloud is to go for an approach where you make your applications or workloads Cloud agnostic, trying to accomplish a state where you can move things across different Cloud services without changing the application. That will of course give you flexibility and more options, you can avoid lock-in, et cetera.
Per Bauer: 38:59 But in reality that's really difficult to do. If you do that you're sort of restricted to least common denominator scenario where you can only use those features and services that are available for all those different Cloud service providers that you want to use. Also in order to manage this and to make sure that you get a controllable situation you need to include certain components like Cloud Service Brokers or Cloud Management Platform in front of those multiple different Clouds and that in itself creates a dependency on those. The lock-in that you were trying to get away from pops up in a different level instead. Then your subject to one of those broker platforms and you've become dependent on those. More and more you see experts recommending that you take a very close look at Multi Cloud or Cloud agnostic approach it's because the cost of doing it is typically much higher than the benefit that you would get out of it. The fact that you can't use some of those services that an Amazon or an Azure provides you with means that you will have to conduct a lot of those primitive services yourself in your workloads, in your applications. The cost of doing that is many times higher than you would potentially save by having the independency of a specific provider. Even though it's a compelling approach it has to be looked at really hard before you determine on a Cloud agnostic approach.
Per Bauer: 40:54 With that, those are some of the recommendations and some of the migration strategies. So what's next after you've done that? Now we have moved over to the Cloud, we understand the upsides and best practices around it, so what is next once you're in the Cloud?
Per Bauer: 41:13 Even though you've made the migration, whether through refactoring first or through lift and shift, you continuously have to monitor, identify, and address inefficiencies in your environment. You need to track charges versus budget to make sure that you stay on budget and aren't running astray, allocating more resources than you can afford. If you are, you need to raise that and get a business decision around the situation. You need to forecast future capacity requirements, as we've always done before. You still need a demand management approach where you understand the future requirements, how they will impact your allocation of resources, and whether your current strategy is actually the best one for the scenarios you have ahead of you, et cetera. Keep on planning and keep trying to understand tomorrow's requirements in terms of allocation and capacity. Then of course there's predicting the cost. As we said before, capacity management very much shifts from being an optimization discipline, where you try to optimize a fixed set of resources, to a matter of getting the right capacity at the lowest possible cost. Being able to predict the cost by combining different sources of information, and forecasting what the bill at the end of the month or the end of the quarter will look like, is also something we need to do.
Per Bauer: 43:00 Those are the ongoing things that you need to do in a Cloud environment. We will have another webinar with CNG later, I think in the May time frame, where we'll spend more time on this ongoing management and optimization of your Cloud resources. Once you've made the migration of your workload, what do you need to do, and how do you do it in the best possible fashion?
Per Bauer: 43:25 Over to a summary. What we said here is that capacity management for the Cloud is very much about delivering the right capacity at the lowest possible cost: making sure that you understand what the right capacity is, and finding the best way of sourcing that capacity, which type of instances you should use, how you scale up or scale out your application, and how you scale it back in the event of lower activity.
Per Bauer: 44:02 The migration itself can be done in two different ways. You have the front-loaded approach of refactoring first, where most of the effort happens at the beginning, and then you migrate your application and leave it running in the Cloud. The other alternative was the lift and shift approach, where you do less before you migrate, and then clean up the mess and optimize the resource usage once you're in the Cloud. Of those two approaches, the first one is best in theory, but in reality a lot of companies will have to use the second alternative, lift and shift.
Per Bauer: 44:46 But when you do that it's very important to understand the cost aspect of it. The cost aspect that has the same general shape, but it has other implications. The refactored approach, the alternative one, once you're in the Cloud for the continuous operation of this the cost is going to be considerably lower because you're going to have a more dynamic application that scales out and scales back as you need because of demand, but it's going to decommission resources and turn off instances as you see lower activity levels. Whereas the lift and shift approach where you keep your existing architecture, your existing application structure, you are typically stuck with a much higher cost and unless you actually do something about this the cost will continue to be fairly high as you move forward. It's important to understand that lift and shift approach is not really a sustainable state, it's something you have to live with maybe after the migration but as soon as possible you need to start to optimize and get away from that architecture because it's not ideal for the Cloud environment.
Per Bauer: 46:14 Workload refactoring, as we said, is necessary for sustainable long-term TCO; not doing it really isn't an option. Another thing to remember is that you need to instrument for cost management from the start. If you don't do it during the migration, it's typically hard to fit in afterwards. During the migration you should tag all your resources with which business applications are involved, which business unit owns them, what the cost center is, et cetera. Every kind of instrumentation that you need from a cost management or cost breakdown perspective is important to get in place from inception. Then the last one, equally important and something we'll discuss more in depth in the later session, is that ongoing optimization and planning post migration will further improve TCO. Even though you may have done a good job refactoring and translating your requirements from on prem over to Cloud, there are always going to be changing circumstances, new offerings from the Cloud service providers, et cetera, that make it worthwhile to do ongoing optimization and planning. How you do that in the best possible way we'll talk about in the upcoming webinar.
Per Bauer: 47:38 With that I don't have anything more, so I leave the floor open for questions. We still have about 10 minutes to go.
Amanda Hendley: 47:52 Just a reminder that if you have any questions you can submit them in the questions tab in the side panel, and if we don't get to them we will follow up later with a blog. Please type your questions in anyway, just in case. We did get a few questions in during your presentation. One of them is, "Why is optimization so important in the Cloud? I thought that the elasticity of the Cloud was supposed to make things easier?"
Per Bauer: 48:26 Yeah, that's a very common question, I guess. Why it's important really comes down to the ease of provisioning new resources, because of the self service that is offered in the Cloud. Very easily you get the same [inaudible 00:48:44] that we used to see for servers or VMs in the past. The fact that a lot of the provisioning of instances in the Cloud happens without "proper analysis" means there's much more potential for optimization in those kinds of environments, because you've allocated more resources than required.
Per Bauer: 49:14 Another aspect that makes optimization more important is the fact that, since Cloud is OpEx, there are no sunk costs, so any optimization effort you make pays off immediately. At the end of the month your bill will be slightly less than it would otherwise have been. In the past we made capital investments in infrastructure, and regardless of whether we optimized or not we still carried the amortization or incurred cost of that infrastructure. In the Cloud we pay for what we use, so the incentive to use as little as possible gives immediate payback.
Amanda Hendley: 50:04 Perfect. Thank you for that. Let's see, another question; sorting through them. How can I ensure the application is doing alright after the migration?
Per Bauer: 50:16 Yeah, that's a good point. As we saw, when you move an application over you do a sort of high-level assessment of its requirements, but once you're there you need to make sure the application is delivering the kind of performance you need. You can take the simplified approach of monitoring utilization of resources. Mind, though, that the resources aren't physical and aren't dedicated; they are some sort of proxy resources for your environment. You can still look at those KPIs and metrics and see whether your application seems to be doing okay.
Per Bauer: 51:01 I would recommend everyone to take a serious look at some more in depth kind of analysis like queuing models for example where you look at the wait times or queuing for resources and by that determine whether your application is actually doing okay. That's probably a better option.
Per Bauer: 51:29 Of course you can also monitor the response times, system to user or system to sub-system of the workload where that is applicable to determine how your application is doing. But it's very important to have a framework in place once you make the migration to monitor the performance and the response times of the applications.
Amanda Hendley: 51:54 A follow up question to that one, what are the four key items being monitored after migration?
Per Bauer: 52:01 The four key items? I would say, first, response time, if it is a relevant metric. If it's an application component that is user facing, or interfacing with some sort of public API or [inaudible 00:52:25], then response time is of course something you always need to look at.
Per Bauer: 52:30 I would also argue that basic resource consumption like virtual CPU, it's not a physical CPU, it's not even your CPU, it's the CPU that you share with a number of other tenants. That's another important metric to monitor.
Per Bauer: 52:52 But from a cost perspective it's also important to monitor something like on AWS for example there's CPU credits. If you have allocated a certain amount of resources to your instance and you're not using that those show up as credits, unused resource credits that you can use for later. You may be able to use them later but if you're not those are lost so that in itself is a good indicator if you have over provisioned or if you have allocated a type of instance with the amount of resources that aren't really being used by you.
Per Bauer: 53:33 Those are the type of metrics I would look at.
Amanda Hendley: 53:40 Perfect. Well we've got time for about one more question. Let me see. You mentioned cost prediction as one of the major tasks once you have migrated, how exactly do you do that?
Per Bauer: 53:58 In order to do that you need a number of different components. You need to understand how much you're being charged for what you're using right now, and what you have been using in the past. There are typically APIs available from the different Cloud providers that allow you to look at those numbers. Historical numbers aren't really difficult to get, and you can also get accumulated numbers for the current period: this far into the month, how much in charges have I incurred by the way I'm doing things? You combine that with historical seasonality patterns, which give you a good understanding of seasonal changes and organic growth over time, et cetera. Then you also need some level of forecast from the business about future business activity, or disruptive events, et cetera.
Per Bauer: 55:02 It follows the old traditional demand management practice: those three in combination. Resource utilization, historical seasonality patterns, and forecasts about future business activity. If you combine those three, it's fairly easy to do a prediction: basically an extrapolation of the charges so far, with adjustments based on forecasts and historical seasonality behavior, so you can forecast the cost at the end of the month. There is a formula that we use to do this in our cost management, basically combining those three components.
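The extrapolation just described can be sketched in a few lines; this is not TeamQuest's actual formula, just a minimal run-rate model with the two multiplicative adjustments mentioned above, and all input numbers are illustrative.

```python
# Month-end cost forecast in the spirit described above: extrapolate
# month-to-date charges linearly, then adjust with a seasonality factor
# (from historical patterns) and a growth factor (from business forecasts).

def forecast_month_end(charges_to_date, days_elapsed, days_in_month,
                       seasonality=1.0, growth=1.0):
    """Linear run-rate extrapolation with multiplicative adjustments."""
    run_rate = charges_to_date / days_elapsed
    return run_rate * days_in_month * seasonality * growth

# Assumed inputs: $4,200 spent after 12 of 30 days; the rest of the month
# historically runs 10% hotter; forecast business activity adds another 5%.
print(round(forecast_month_end(4200, 12, 30,
                               seasonality=1.10, growth=1.05), 2))
# -> 12127.5
```

Comparing this forecast against the budget each week is what turns the month-end bill from a surprise into a managed number.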
Amanda Hendley: 55:54 Okay, perfect. Well, thank you so much for all your insights today. Attendees, if you do come up with any follow-up questions, please feel free to submit them through our website, CNG.org, and we will add them to the questions submitted for the follow-up blog. Please check back for future webinars being hosted, including a follow-up with TeamQuest in May; just check our calendar. Thank you all for attending today. I hope you have a wonderful rest of your week, and Per, thank you again.
Try Vityl Capacity Management free for 30 days to see how it can keep your cloud migration in check.