You’ve decided to migrate to the cloud, but what migration strategy will keep your costs down and be most efficient?
You have two main strategies for moving workloads to the cloud: Refactoring or Lift and Shift. Refactoring requires more time on the front end to optimize workloads prior to migrating. Lift and Shift moves eligible workloads right away, optimizing later.
Watch this webinar to learn how to confidently move forward with your migration:
- Understand the pros and cons of Refactoring and Lift and Shift
- Determine which strategy will work best for your environment
- See how to easily analyze your existing infrastructure to optimize your cloud configuration
- Learn how effort and cost can be significantly reduced as you migrate
Stacy Doughan: 00:00 Hello everyone. My name is Stacy Doughan and I'd like to welcome you to today's webinar: Getting Started with Your Cloud Migration. In today's webinar, you'll learn the pros and cons of refactoring and lift and shift, which strategy will work best for your environment, how to easily analyze your existing infrastructure to optimize your cloud configuration, and how effort and costs can be significantly reduced as you migrate. If you have questions throughout today's presentation, please submit them via the chat box and we'll do our best to answer them before the end of today's session.
Before we get started, I'd like to introduce today's presenter, Keith Digby. Keith Digby is a senior professional services specialist with 20+ years in capacity management. He's been with TeamQuest for 6+ years, deploying TeamQuest solutions and delivering training with many of the top 25 global banks and telecom companies as well as other organizations. Previous to joining TeamQuest, Keith was employed as a capacity management consultant with a top UK bank for five years. Keith is also an IT optimization expert, delivering workshops and conducting certifications in ITIL. Hey, Keith. Thanks for joining us today.
Keith Digby: 01:13 Hey, Stacy. Thanks for having me.
Stacy Doughan: 01:15 Yeah. All right, we can go ahead and get started.
Keith Digby: 01:19 Excellent. Okay. So as Stacy's just explained, we're going to talk about getting started with migrating your infrastructure into the cloud. There's five points that we want to talk about here. Firstly, cloud facts. The strategies that we might want to take. And then we're going to show some assistance that we may be able to provide you with migrating into the cloud and some ways that we might be able to save you the cost of doing so, as well. Followed by the summary, of course.
So before we start, I'd like to talk about some common terminology. So where we have on-premises or traditional IT shops, they've been building their own private clouds, generally around VMware or some other virtualized environment. So that's like your traditional IT environment plus a private cloud. And people are now beginning to migrate also into the public cloud.
Now during those phases, they may be both in the private cloud and in the public cloud, so this is a hybrid cloud solution. And more recently people have been having multiple cloud environments that they may wish to host their services in. But you have a multicloud, that's different providers providing your cloud instances for you. Okay?
So this is the view that we are having here with cloud. So everything really is hybrid IT. Okay? So you can have a hybrid IT environment that encompasses everything from their large monolithic database services they're running themselves all the way through to cloud provider services by public cloud providers.
The reason that this is becoming a big subject, Gartner's done some research and they're seeing that over a period of about five years, they're more than doubling the amount of cloud provider services that are being provided for IT customers to do this type of hosting in the cloud. And the area that we are interested in here has a very large amount of growth. This is infrastructure as a service. Okay? So this is very specifically the area of cloud provisioning that we're talking about.
So over the years, they've gone from having no cloud and now, around 2018, we're finding that organizations are not necessarily migrating existing infrastructure to the cloud, but when new applications come along they'll be engineered to go in the cloud, enabled for new applications. And then in a couple of years' time, we've seen that a lot of organizations will be only running in the cloud. Okay? So this is very important that we get this costing right.
So why do organizations specifically adopt infrastructure as a service? Well, the biggest one that we find is that they want to reduce their costs. But this obviously has to be handled very carefully because you have less control over your costs if you're buying in a service than perhaps if you're providing that infrastructure yourself.
They also see there's more, gives them more agility so that they have the ability to generate instances to run their applications more quickly when they don't have to deploy the infrastructure themselves. It's somebody else's problem to provide that hardware and you can just spin up instances as you see fit.
It's also become a very large strategic focus. CEOs, organizations are seeing ... There's been these benefits of assurance, scalability, as you've seen, cost reduction. So they're seeing that as a strategic benefit for their organization. And it's basic. Everybody's doing it, so you don't want to get left behind, either, from a strategic point of view.
The assurance, I mean, you have high availability built into these solutions. You have the ability to use these instances almost on demand. You don't have to wait a certain number of weeks for the hardware to be configured and made available as a server for you to run. And that gives you increased scalability. So as your business grows, then you can just order more of these cloud instances to support that business.
But there are obviously dangers and pitfalls at every one of these points. So a little bit more research has been done here. So the State of the Cloud Report for this year, which I believe was published about February time, that's shown at the top. I think there's eight or nine different initiatives in place there, and the top two are to do with existing clouds, so optimizing the cloud use you have and finding cost savings, but also moving workloads into the cloud. Now the implication of that statement is that these are existing workloads rather than new workloads, and that's an important point to consider for later on.
They've also found that 30% of ... Well, when asked how much cloud spend is wasted, organizations in the cloud, they've said that they think that they're wasting around about 30% of their spend, which is a sizable chunk, but they also think there's an additional 5% that's not been accounted for. Okay? Now I don't know by what measures they're saying that the spend is wasted, but we'll have some points later on as to perhaps how they can identify this waste and possibly find more savings as well.
A separate article published by SiliconANGLE last year, late last year, found that 84% of operating systems migrated into the cloud. This is existing workloads from, perhaps, taking Windows, Linux servers out of a VMware environment and just like-for-like mapping those into cloud instances. 84% of those were overallocated either on processing power or memory than the work that is actually required. Okay? So clearly there's some savings that can be made by purchasing or rightsizing the instances to match what your workloads require rather than what the VMs are configured.
So the final point there is they believe that if cloud resources were adjusted to match the needs, the costs would fall by 35%. This kind of matches what's being seen up here at this level with a 30, 35% waste seen by State of the Cloud Report.
So what strategies do we have in place at the moment to help us with migrating workloads into the cloud? There's really two that are commonly seen at the moment. Public cloud services are designed for cloud-native workloads, so they work best when the application has been designed to run in the cloud. But if we're talking about migrating existing workloads, there's a number of things you have to take into consideration.
So what we mean by cloud-native workload would be that it's designed to use microservices, so very small pieces of code that exist independently of each other. They have the ability to scale out, to give you the capacity that you require on demand. So if you have a peak period of processing, you can spin up more instances of your cloud environment to cover that workload.
And the important thing there is not only adding the infrastructure, but also retracting it and claiming that back when you're not using it. Otherwise you're paying for resources, clearly, that you're not using. Part of the cloud-native way of working means that these individual microservices also have to be stateless, so there is no persistence of the data between the times that these services are running, these microservices.
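As a rough illustration of the scale-out and retract idea described above, here is a minimal sketch. The function name, the per-instance capacity of 100 requests per second, and the load figures are all hypothetical and are not taken from the presentation or from any cloud provider's API.

```python
import math

def desired_instances(requests_per_sec, capacity_per_instance, min_instances=1):
    """Return how many instances are needed to serve the current load.

    The count rises with load (scale out) and falls again when the load
    drops (retract), so idle capacity is not paid for.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, needed)

# Hypothetical load over six intervals, with a peak in the middle.
load_per_interval = [40, 55, 320, 900, 310, 60]
plan = [desired_instances(load, capacity_per_instance=100) for load in load_per_interval]
print(plan)
```

The count climbs for the peak and drops back afterwards, which is the "claiming that back" part. A real orchestration layer would also need cooldown periods and health checks, which this sketch leaves out.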
So there has to be some orchestration done, generally automated, to bring all of this in place so the data is passed back to the central place, the requester of the data, and then these little microservices can disappear and go and do work for somebody else. That's kind of the point of all of this. So there has to be a lot of automation in place and that automation has to be built into the application that you're running in the cloud. So it has to be very tightly linked to these microservices scaling up when required and retracting when not required. All of this works very well together.
If you have an existing workload that you wish to take and put it into the cloud, ideally we can see that if it's going to work best as cloud native, you want to refactor this existing workload so that it runs as a cloud-native application. And to be honest, that's a lot of work. Okay? So first of all, on your legacy platform, you would do the work to make it cloud native. You'd then have to test on your legacy platform that this refactoring has worked and the application is performing as you'd expect it to. Once all that work is done, you can then deploy to the cloud and operate. Okay?
There's various factors that cause us problems when we have this approach, though. So if you're replatforming and refactoring at the same time, you're doing a lot of changes to your code, to your application structure, to the way the thing works, at the same time. So if something goes wrong, it's very difficult to go back and find out what the real cause was. It's also very hard to mimic a public cloud environment in your own in-house environment for your refactoring, verification and testing phase.
So yeah, all that testing that you might do may well not actually work the same when you get to the cloud anyway. And of course there's a lot of work involved in all this, I'm going to go back a screen, all of steps one and two before we can even get to deploy into the cloud. So it's going to be a long time before we get any visible results, which then kind of affects that agility and the strategic goals that we saw.
So what organizations do tend to do is they look at this in terms of, well, let's lift and shift our workload. So we don't do anything on our legacy platform at all. And we take a one-for-one mapping of a VM to a cloud instance and run that workload as if it was just running in-house, but actually just run it in the cloud. So we're migrating it as is. And then they may well choose to refactor, test, verify and do continuous refinements against that workload running in the cloud. Okay?
So we're talking now about migration assistance. So we think that as capacity managers, and as a capacity management organization at TeamQuest, we need to actually apply capacity management to do this. We need to rightsize our environment for the cloud, and we call this lift, trim and shift. Other names are available, but it's the one we've chosen for this presentation.
So what we want to do is identify the optimal target instance type for each VM, and we're going to do this based on standard capacity management measures. So we need to make sure that the business activity cycle is covered. So if you have a peak workload quarterly or even yearly, you need to make sure that the sample size you use takes that peak workload into account.
We're also going to assess optimal target size based on two criteria, the average and a percentile. Now the average is probably going to give you smaller optimal target types to use, because the values are averaged. And this is a higher risk, but you would get a higher cost saving back, as well. And maybe there are some servers where that's a suitable criterion to use. However, we think that using a percentile, perhaps the 95th percentile, is a better way of doing it because you'll get rid of the outliers but still see the peaks reflected in that workload measurement.
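To make the difference between the two criteria concrete, here is a small sketch. The CPU samples are made up for illustration, and the nearest-rank percentile shown is only one common definition; the talk does not say which definition the TeamQuest analysis uses.

```python
import math
from statistics import mean

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical CPU utilization samples (%), with one real peak (48)
# and one outlier (97).
cpu_pct = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13,
           17, 12, 14, 15, 13, 16, 14, 12, 48, 97]

print("average:", mean(cpu_pct))
print("95th percentile:", percentile(cpu_pct, 95))
print("maximum:", max(cpu_pct))
```

With these samples the average sits well below the peak, the maximum is driven by the single outlier, and the 95th percentile lands on the peak while discarding the outlier.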
There's also a discussion that needs to be had around whether an instance type that is chosen is an on-demand instance or a reserved instance. Now, depending on the deal that you have with your cloud provider, you will get a discount for a reserved instance, but you pay for that all the time. And depending, again, on the deal you have with your cloud provider, the threshold that's in place as to when it's better to have an on-demand or a reserved instance is generally based on CPU and memory utilization. But the threshold can vary depending on what discount you have from your cloud provider. So that's something that we can give guidance on, but the input would have to come from the particular organization or cloud provider depending on the deal they have.
And we also don't want to be resizing to an optimal target instance type based only on workload from the business as it is now. There's bound to be some short-term, medium-term growth that we should account for, as well. So we need to account for perhaps 10% per annum, or whatever value you would normally put into your capacity plan, to make sure that we have the rightsized environment now and also into the future.
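One simple way to reason about that threshold is a break-even calculation. This is only a sketch under stated assumptions: an on-demand instance is billed only for the hours it runs, a reserved instance is billed for every hour at a lower effective rate, and the prices are hypothetical. The thresholds in the presentation are based on CPU and memory utilization, so the fraction of hours running is a stand-in here, not the speaker's method.

```python
def reserved_break_even(on_demand_hourly, reserved_hourly):
    """Fraction of hours an instance must run before reserving is cheaper.

    Assumes on-demand is paid only while running and reserved is paid
    for every hour at its (discounted) effective hourly rate.
    """
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return reserved_hourly / on_demand_hourly

def prefer_reserved(running_fraction, on_demand_hourly, reserved_hourly):
    """True when the instance runs often enough for reserved to win."""
    return running_fraction > reserved_break_even(on_demand_hourly, reserved_hourly)

# Hypothetical prices: 10 cents on demand, 6 cents effective when reserved.
threshold = reserved_break_even(0.10, 0.06)
print(f"break-even at {threshold:.0%} of hours")
print(prefer_reserved(0.90, 0.10, 0.06))  # runs most of the time
print(prefer_reserved(0.30, 0.10, 0.06))  # runs only occasionally
```

A bigger discount lowers the break-even point, which is why the threshold depends on the deal each organization has with its provider.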
So in summary for this section, when you're capacity managing for the cloud, the main reason that you want to do this is to get the lowest possible cost, but you still need to be delivering the right amount of capacity in the same way as you would with any other capacity management exercise. So when we are migrating, we have two choices. We can refactor first. That's alternative one. Or traditionally, alternative two would be just take our workloads, shift them directly into the cloud as a one-for-one mapping from your VMware configuration.
Now the differences between these two techniques are as follows. With alternative one, the amount of cost and effort is much greater up front before you can even get to migrating these workloads into the cloud environment. However, once you've done that, the ongoing cost of running these environments, these applications in the cloud, will be significantly reduced. With lift and shift, taking it based on configuration as we generally see at the moment, there's obviously much less effort and cost at the front, but the ongoing costs will be massive compared to what you could achieve by refactoring. Okay? So this is the wide, orange path we see here. And that's ongoing until any refactoring optimization takes place to reduce that, but it does give you much quicker results.
Now the third option that we are proposing here, lift trim and shift, similar amounts of effort, in fact probably a little more effort in front of getting this thing working, but the costs will be significantly reduced and remain significantly reduced from the outset once the migration has taken place.
So how do we assess this? Well, we analyze historical VMware data, and based on those criteria we looked at just now, the top three settings make sure that we've got a long enough period of time and that we are identifying periods of business, so Monday to Friday during business hours. We can change these to being seven days a week, 24 hours if we wish. And then we select some percentile that we would like to apply, and what I've chosen here is the 95th percentile, so we're getting rid of the few outliers in the data that could skew it, but we're making sure that we have the correct amount of resource being allocated to support that business workload that we see during the interval chosen.
I'm then given the option here to say how much growth I would want to apply. So this would factor up any measured data by 10% and then recommend an instance size based on that factored 10% growth. And then the bottom three here are the thresholds for defining whether you'd want to have a reserved instance or not. And in the case we've used, Amazon EC2, these are the actual criteria that they use as thresholds for reserving instances.
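The business-period filter described above can be sketched as follows. The 09:00 to 17:00 window and the synthetic samples are assumptions for illustration; the talk only says Monday to Friday during business hours.

```python
from datetime import datetime, timedelta

def in_business_hours(ts, start_hour=9, end_hour=17):
    """True for Monday to Friday between start_hour and end_hour."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour

def business_samples(samples):
    """Keep only the values measured inside the business period."""
    return [value for ts, value in samples if in_business_hours(ts)]

# One week of synthetic hourly samples: busy in business hours, idle otherwise.
start = datetime(2018, 3, 5)  # a Monday
samples = []
for hour in range(7 * 24):
    ts = start + timedelta(hours=hour)
    samples.append((ts, 60 if in_business_hours(ts) else 5))

kept = business_samples(samples)
print(len(samples), "samples in total,", len(kept), "in business hours")
```

Filtering first matters because idle nights and weekends would otherwise drag both the average and the percentile down. Widening the window to seven days and 24 hours is a matter of changing the predicate.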
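As a sketch of how a growth allowance and an instance recommendation might fit together: the catalog below lists vCPU and memory sizes for a few t2 instance types, and the selection rule (smallest type that covers demand plus growth) is an assumption for illustration, not the actual TeamQuest logic. It also ignores the CPU-credit behaviour of burstable t2 instances.

```python
# (name, vCPUs, memory in GiB), ordered smallest first.
INSTANCE_TYPES = [
    ("t2.nano", 1, 0.5),
    ("t2.micro", 1, 1.0),
    ("t2.small", 1, 2.0),
    ("t2.medium", 2, 4.0),
    ("t2.large", 2, 8.0),
]

def recommend(cpu_cores, mem_gib, growth=0.10):
    """Return the smallest listed instance type that covers the demand
    after applying a flat growth allowance, or None if nothing fits."""
    cpu_needed = cpu_cores * (1 + growth)
    mem_needed = mem_gib * (1 + growth)
    for name, vcpus, memory in INSTANCE_TYPES:
        if vcpus >= cpu_needed and memory >= mem_needed:
            return name
    return None

# A VM configured with 2 vCPUs and 8 GiB, but measured at the 95th
# percentile as using 0.3 cores and 0.4 GiB (hypothetical figures).
print("by configuration:", recommend(2, 8, growth=0.0))
print("by 95th percentile plus 10%:", recommend(0.3, 0.4))
```

Sizing by configuration applies no growth because the configured size is taken as is, while the measured figures are factored up before the lookup.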
So when we do this, we do our analysis and we're given a table. And the first thing you'll notice is for each individual VM, we have the recommendations in these three columns. So the recommendation based on just lifting it with the exact configuration you have at the moment is generally very much larger than the recommendation based on percentile or average. And we can see from this data that, by configuration, the smallest servers that we have are t2.mediums, and there's three of those. And when we looked at percentile, there's only one server that's actually a medium size from the recommendation. The rest are all, actually, t2.micros and t2.nanos. We then identify, based on these thresholds that we have in place, whether an instance should be configured as a reserved instance or not, which would, again, give us further reduction in cost.
So we've been talking a lot about the cost. This is the main thing. And remember that second point was talking about moving existing workloads. And the first point on the State of the Cloud Report was reducing costs in your existing cloud environment. Well, what we're proposing here is that you can reduce costs during the migration as well. So to compare the costs, this is actually taken from the Amazon EC2 website, which lists all the different costs out for the different operating system types. So to compare the costs, I've just kept it consistent as a Red Hat Enterprise Linux instance based in US East, in Northern Virginia. And these are the costs that we have for our smaller instances. And you can see, generally, they're around ... The very small ones are less than 10 cents per hour, so the bottom figure is around 10 cents per hour.
The bottom three that you see here, I've just gone and got the most expensive instances and put them at the bottom, and now we're up to $3.20 an hour. So it's considerably more if you're doing a lift and shift and it's configured as a very large machine. If that machine had hardly any utilization on it, then you could definitely get away with one of these much smaller instances at a fraction of the cost. Okay?
So this is what we're applying here in our costs comparison. So the report that I showed earlier with a big table, that has a summary section at the top, and this is a clip from the configuration part of that summary. So this tells us how many of each instance types we would need based on configuration and how many of those instances are going to be reserved instances.
So from there, we can see that we would expect to see 17 c4.large instances and eight t2.large, so they're generally defined as large instances. However, if we looked at the table when it's done with the criteria of percentile ... I do apologize for that little cursor there. That shouldn't be there. We can see that there's actually only one large instance. So we've gone from everything being large to only one needing to be large. And again, one reserved instance there.
In this report we also actually give all the evaluation criteria, because otherwise we wouldn't know what inputs were put in. So this is based on exactly the same inputs that I showed earlier, so 95th percentile, we use a 10% growth allowance and there's the reserve thresholds at the bottom of the screen. So all this is then presented for you to make that decision: are you happy with [inaudible 00:21:30] configuration? Are you confident going forward, based on the measured data, to use a much smaller instance type which would save you the money?
So if we talk about the lift and shift configuration, which we're referring to here as untrimmed, the cost here of those 29 instances is based on the configuration, and these are the instance costs from the EC2 cost configurator website using Linux, as I said. So the total cost here is approaching $5 per hour for this small cluster, 29 VMs. If we look on the right hand side, if we use 95th percentile so we're still accounting for peaks in the utilization of the service, we can get away with 61 cents, which is over eight times cheaper than just doing a straight lift and shift based on configuration. Now I know people are probably sitting out there saying, "Yes, but that's based on a small TeamQuest environment."
So I'm lucky enough to have access to a number of organizations through my work at TeamQuest. So I thought I'd do exactly the same comparison for a global top-25 telecom company that I've been doing some work with. And these were the results. So again, it's only a small cluster. It's actually a production cluster. It's not a development one or a sandbox environment or whatever. So 18 servers in this particular cluster, based on configuration, $18 per hour. And based on 95th percentile, a little over $3. So this is 1/6 of the price by just doing this reasonably small amount of work to consider what your migration targets should be based on utilization. Okay? Which is significantly more than the 1/3 saving that we saw from the investigations that were done by the State of the Cloud Report and that other website that I quoted that I can't remember what it was called now. I do apologize to them, anyway.
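The arithmetic behind those two comparisons can be checked with a few lines. The hourly figures are rounded from the talk ("approaching $5" is taken as $5.00 and "a little over $3" as $3.00), and the yearly figure assumes every instance runs for all 8,760 hours of the year, which is an assumption rather than something stated in the presentation.

```python
HOURS_PER_YEAR = 24 * 365

def compare(untrimmed_hourly, trimmed_hourly):
    """Return (how many times cheaper, percentage saving) for trimmed sizing."""
    ratio = untrimmed_hourly / trimmed_hourly
    saving_pct = (1 - trimmed_hourly / untrimmed_hourly) * 100
    return ratio, saving_pct

clusters = [
    ("TeamQuest cluster, 29 VMs", 5.00, 0.61),
    ("Telecom cluster, 18 servers", 18.00, 3.00),
]

for label, untrimmed, trimmed in clusters:
    ratio, saving_pct = compare(untrimmed, trimmed)
    yearly_saving = (untrimmed - trimmed) * HOURS_PER_YEAR
    print(f"{label}: {ratio:.1f}x cheaper, {saving_pct:.0f}% saving, "
          f"about ${yearly_saving:,.0f} per year")
```

Because the inputs are rounded, the percentages are approximate, but they show how a ratio of six to eight times corresponds to a saving in the low-to-high 80s in percentage terms.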
So in summary, if we do lift and shift, we are going to have a high level of cost ongoing. For a little more effort, we can consider this based on the right amount of capacity going forward to support your business, and we can significantly reduce the ongoing costs when running in the cloud, and I put it here 35 to 85%. It's going to depend very much on what your organization has in place. If they already have very well-sized VMs, then perhaps, well you'll know that you're having some savings, but you can be confident that those savings are what you have in place.
But I find that a lot of organizations I've worked with, in their VMware environments currently, they tend to oversize things, which covers them in the short-term, I guess, in their own environments, but will cost them in the long-term unless they consider this going forward when they migrate to the cloud. And I think it is more when they migrate than if, based on the strategic decisions we see within companies at the moment, as well. So just by considering rightsizing this, you can then get the cost reduced significantly.
So ideally workload refactoring should be done to make sure that the cloud is cost effective for organizations. However, that is a lot of work and people want to see results quickly. They have a cloud migration strategy, they don't expect you to do six to nine months or even more work before you start migrating things.
So new applications going into the cloud that have never run on an in-house VMware environment or private cloud, yes, you would build your application to be cloud aware. But when we're migrating existing workloads to the cloud, possibly that wouldn't be an option for you. So the other option you have is lifting and shifting, but it does cost, definitely, even more. But we can trim your target instance environment and we can get you between three and six times greater cost savings, and that's ongoing savings as well. You never have to have that higher cost and try and reclaim things once they're in the cloud. You would go to a rightsized cloud environment automatically. Okay?
So I just have one more slide now. So if anybody is thinking of migrating into the cloud and has similar types of situations to what we see here, we can provide services for you that would help you to understand and clarify your cloud migration objectives. Because there's workload types, operating system types, large databases for instance, that aren't really particularly suited for a straight cloud migration. So we would review your environment with you, perform a full analysis, and then we would deliver these reports showing you what the results would be, with clear recommendations on what instance type each server would get, and it would include the priorities and also the estimated cost and savings. Okay? So if you want more information about that, there's an email address at the bottom there for more details for you. Okay?
So all that remains for me now is to say thank you very much for your time this afternoon, and do you have any questions?
Stacy Doughan: 27:17 Thank you, Keith. We do actually have a couple of questions that have come in. If you have a question and haven't had an opportunity to submit it, go ahead and get that in at this time. The first question I have is, can you share the Surveyor view that you used in the presentation to show the AWS hardware sizing? And this comes from a current customer.
Keith Digby: 27:36 I think the sharing of the view would come along with ... before that we'd have to have a bigger discussion about how that would be shared, whether we have access to the right data, et cetera-
Stacy Doughan: 27:49 Sure.
Keith Digby: 27:49 ... and I think you'd have to go through a salesman to get that discussion started, I'm afraid.
Stacy Doughan: 27:54 Yeah. I think that having a reach out to your RSM can get the ball rolling as to what you need to get that view into your hands. Next question is, do we need to be using TeamQuest data or TeamQuest software already to take advantage of this analysis?
Keith Digby: 28:11 The only TeamQuest software you need for this analysis would be that you have to have Surveyor installed. You don't need to have Surveyor VMware data to run the analysis. It will also run against VMware vCenter database and give exactly the same results. I've tested it against both extensively. So if this is for existing TeamQuest customers, they would have to have Surveyor installed. But we can also make this available to non-TeamQuest customers provided they have a VMware vCenter database, we can provide a service that would provide this result for them.
Stacy Doughan: 28:46 Okay. So VMware vCenter database makes this possible to do this type of analysis. All right, great. Next question that I have here is, what cloud environments do you support?
Keith Digby: 29:04 We currently support Amazon EC2. We are very shortly going to support Azure. And we're investigating using Google Cloud Platform as well.
Stacy Doughan: 29:12 Okay, great. All right. Is the 10% growth scenario used in the presentation based on a linear trend or something else?
Keith Digby: 29:25 10% is a lump that's added to the top. There is no linear trending or anything like that. To derive what that percentage should be, you would probably like to ideally go to your business demand management team or whoever you talked to to get your business growth figures for the capacity plan because they're effectively the same numbers that we would apply. You could also derive that from a linear trend of growth to your VMware utilization in the past. So if you've grown 10% in the previous year, then you could apply 10% to know you're covered for the next year.
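As a sketch of the second approach mentioned in that answer, deriving the growth percentage from a linear trend of past utilization, the following fits a least-squares line to monthly figures. The monthly values are made up, and the function names are hypothetical; the analysis in the presentation itself simply adds a flat percentage.

```python
def linear_fit(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...
    Returns (slope, intercept)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def annual_growth_pct(monthly_utilization):
    """Growth over 12 months implied by the fitted trend, as a percentage
    of the fitted starting value."""
    slope, intercept = linear_fit(monthly_utilization)
    return slope * 12 / intercept * 100

# Hypothetical utilization rising steadily from 60 to 66 over a year.
history = [60 + 0.5 * month for month in range(13)]
print(f"growth allowance: {annual_growth_pct(history):.1f}%")
```

The resulting percentage would then be applied as the flat allowance on top of the measured data, and ongoing monitoring in the cloud can confirm whether it was accurate.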
And then, also, we have the ability to monitor our cloud environment utilization ongoing anyway, so we can do exactly the same capacity management we do now with servers and VMware environments, we can do that in the cloud as well to make sure that 10% was accurate.
Stacy Doughan: 30:19 Sure. And then the next question is, can we use our Vityl TeamQuest tool to do this exercise on its own? I'm going to answer that. I think it's similar to the first question I asked. I'd just go ahead and reach out to your RSM. If you're not sure who that is, go ahead and email firstname.lastname@example.org and we'll make sure to get you in touch and get you the information you need to get your hands on the module.
And I'm not seeing any other questions come in, so I'm going to go ahead and close up our question and answer session. Before we close, I'd like to go ahead and invite you to next month's webinar. We're going to be discussing how to simplify capacity management. I did put the registration link right there in your chat window, so if you want to get registered now, go ahead and do that. It's on April 19th. And then I'd like to thank you for attending today and, Keith, thank you for presenting this information. As a reminder, the event recording will be emailed out and made available on our website in the coming days. So with that, thank you very much and have a great day.