Preventing Network Outages with Proactive Network Monitoring

On-Demand Webinar

The financial ramifications of technological failure for a business can be devastating. If your company's infrastructure fails, you can lose business, customers, and even your reputation. Proactively monitoring your network can help ensure this doesn't happen.

In this webinar, learn how implementing a real-time network monitoring solution allows you to:

  • Identify the physical location of problematic devices before they affect the entire network
  • Troubleshoot bottlenecks or bandwidth issues so they don't become major problems
  • Observe traffic patterns to analyze trends and help with capacity planning

As IT professionals, our job is to make sure the right technology is in place, and we're responsible for how that technology performs. We want it to work the way we intend. Some network outages are inevitable, but what if we could preempt them? What if we had something in place that could warn us before things become an issue? Monitoring your devices against certain thresholds can give you the time you need to mitigate potential risks or performance issues in your network. So the premise is that you want some understanding, some visualization, some indication of how your network is behaving and performing.

If there are issues on the horizon, you can take the necessary steps to preempt them and mitigate those potential risks. Our agenda today: we'll go through a few slides on what causes network outages and how we can mitigate some of those causes. We'll look at some outages that have affected large brands, brands we're all familiar with from the industry. It really doesn't matter the size, scope, or scale of the environment; outages can happen wherever and whenever.

We'll look at what we believe are some really effective monitoring attributes for preventing outages. We'll talk about how network monitoring software can improve your network efficiency and performance, and then we'll take some questions and have a little Q&A to keep this interactive. In terms of what causes network outages, in pretty much every survey the largest share is some kind of human error, whether that's unplugging a cable, patching into the wrong port, or shutting down the wrong device or server. The list goes on.

I'm sure many of you have experienced this kind of scenario; I've probably done all of these things myself at some point in my career. It's very difficult to take human intervention out of network management, but you want to reduce that human-error factor as much as possible. So you want to see whether the changes you make have a positive or a negative effect, and take that into consideration in ongoing management and maintenance. The next major cause of outages is environmental factors.

Sometimes these are things we can't necessarily control: power outages, building shutdowns, and other events that are inevitable, so we have to plan and manage our environment around them. You want backup plans in place so that when these things do happen, your hardware isn't damaged and you're protecting your investment. Configuration issues go together with human error. We're tasked with maintaining these devices and configuring them the way we want them to work.

Sometimes a device isn't configured properly, and that can cause network performance issues, potential bottlenecks, even collisions. So there are many configuration problems that can lead to outages. You want a solution that can validate your configuration, not just tell you when something is configured improperly. You do want that as well, but if you apply a configuration you believe is correct and in fact it isn't, the device will reveal it through the way it behaves. So you want a product that can validate whether your device is configured properly.
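Intermapper's own configuration checks aren't described in this webinar, so as a generic illustration only: a minimal sketch, assuming you can export a device's running settings as key/value pairs, that compares them against an intended baseline. The setting names and values are hypothetical. A duplex mismatch like the one in the example is a classic source of the collisions mentioned above.

```python
# Hypothetical baseline: settings we intend a given switch to have.
INTENDED = {
    "snmp_community": "monitor-ro",
    "ntp_server": "10.0.0.5",
    "duplex": "auto",
    "mtu": 1500,
}


def validate_config(running: dict, intended: dict = INTENDED) -> list:
    """Compare a device's running settings with the intended baseline.

    Returns one human-readable finding per missing or mismatched setting,
    in the order the baseline lists them; an empty list means it matches.
    """
    findings = []
    for key, expected in intended.items():
        if key not in running:
            findings.append(f"missing setting: {key} (expected {expected!r})")
        elif running[key] != expected:
            findings.append(f"{key}: expected {expected!r}, found {running[key]!r}")
    return findings


running = {"snmp_community": "monitor-ro", "ntp_server": "10.0.0.9", "duplex": "half"}
for finding in validate_config(running):
    print(finding)
```

A real check would pull the running configuration from the device and normalize vendor-specific syntax first; the comparison step itself stays this simple.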

Then there's lack of visibility into your network. If you can't see what's happening, you don't understand what's going on or what needs to be addressed, and that prevents you from catching issues before they cause outages. Visibility, I think, is key to having an overall understanding of your environment, and it can also help you with human error and environmental factors. Everything is intertwined, but these are the four key causes of not just small but large, widespread network outages.

The next few slides show examples of large, familiar brands that have had outages, each with a huge financial impact. The first is Amazon, known for its cloud computing services. A few years ago, a network configuration issue brought down one of their data stores, so the storage some of their clients hosted their information on went down. What's the financial impact of that? It's huge. Think about what a 10-day credit to all their affected users cost. A lot of customers use their cloud computing platform, so a 10-day credit amounts to a lot of lost money.

So having a solution in place that can validate configuration changes and show their effect can go a long way toward avoiding this kind of revenue loss. Next: BlackBerry. BlackBerry had a server outage as well. One of their core switches failed, and millions of users worldwide were without service for 24 hours. This was at a time when BlackBerry was competing with a lot of other mobile devices. I remember this outage; it put them behind. Users were already frustrated with BlackBerry service, and they were even more frustrated after this outage.

So the result wasn't just lost revenue but lost customers moving from BlackBerry to other mobile platforms. That switch failure had a huge impact on their business. Then there's Navitaire, a Minnesota-based subsidiary of Accenture that handles bookings for airlines. A few years ago, their online booking system went down after a hardware failure while they were working with Virgin Blue.

Virgin Blue runs the flights; Navitaire handles the reservations for Virgin Blue. The outage cost Navitaire $20 million in compensation to Virgin Blue. For a company looking to grow and gain momentum in the space, paying out $20 million because of a hardware failure is unacceptable. You want something in place that lets you see these problems and predict failures before they happen and cost you $20 million.

Bloomberg. We all know Bloomberg, a great company. Recently, Bloomberg had a terminal outage caused by a combination of software and hardware failures in their network. Bloomberg is pretty tight-lipped, so they've kept the details under wraps, but it was a failure nonetheless. The terminals that users pay about $2,000 a month for were non-functional, so traders trying to make up-to-the-second trades had to pick up the phone instead.

It's not the end of the world, but for the 325,000 people who depend on Bloomberg terminals for real-time analysis of their financial portfolios, the time it takes to pick up the phone and hope your broker is available is wasted time: time you lose on an important trade that could be worth millions of dollars. These are very big brands, and these outages had a really big impact on their business.

Avoiding this by having something in place is critical. For whatever reason, these companies couldn't see these issues before they happened. They needed a fallback or backup plan so the failure didn't become a widespread problem or a major cost to the infrastructure. It's very important to have a way of monitoring and managing all your network resources. So what network monitoring attributes are effective in preventing outages? First, device discovery and performance monitoring.

That means the ability to discover the devices on your network: running an auto-discovery scan that tells you exactly what's on your network, so you understand all your devices, good or bad, and how they're functioning and behaving. Then you monitor their performance: the traffic on your core switches and routers, making sure your bandwidth utilization isn't being overtaxed, and seeing who's using the bandwidth.
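The webinar doesn't show how utilization is computed. As a minimal sketch, assuming you already have two successive readings of an interface's octet counter (for example IF-MIB `ifInOctets`) and its speed (`ifSpeed`, in bits per second), utilization is the change in octets times 8, divided by the polling interval times the link speed. Fetching the values over SNMP is left out, and at most one counter wrap between polls is assumed; on fast links the 64-bit `ifHCInOctets` counter is the usual choice because the 32-bit one can wrap more than once between polls.

```python
COUNTER32_MODULUS = 2 ** 32  # IF-MIB ifInOctets/ifOutOctets are 32-bit counters


def octet_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Octets counted between two polls, tolerating at most one counter wrap."""
    return (current - previous) % modulus


def utilization_pct(previous: int, current: int, interval_s: float, if_speed_bps: int) -> float:
    """Share of link capacity used in one direction between two polls, in percent."""
    bits = octet_delta(previous, current) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


# A 100 Mbit/s interface polled 30 seconds apart.
print(utilization_pct(1_000_000, 188_500_000, 30, 100_000_000))  # 50.0
# The counter wrapped past 2**32 between polls: 1,500 octets were still counted.
print(octet_delta(2 ** 32 - 1_000, 500))  # 1500
```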

These are really good for preventing outages because you understand how all of this translates to your day-to-day operations. Second, display and visualize the status and behavior of your devices. One of the causes of outages I mentioned is lack of visibility, so you want a product that provides that visibility: one that shows you visually how your devices are behaving and what their status is in real time, so you know exactly what's going on. Third, exception-based alerting and notification to keep you informed.

When a device enters an alarm state, it isn't necessarily critical yet; you want to see a gradual progression in how it's behaving, from good to okay to not so okay to bad. You want to be alerted at each step of the way so you know if a failure is imminent and can take the necessary steps to mitigate that risk.
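A minimal sketch of graduated alerting, assuming a single 0-100 metric and hypothetical thresholds. The four state names simply mirror the speaker's "good, okay, not so okay, bad"; they are not Intermapper's actual status levels or configuration. An alert fires on every state change, so each step of a degradation is reported.

```python
from enum import IntEnum


class State(IntEnum):
    GOOD = 0
    OKAY = 1
    NOT_SO_OKAY = 2
    BAD = 3


# Hypothetical thresholds for a 0-100 metric such as link utilization (%),
# listed from most to least severe.
THRESHOLDS = [(90, State.BAD), (75, State.NOT_SO_OKAY), (60, State.OKAY)]


def classify(value: float) -> State:
    """Map a metric value to the most severe state whose threshold it meets."""
    for limit, state in THRESHOLDS:
        if value >= limit:
            return state
    return State.GOOD


def alerts(samples: list) -> list:
    """Emit one alert per state change, so every step of a degradation is reported."""
    messages = []
    previous = State.GOOD
    for i, value in enumerate(samples):
        current = classify(value)
        if current != previous:
            messages.append(f"sample {i}: {previous.name} -> {current.name} ({value}%)")
            previous = current
    return messages


for message in alerts([40, 65, 80, 95]):
    print(message)
```

A production alerting system would usually add hysteresis or require several consecutive samples before changing state, so a value hovering at a threshold doesn't generate a stream of alerts.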

A strong alerting mechanism lets you alert on potential issues at the thresholds you consider important to your business. These three are really key features for preventing network outages, and any monitoring solution you're evaluating or already using should have them. Network monitoring software improves network efficiency and performance by providing the visibility and the capability to prevent costly issues. Understanding how your network is laid out and how it's working goes a long way toward keeping it running smoothly.

Real-time monitoring. There's a lot of very capable software out there, but not all of it offers real-time monitoring. Capturing data every 10 minutes, every hour, every five hours, or once a day is not real time. You want to be polling your devices continuously, always getting feedback on what's happening with each device. What happens if you poll once a day, your device misbehaves early in the day, and your poll is scheduled for later? You don't learn that the device isn't behaving the way it's supposed to until later that day.
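To put numbers on the once-a-day scenario: a fault is only seen at the next scheduled poll, so the worst-case detection delay is roughly one polling interval. A minimal sketch, assuming fixed-interval polling and that a single poll is enough to detect the fault (many tools wait for several failed polls before alarming, which adds to the delay).

```python
import math


def detection_delay(fault_s: float, interval_s: float, first_poll_s: float = 0.0) -> float:
    """Seconds from a fault starting until the next scheduled poll can observe it.

    Assumes polls run at first_poll_s + k * interval_s (k = 0, 1, 2, ...) and that
    a single poll is enough to notice the fault.
    """
    elapsed = max(0.0, fault_s - first_poll_s)
    next_poll = first_poll_s + math.ceil(elapsed / interval_s) * interval_s
    return next_poll - fault_s


HOUR = 3600
# Daily poll at 18:00, device starts misbehaving at 08:00: ten hours unnoticed.
print(detection_delay(8 * HOUR, 24 * HOUR, first_poll_s=18 * HOUR) / HOUR)  # 10.0
# A fault at 08:00:01 with polling every 30 seconds is seen 29 seconds later.
print(detection_delay(8 * HOUR + 1, 30))  # 29.0
```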

So you've lost almost a whole day before you can attend to the issue. With real-time monitoring, your solution constantly polls the device and you get feedback throughout the day, so you can see how it's behaving and make sure you have the resources in place to deal with an issue if it becomes one. So you want to leverage the right technology. First, a live map: a tangible visualization of your network that lets you see how your devices are behaving.

Second, flow-based technology: capturing information at the edge of your network to see how your bandwidth is being utilized, who your top talkers are, and which ports are being used. It lays out the different hosts and users in a graphical interface, showing how much data they use, where, and at what time of day. You get a granular view of how your traffic is distributed, which is a good basis for capacity planning.
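Flow analysis ultimately comes down to aggregating flow records. A minimal sketch, assuming the records have already been collected and decoded into a simplified, hypothetical `FlowRecord` type (real NetFlow or IPFIX exports carry many more fields, including the timestamps needed for time-of-day views). It ranks top talkers by source host and by destination port.

```python
from collections import Counter
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical, simplified flow record (real NetFlow/IPFIX records have more fields)."""
    src: str
    dst: str
    dst_port: int
    octets: int


def top_by(flows, field: str, n: int = 3):
    """Total octets per value of `field` (e.g. "src" or "dst_port"), largest first."""
    totals = Counter()
    for flow in flows:
        totals[getattr(flow, field)] += flow.octets
    return totals.most_common(n)


flows = [
    FlowRecord("10.0.0.5", "8.8.8.8", 443, 5_000),
    FlowRecord("10.0.0.7", "1.1.1.1", 80, 2_000),
    FlowRecord("10.0.0.5", "1.1.1.1", 443, 3_000),
    FlowRecord("10.0.0.9", "8.8.4.4", 53, 100),
]
print(top_by(flows, "src", n=2))  # [('10.0.0.5', 8000), ('10.0.0.7', 2000)]
print(top_by(flows, "dst_port"))  # [(443, 8000), (80, 2000), (53, 100)]
```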

If your bandwidth is over-utilized, it may be time to see whether you need additional bandwidth. If the usage isn't justified, you can put solutions or policies in place to curtail some of it on your current infrastructure. Either way, you get a good understanding of how your bandwidth is being used. Third, layer-2 mapping: the ability to show how your layer-2 infrastructure is interconnected, that is, how your devices connect to each other, and to monitor that behavior as well.
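One generic way to build a layer-2 map is to merge the neighbor tables that switches report (via LLDP or CDP) into an undirected graph; the graph can then answer questions like "what loses connectivity if this switch fails?". A minimal sketch with hypothetical device names, not a description of how Intermapper builds its maps; the `manual_links` parameter stands in for hand-drawn links to devices, such as firewalls, that don't report neighbors themselves.

```python
from collections import defaultdict, deque


def build_topology(neighbor_tables: dict, manual_links=()) -> dict:
    """Undirected layer-2 adjacency: device -> set of directly connected devices.

    neighbor_tables maps each device to the neighbors it reports (as an LLDP or
    CDP neighbor table would); manual_links adds links for devices that report
    nothing themselves.
    """
    adjacency = defaultdict(set)
    links = [(device, nbr) for device, nbrs in neighbor_tables.items() for nbr in nbrs]
    for a, b in [*links, *manual_links]:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def cut_off_by(adjacency: dict, root: str, failed: str) -> set:
    """Devices that lose their layer-2 path to `root` if `failed` goes down (BFS)."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor != failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return set(adjacency) - seen - {failed}


tables = {"core": ["dist-1", "dist-2"], "dist-1": ["core", "access-1"], "dist-2": ["access-2"]}
topology = build_topology(tables, manual_links=[("access-2", "firewall")])
print(sorted(cut_off_by(topology, "core", "dist-2")))  # ['access-2', 'firewall']
```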

Then there's log data analysis. You're always going to have issues; you want to be told when they arise, and you want a solution that helps you troubleshoot them. You can send logs to a Syslog server or to Splunk for analysis, use them for trend analysis, and see how a device behaves over time and whether it needs to be upgraded, replaced, or reconfigured.
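A minimal sketch of this kind of log analysis, assuming plain RFC 3164-style syslog lines (`<PRI>timestamp host message`). It is not how Splunk or Intermapper parse logs; it only decodes the priority value into facility and severity (PRI = facility * 8 + severity) and counts error-or-worse messages per host. The device names and messages are made up, and real devices vary in format, so a production parser would need to be more forgiving.

```python
import re
from collections import Counter

# Syslog severity codes 0-7 (RFC 5424) with their usual short names.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
# RFC 3164-style line: <PRI>Mmm dd hh:mm:ss host message
LINE_RE = re.compile(r"^<(\d{1,3})>(\w{3} +\d{1,2} \d\d:\d\d:\d\d) (\S+) (.*)$")


def parse(line: str):
    """Split one syslog line into fields, or return None if it doesn't match."""
    match = LINE_RE.match(line)
    if not match:
        return None
    pri = int(match.group(1))
    return {
        "facility": pri // 8,  # PRI = facility * 8 + severity
        "severity": SEVERITIES[pri % 8],
        "timestamp": match.group(2),
        "host": match.group(3),
        "message": match.group(4),
    }


def problems_by_host(lines, worst: str = "err") -> Counter:
    """Count messages at severity `worst` or more severe, per host."""
    cutoff = SEVERITIES.index(worst)
    counts = Counter()
    for line in lines:
        record = parse(line)
        if record and SEVERITIES.index(record["severity"]) <= cutoff:
            counts[record["host"]] += 1
    return counts


logs = [
    "<187>Jan  5 03:00:01 core-sw1 %LINK-3-UPDOWN: Interface Gi0/1, changed state to down",
    "<190>Jan  5 03:00:02 core-sw1 %SYS-6-LOGGINGHOST_STARTSTOP: Logging started",
    "<187>Jan  5 03:00:09 edge-rtr %LINK-3-UPDOWN: Interface Gi0/2, changed state to down",
    "not a syslog line",
]
print(problems_by_host(logs))  # Counter({'core-sw1': 1, 'edge-rtr': 1})
```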

The goal is to understand how all your devices are behaving and what issues they're having, and to sift through thousands and thousands of lines of machine data in a way that makes sense to you. We want to cut the time spent troubleshooting to get things the way they need to be. These are some really good features in terms of leveraging the right technology. Once you have the right devices and infrastructure in place, you want to optimize those devices to support your recommended performance levels.

You want to configure and optimize your devices and make sure they're running the way you need them to. If you're paying for a 100-megabit Internet line, you want infrastructure that can support 100 megabits. If you're over-utilizing that 100-megabit line, you want to know why, and if the usage is justified, you want to scale up so you can maintain performance. Once you've discovered your devices and defined what you want them to do, you can optimize them and make sure they're performing the way you want.
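For the 100-megabit example, one common way to decide whether a line is genuinely over-utilized is to look at a high percentile of utilization samples rather than the peak, so short bursts don't trigger an upgrade. A minimal sketch, assuming utilization samples in Mbit/s; the 95th percentile and the 80% headroom are conventional, hypothetical choices, not figures from the webinar.

```python
import math


def percentile(samples, pct: float):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def capacity_check(samples_mbps, line_mbps: float = 100.0, headroom: float = 0.8):
    """Compare the 95th-percentile load with a hypothetical 80% planning ceiling."""
    p95 = percentile(samples_mbps, 95)
    if p95 >= line_mbps * headroom:
        return p95, "sustained load near capacity: find out why, then scale up if justified"
    return p95, "within capacity"


# 95 quiet samples and 5 short bursts: the bursts don't drive the decision.
print(capacity_check([10] * 95 + [99] * 5))  # (10, 'within capacity')
# Load that sits at 85 Mbit/s most of the time does.
print(capacity_check([85] * 60 + [40] * 40))
```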

So why use Intermapper? For a lot of reasons. We just touched on being proactive when monitoring the health of your network, and I think that's the number one reason. Intermapper is very easy to administer and maintain. There are a lot of capable products out there, but you have to ask yourself an important question: "Do I really want to spend months learning how to administer and manage a product? Or do I want one that's easy to use, that I can learn in a matter of days or weeks?"

We all know that time equals money: the more time you spend on a product, the more money goes into it. As IT professionals, we're seen not as a revenue stream but as a cost center; we cost our company money. So we want to minimize the cost of providing our users and our company with the best-performing, most optimized solution possible.

Low cost, high rewards. Intermapper is very competitive in price, and its feature set is very powerful, so you get more for what you pay. That reduces the cost of network management, which is key when you're making recommendations for upgrades and forecasting IT budgets. You want to make sure you're not absorbing a lot of cost.

Hit problems before they become critical. With proactive, real-time monitoring, you can see how your devices are behaving. If a failure is imminent, you can see it progressing, so you can act before it becomes a critical issue: before those devices go down, before you lose internet connectivity, and before you lose revenue. You can take the necessary steps to get in front of the issue.
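Seeing a failure "progressing" can be made concrete with a trend line: fit the recent samples and estimate when the metric will cross a threshold. A minimal sketch using ordinary least squares, assuming the trend is roughly linear; the metric, sample values, and 90% threshold are hypothetical.

```python
def time_to_threshold(times, values, threshold: float):
    """Fit a least-squares line to (time, value) samples and return the time at which
    it reaches `threshold`, or None if the trend is flat or improving.

    Needs at least two distinct sample times.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    return (threshold - intercept) / slope


# Daily readings of a hypothetical metric climbing 2 points a day from 50%.
days = [0, 1, 2, 3, 4]
usage = [50, 52, 54, 56, 58]
print(time_to_threshold(days, usage, 90))  # 20.0, i.e. around day 20
```

A result earlier than the latest sample time means the fitted line is already past the threshold.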

Intermapper also plays nicely with other products, so it can be part of a solution you already have in place. A lot of customers look for an all-encompassing solution that does many different things, and Intermapper can be a key cog in your overall network management solution. We have a lot of customers out there who love the product and use it for their network monitoring needs.

We've been very fortunate and very happy with the feedback we get from our customers; that feedback is very important to us. I consider this a customer-driven product: customer feedback is what makes the product as great as it can be. I see some questions coming in, so I'll take some now. Here's one about log analysis: "If you already have Splunk, how does Intermapper interface with it?" Good question. Intermapper has a web services interface, and there's a web UI that's part of the product.

We integrate Intermapper and Splunk using an API. That gives you a graphical interface within Splunk Enterprise, so the same maps you see in Intermapper appear in your Splunk Enterprise interface. You can see how the devices are behaving, click on them, and see whether they're behaving badly or in an alarm state, along with information about what's going on with each device.

You can see why a device is in an alarm state and get more detailed information from the logs, which tell you exactly why the device is down or alarming. So the graphical interface is integrated: what you see in the Intermapper software, you also see in the Splunk interface. Another question: "How does mapping work for devices such as firewalls, which don't support CDP or other discovery protocols? Can you just manually add these devices and links to other devices?"

Good question. Intermapper uses SNMP to do its scan. If a device has SNMP enabled, Intermapper will scan it and add an SNMP traffic probe to it. Beyond that, because Intermapper is IP-based and vendor-agnostic, you can manually add any device that has an IP address to the map. The probes are the backbone of the product: they're what we use to capture information that's specific to each device.

Based on the object identifiers (OIDs) associated with a specific device, you can capture very specific information from it and use that to keep track of what's going on with the device. Because we use SNMP, a device that doesn't support CDP can still be monitored and added to the map. We also let you create links manually: if links aren't created automatically but you know how the devices are related, you can attach those links by hand.

Another question: "Can the traffic be split between groups, basically departments?" I'm assuming you're asking whether you can set up different groups or different maps; I might need a little clarification on that one. Another good question: "What percentage of overhead does Intermapper add to the devices being probed?" SNMP is a very lightweight protocol, and its overall impact on your network is minimal to none. We have customers polling over 10,000 devices every 30 seconds, and they haven't had any network performance issues.
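The overhead claim can be sanity-checked with simple arithmetic. The 1,500 bytes of SNMP traffic per device per polling cycle below is purely an assumption for illustration (the real figure depends on how many values each probe requests). With it, 10,000 devices every 30 seconds averages about 333 polls per second and about 4 Mbit/s at the monitoring server, roughly 0.4% of a 1 Gbit/s link, while each individual device sees only one small exchange every 30 seconds.

```python
def polling_load(devices: int, interval_s: float, bytes_per_device: int):
    """Average SNMP polling rate and bandwidth seen at the monitoring server.

    bytes_per_device is the request plus response traffic for one device per
    polling cycle; it depends on how many values each probe asks for.
    """
    polls_per_s = devices / interval_s
    mbit_per_s = polls_per_s * bytes_per_device * 8 / 1_000_000
    return polls_per_s, mbit_per_s


# 10,000 devices every 30 s, assuming ~1,500 bytes of SNMP traffic per device per cycle.
rate, mbps = polling_load(10_000, 30, 1_500)
print(f"{rate:.0f} polls/s, {mbps:.1f} Mbit/s")  # 333 polls/s, 4.0 Mbit/s
```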

So there's very minimal, if any, impact on your network performance. That's a very good question. One thing I'd throw out to the group: identify your network management pain points. Finding the right solution can help you solve them, and I think that's one way to decide which direction to go when you're evaluating possible solutions. Another question: "Does Intermapper network monitor provide root cause analysis?"

Intermapper provides some basic reporting: you can report on devices and on device interfaces. For deeper analysis, you'd use third-party software that hooks into Intermapper, and Intermapper can send the information to that application. But root cause analysis isn't built into the product. "Does Intermapper work with virtual networks, VMware, Hyper-V virtual switches?" Yes: VMware, Hyper-V virtual switches, and desktop environments. Intermapper works with all the virtualization platforms out there.

We see VM infrastructure the same way we see physical devices. Each of these vendors has MIB files containing information specific to that vendor, and we can use those MIBs to create custom probes for those virtual environments. So yes, we support virtual networks. Great questions, everyone. Hopefully I can get to all of them; if I don't, I'll respond to them afterward.

I'll take those offline and get back to you with a response. Thanks again for joining me today. When you get a chance, take advantage of our free trial: go to our website, download the software, and install it. If you have any questions along the way, our great support staff can assist you. I can sit here and tell you about the product, but the proof is in the product, so try it, demo it, see how it fits into your environment, and go from there. Thanks again, everyone.

Monitor Your Network with Intermapper

Network administrators love the visual, real-time maps and flexible performance monitoring Intermapper provides. Try it free for 30 days.