How Can Automate Schedule Help You Meet Service Level Agreements?

On-Demand Webinar


IBM i, Solaris, Windows, UNIX, Linux, Mac OSX

 

Whether your Service Level Agreement is with a customer, an internal business unit, or a vendor, keeping your end of the contract is vital to maintaining customer satisfaction and bringing success to your organization.

By automating the completion of key jobs, file transfers, and other routine processes, Automate Schedule reduces the amount of time your staff spends ensuring that SLAs are met.

In just 60 minutes, we will show you how to meet your organization’s SLAs using Automate Schedule's built-in features, including:

  • Job automation and monitoring
  • Job status notifications and notification lists
  • SNMP and SMTP interfaces
  • Agent event notifications
  • Server and agent status notifications

When it comes to upholding service levels, your reputation is on the line. Reduce the risk of error by eliminating mindless manual tasks.

 

Pat: Thank you for joining us today. My name is Pat Cameron, and today's subject is how Skybot Scheduler can help you meet Service Level Agreements. Like I said, my name is Pat Cameron. I'm with HelpSystems and Skybot Software, and I'm joined today by Keith Fuller. Keith is going to help out with our demonstration today. Good morning, Keith. I’m sorry I spoke over your “Hello”.

Keith: No worries.

Pat: Thanks for joining and thanks for helping out today, Keith. Today, we are going to talk about Skybot Scheduler and Service Level Agreements. On our agenda: just what are Service Level Agreements? What do we mean when we talk about them? I believe that some people have formal agreements and some have informal ones. What do you need to monitor for your SLAs as far as job automation is concerned? And how can Skybot be used to help you monitor those SLAs so that you don't miss any of them? We'll go through a few slides on those topics, and then we're going to go online and Keith is going to show us a live demo of Skybot Scheduler and how it can help you meet those Service Level Agreements.

A little bit of housekeeping: over on the right-hand side of the screen, there is a chat window. If you want to send a question to either of us or both of us, you can send it. Just type it in the little box next to the send button over there. And you can send your questions to all attendees or send them to the host and presenter so that we both see them.

What Are Your Service Level Agreements?

All right. Well, let's talk about Service Level Agreements. So do you have Service Level Agreements with any of your clients? These clients can be outside the business or they can be inside. They can be other business divisions within your company. Are you accountable for making sure that business processes run on time and finish without any errors? Do you hate surprises when you come into the office in the morning and find such things as job failures or delays? These agreements may be formal or they may be informal.

In a previous life, I was an operations manager at a hospital here in the States for many years, and our emergency room, of course, was open 24 hours a day. That meant that our lab, our pharmacy, and our radiology departments were all open 24 hours a day, seven days a week. We had to keep our online system up as much as possible so that they didn't have to use any manual processes for ordering tests for patients.

So the SLA that I had was informal. It wasn't anything that was written. But if there was any kind of downtime, we certainly heard about it. And so we wanted to make sure that our downtime was kept to a minimum. Our nightly batch and our reports were processed as quickly as possible and we got that information back out to the nursing units and to the departments that needed them.

So as I said, your Service Level Agreements can be with your internal customers (other departments that you're providing service for), with your vendors, or with your own clients. And I'm sure that if you have formal Service Level Agreements, those are with your clients.

Understanding SLAs

What do we mean when we talk about Service Level Agreements? We want to make sure that processes finish on time. There might be different requirements for system uptime. And keep in mind that, to users, uptime doesn't necessarily mean that the system is up and available; it means that my application is available, and sometimes just my workstation. If my workstation isn't functioning properly, then I consider that application to be down, and that would be missing an SLA.

File movements need to happen on time. It seems that every customer I talk to today is moving files: receiving them from vendors or clients, or taking them from one application on one server and sending them to another application on another server for processing. So you want to make sure that those file movements occur on time without any problems.

Business processes: you want to make sure that they start on time and finish on time. You don't want those business processes running into other applications and causing conflicts. Backups are another one. If you're running nightly backups, you want to make sure that they finish so interactive users can access the system without running into file locks. Does your bonus depend on system uptime? I've heard that from a lot of customers as well. And again, for any business processes that are running, you want to make sure they run as quickly as possible so that you can move on to the next step in the process.

How Job Scheduling Helps You Meet SLAs

So how can Skybot Scheduler help with a number of those different SLAs that you might have? Built into the product are a number of features that you can use for monitoring your SLAs so that you don't have to have a checklist and get onto the system and check those manually. Skybot Scheduler includes an SNMP interface that allows you to set Skybot to receive SNMP traps and to act on those traps.

Also, Skybot can send SNMP traps to your enterprise monitor, such as Tivoli, HP OpenView, InterMapper, and other enterprise monitors. Sending information from Skybot to those monitors makes sure that the people monitoring processes in the NOC are aware of any slowdowns.

Skybot also includes an SMTP interface that lets you send emails, and text messages as well, to the system administrator, based on how you've configured your system and what you need to be made aware of so that you can meet those SLAs. Within Skybot, you can set up lists of people to notify.

We can work with group email lists on your email server, as well as lists set up within the Skybot product itself, and those lists might include your vendor, if you need to notify a vendor of a problem with their application so that they can get onto the system and fix it. They can include your system administrators, the help desk, etc. So whoever is supporting the process that's running, that's who you can send the notification directly to, so you don't have to jump through several hoops to get to the right person.

Skybot Scheduler can also notify you of problems on the server or on the agents. I'll show you a little bit about the architecture of Skybot and how the Scheduler is designed, and we can monitor Skybot Scheduler itself. Skybot has monitors that you can set up for when jobs are late or overrunning. Maybe you have a process that tends to get into a loop; we can monitor for a maximum duration and let you know when the job reaches that duration. We can also notify you of the different statuses that a job goes through, and we'll show you some good examples of that when we go online.

About Skybot Scheduler Enterprise Job Scheduling

So what is Skybot Scheduler? For those of you who have never worked with it or heard of it before, it is an enterprise scheduler, and you can use it to do advanced scheduling of tasks or jobs. You can run batch files, scripts, or system commands on a daily, weekly, or monthly schedule. You can also have an interval schedule: for example, I need to run a process, maybe some kind of ERP process, every hour throughout the day.
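To make the daily and weekly schedule types concrete, here is a minimal Python sketch of how a scheduler could compute the next run time for a job that runs at a fixed time on selected weekdays. The function name `next_run` and the Monday=0 weekday numbering (Python's `date.weekday()` convention) are assumptions for illustration; this is not Skybot's internal scheduling logic.

```python
from datetime import datetime, time, timedelta


def next_run(now: datetime, weekdays: set[int], at: time) -> datetime:
    """Return the first datetime strictly after `now` that falls on one of
    `weekdays` (Monday=0 ... Sunday=6) at wall-clock time `at`."""
    if not weekdays:
        raise ValueError("weekdays must not be empty")
    # Checking today plus the next 7 days always finds a match: if today's
    # slot has already passed, the same weekday next week qualifies.
    for offset in range(8):
        day = now.date() + timedelta(days=offset)
        candidate = datetime.combine(day, at)
        if day.weekday() in weekdays and candidate > now:
            return candidate
    raise AssertionError("unreachable for a non-empty weekday set")


# A weekday job at 12:00, checked on Friday 5 January 2024 at 13:00.
print(next_run(datetime(2024, 1, 5, 13, 0), {0, 1, 2, 3, 4}, time(12, 0)))
# 2024-01-08 12:00:00 (the following Monday)
```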

Skybot Scheduler runs on multiple platforms, and you can set up event-driven scheduling across those platforms. So if I've got a process on one of my Linux servers, as soon as that process completes, I want to trigger something over on Windows. Or maybe I want to send a file over there. I can do that all within one Scheduler. I don't have to log in to multiple servers in order to monitor or set up those jobs.

Skybot also supports variables, and the most common use of Skybot variables is to calculate dates. Say I need to run a report based on a range of dates, and I always want that range to run from the previous month to today's date. I can create variables that calculate those dates for me, so you set them up once and then let those jobs run every day.
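The date-range variable in this example can be sketched in Python as below. It assumes that "the previous month to today's date" means from the same day one month ago through today, clamping the start to the last day of a shorter month (so 31 March starts on the last day of February). The function name is hypothetical, and Skybot's own variable syntax is not shown here.

```python
import calendar
from datetime import date


def previous_month_to_today(today: date) -> tuple[date, date]:
    """Return (start, end): start is the same day one month earlier,
    clamped to that month's last day; end is `today`."""
    if today.month == 1:
        year, month = today.year - 1, 12
    else:
        year, month = today.year, today.month - 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    start = date(year, month, min(today.day, last_day))
    return start, today


start, end = previous_month_to_today(date.today())
print(f"Report range: {start.isoformat()} to {end.isoformat()}")
```

Because the range is computed each time the job runs, the job definition never has to be edited as the dates move forward.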

Skybot also does return code testing. We want to make sure that the processes and tasks you run are successful, so we can scan for a return code or exit code coming back from a process when it completes to determine whether it succeeded. If it didn't, we can retry it, if that's possible, or we can fail the process and notify someone immediately. That way you're not surprised when you come in in the morning; we can let you know that evening if there's a problem.
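Return-code testing with retries follows a simple pattern: run the command, compare its exit code against the codes you consider successful, and retry a limited number of times before declaring failure. Here is a minimal Python sketch using the standard `subprocess` module; the function name, the default of two retries, and the success-code set are assumptions for illustration, not Skybot settings.

```python
import subprocess
import sys
import time


def run_with_retries(cmd: list[str], ok_codes=(0,), retries: int = 2,
                     delay_seconds: float = 0.0) -> tuple[bool, int, int]:
    """Run `cmd` up to 1 + `retries` times.

    Any exit code outside `ok_codes` counts as a failure. Returns
    (succeeded, attempts_made, last_exit_code)."""
    code = None
    for attempt in range(1, retries + 2):
        code = subprocess.run(cmd).returncode
        if code in ok_codes:
            return True, attempt, code
        if attempt <= retries:
            time.sleep(delay_seconds)  # pause before the next try
    return False, retries + 1, code


succeeded, attempts, code = run_with_retries([sys.executable, "-c", "import sys; sys.exit(3)"])
if not succeeded:
    # In a real scheduler, this is where the failure notification would go out.
    print(f"Job failed with exit code {code} after {attempts} attempts")
```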

Skybot Scheduler has role-based security included in the product. We want it to be easy to set up and maintain, so we interface with Active Directory or an LDAP server. You can create a group on your Active Directory server and then map that group to a role within Skybot, so you can have one group of people who can actually change jobs and do administrator-type setup.

Maybe the help desk can just view the jobs and their history and see how jobs are running. And then you might have another group, maybe in the operations area, that can actually execute jobs but can't make any changes to them. So you can be very specific about the type of access that people have to those jobs. We also have job and server event monitoring, and we'll show you a couple of examples of how we can use our monitors to trigger processes to run.
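The group-to-role mapping described above boils down to a lookup from directory groups to permission sets. Here is a minimal Python sketch; the group names (`sched-admins`, `operations`, `helpdesk`) and the three permissions are hypothetical examples, not Skybot's actual role model, and in practice a user's groups would come from Active Directory or LDAP.

```python
from enum import Flag, auto


class Permission(Flag):
    VIEW = auto()     # see jobs and history
    EXECUTE = auto()  # run jobs on demand
    EDIT = auto()     # create and change jobs


# Hypothetical mapping from directory group names to scheduler permissions.
ROLE_FOR_GROUP = {
    "sched-admins": Permission.VIEW | Permission.EXECUTE | Permission.EDIT,
    "operations": Permission.VIEW | Permission.EXECUTE,
    "helpdesk": Permission.VIEW,
}


def permissions_for(groups: list[str]) -> Permission:
    """Combine the permissions of every mapped group the user belongs to."""
    perms = Permission(0)
    for group in groups:
        perms |= ROLE_FOR_GROUP.get(group, Permission(0))
    return perms


def can(groups: list[str], needed: Permission) -> bool:
    return needed in permissions_for(groups)


print(can(["operations"], Permission.EXECUTE), can(["operations"], Permission.EDIT))
```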

Skybot Scheduler's Architecture

So this is what the architecture for Skybot looks like. It has a central server, indicated here, which you can install on Windows, Linux, AIX, or UNIX. We also install a web server, because access is all through a browser, and a PostgreSQL database, which is what we use on our back end.

These three pieces are set up at the initial installation, which probably takes about five minutes. Once you have the central server installed and running, you install the agent software, which runs as a service or daemon in the background on each of the VMs, partitions, or servers that you want to be able to schedule jobs on.

So here's an example of a job stream. I've got a daily suite of jobs that runs every morning on a Windows server, and as soon as that completes, I want to run an ETL process over on my AIX server. So I can set this ETL process up to react to the completion of the daily suite. Once the ETL process completes, I've got to run a file transfer from one of my Linux servers to an FTP server at one of my clients. It doesn't matter where each process is running; we can join them all together in one job stream and run them smoothly and quickly. Most of the time, we've seen customers compress the amount of time it takes for those job streams to run.
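The cross-platform job stream in this example is a dependency graph: each job lists the jobs it reacts to, and a job starts only after its prerequisites complete. Here is a minimal Python sketch using the standard library's `graphlib` (Python 3.9+); the job names and agent names are hypothetical labels for the Windows, AIX, and Linux steps described above.

```python
from graphlib import TopologicalSorter

# Hypothetical job stream: job name -> (agent it runs on, jobs it depends on)
JOBS = {
    "daily_suite": ("windows-01", set()),
    "etl_process": ("aix-01", {"daily_suite"}),
    "ftp_to_client": ("linux-02", {"etl_process"}),
}


def run_order(jobs: dict[str, tuple[str, set[str]]]) -> list[str]:
    """Return the job names in an order that respects every dependency.

    Raises graphlib.CycleError if the jobs depend on each other in a loop."""
    graph = {name: deps for name, (_agent, deps) in jobs.items()}
    return list(TopologicalSorter(graph).static_order())


for job in run_order(JOBS):
    agent, deps = JOBS[job]
    print(f"{job} on {agent} (after: {', '.join(sorted(deps)) or 'nothing'})")
```

A real scheduler would start each job as soon as its own prerequisites finish rather than walking a fixed list, which lets independent branches run side by side.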

An example of an event-driven schedule might be a new file arriving on my FTP server. Maybe my customer is dropping files out there throughout the day. Every time a file gets dropped on that FTP server, I need to wake up and move that file within my network: pull it from my FTP server, which is outside the firewall, to another server inside my firewall, and then process that file. These three jobs can be on three different servers, and each one triggers the next without any delays.

I was talking to a customer yesterday who is just starting to install Skybot, and we were talking about the best way to run a number of these processes. Right now he has jobs that are scheduled to run at 6 a.m., and he schedules the next step at 6:15, assuming that the first piece will have completed. If it hasn't, the next job fails. But now with Skybot, he can be assured that the file is going to be there before the FTP job runs. So there will be far fewer errors, and their jobs are going to run a lot more quickly.
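The difference between the 6:00/6:15 setup and a dependency-driven one can be sketched as a prerequisite check: instead of assuming the file exists fifteen minutes later, the next step waits for it, with a deadline so that a missing file still raises an alert. Here is a minimal Python sketch; `wait_for_file`, the polling interval, and the timeout are assumptions for illustration, and this is not how Skybot implements its file monitors.

```python
import os
import time


def wait_for_file(path: str, timeout_seconds: float, poll_seconds: float = 1.0) -> bool:
    """Poll until `path` exists or `timeout_seconds` elapses.

    Returns True if the file appeared (run the next step), or False on
    timeout (skip the step and notify someone instead of failing blindly)."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

For example, a hypothetical FTP step could call `wait_for_file("/data/outbound/daily_extract.csv", timeout_seconds=1800, poll_seconds=30)` and start the transfer only when it returns True.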

Enterprise Monitoring with Skybot Scheduler

Skybot can do some enterprise monitoring to help you meet those SLAs. We can receive a trap from a failed process and react to that trap by triggering a process that tries to restart the failed process or job. We might also be monitoring a service on one of your Windows servers. If that service goes down, we want to send a trap, react to it by trying to restart the service, and send a text message to the system administrators to let them know that we've run into a problem. Skybot is trying to fix it, but if we're unable to, you might have to get on and do a little more troubleshooting. Hopefully, though, we can automate some of that troubleshooting for you.
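The react-and-recover pattern described here (receive a trap or event, try to restart the failed service, and tell the administrators either way) can be sketched in Python as follows. The `restart` and `notify` callables are hypothetical hooks: in practice they might restart a Windows service and send a text message, but no real SNMP or service-control API is used here, and this is not Skybot's implementation.

```python
from typing import Callable


def recover(service: str,
            restart: Callable[[str], bool],
            notify: Callable[[str], None],
            max_attempts: int = 3) -> bool:
    """React to a 'service down' event.

    Tries `restart(service)` up to `max_attempts` times and always sends
    one notification saying whether automatic recovery worked."""
    for attempt in range(1, max_attempts + 1):
        if restart(service):
            notify(f"{service} was down; restarted automatically on attempt {attempt}")
            return True
    notify(f"{service} is down and {max_attempts} restart attempts failed; "
           "manual troubleshooting needed")
    return False


# Hypothetical usage with stand-in hooks: a restart that never succeeds.
recover("print-spooler", restart=lambda name: False, notify=print, max_attempts=2)
```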

For jobs or tasks that are scheduled within Skybot, we've got job monitors that you can set. We can monitor for job overruns, where a job runs longer than a threshold. Let's say a job normally runs 15 minutes; I would set a threshold of 30 minutes on that job, and if it reaches that 30-minute threshold, I know there's a problem. It may be in some kind of loop, or it's waiting for something before it can continue processing, and I need to get on and do something about it.

Or if you do have an SLA and this job needs to be completed by a specific time, I can use that job monitor to alert me before it reaches my SLA time so that I can get on and see what the problem is, why is this job delayed in completing today, and make sure that it completes on time.

We also have a monitor for underruns. It's not used as often as the overrun monitor, but say you have a job where, if it runs less than a minute, you know there's a problem even if it doesn't get any kind of error, because it didn't really do what it was supposed to. Backups are probably a good example. If my backup only runs five minutes when it normally runs two hours, I know that it wasn't able to get to all the objects it needed to back up. So I can set an underrun monitor and be notified when that occurs. Then you can have that backup restarted, see where the locks are, and get it running so that it completes on time and successfully.
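The overrun and underrun monitors come down to comparing a job's elapsed time with two thresholds. Here is a minimal Python sketch; the threshold values in the usage lines (30 minutes for a job that normally takes 15, and one hour as the minimum for a backup that normally takes two) are illustrative choices, and the function name is hypothetical. An overrun can be flagged while the job is still running, but an underrun only makes sense once the job has finished.

```python
from datetime import timedelta
from typing import Optional


def duration_alerts(elapsed: timedelta, finished: bool,
                    max_duration: Optional[timedelta] = None,
                    min_duration: Optional[timedelta] = None) -> list[str]:
    """Return the duration alerts that apply to one job run."""
    alerts = []
    # Overrun: the job has reached its maximum duration (running or not).
    if max_duration is not None and elapsed >= max_duration:
        alerts.append("overrun")
    # Underrun: the job finished suspiciously fast, even without an error.
    if finished and min_duration is not None and elapsed < min_duration:
        alerts.append("underrun")
    return alerts


print(duration_alerts(timedelta(minutes=30), finished=False,
                      max_duration=timedelta(minutes=30)))  # ['overrun']
print(duration_alerts(timedelta(minutes=5), finished=True,
                      min_duration=timedelta(hours=1)))     # ['underrun']
```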

We can also monitor for late starts. Say a job is scheduled at a specific time but has a prerequisite file; that job is not going to start at its scheduled time if the prerequisite file isn't available. Again, we can notify you immediately that the job didn't start on time so you can decide what to do about it. So we can notify on conditions such as late starts and overruns, and we can also notify on the statuses that a job goes through.
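Late-start detection, and the related early warning before an SLA deadline mentioned above, can be expressed as two small time comparisons. Here is a minimal Python sketch; the grace period and warning lead time are assumed parameters chosen for illustration, not documented Skybot defaults, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_late_start(scheduled: datetime, now: datetime,
                  started: Optional[datetime],
                  grace: timedelta = timedelta(0)) -> bool:
    """True if the job did not start by `scheduled + grace`.

    `started` is None while the job is still waiting, for example on a
    prerequisite file that has not arrived yet."""
    limit = scheduled + grace
    if started is None:
        return now > limit
    return started > limit


def sla_warning_due(now: datetime, sla_deadline: datetime,
                    warn_before: timedelta, finished: bool) -> bool:
    """True once an unfinished job is within `warn_before` of its SLA deadline."""
    return not finished and now >= sla_deadline - warn_before
```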

Monitoring for Job Statuses

When a job is submitted by Skybot Scheduler, it goes through a number of different statuses. First, it gets submitted to a queue. It may be skipped based on some type of condition that you've set, and if you want to be notified of that, you can set up that notification too. We can also notify on a failure or a normal completion. Maybe somebody in the business office wants to know when this job is finished; we can send them an email or a text message to let them know.

We can send a notification on any of these statuses. Typically, we send an email or a text message. Sometimes, with email or SNMP, we can also interface with your help desk ticketing software and automatically open a support ticket on a job failure. So we can take a couple of steps to help you with monitoring and troubleshooting your jobs.
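Per-status notification can be modelled as a table of rules: for each job, which statuses should notify which lists. Here is a minimal Python sketch; the status names mirror the ones mentioned in this section (submitted, skipped, failed, completed), while the job name and list names are hypothetical.

```python
from enum import Enum


class JobStatus(Enum):
    SUBMITTED = "submitted"
    SKIPPED = "skipped"
    FAILED = "failed"
    COMPLETED = "completed"


# Hypothetical rules: job name -> {status: [notification list names]}
RULES: dict[str, dict[JobStatus, list[str]]] = {
    "nightly_billing": {
        JobStatus.FAILED: ["ops-oncall", "helpdesk-tickets"],
        JobStatus.COMPLETED: ["business-office"],
    },
}


def recipients_for(job: str, status: JobStatus,
                   rules: dict[str, dict[JobStatus, list[str]]] = RULES) -> list[str]:
    """Return the notification lists to alert when `job` reaches `status`."""
    return list(rules.get(job, {}).get(status, []))


print(recipients_for("nightly_billing", JobStatus.FAILED))
```

Routing failures to a help desk mailbox is one way a ticket-opening integration could be wired up, assuming your ticketing system accepts email.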

We can also notify on event statuses. Within the Skybot product, you can set up monitors to watch for file adds, file changes, and file deletions. If those types of events trigger your job processes, you can set up a monitor that watches for them. Then, when they trigger, if somebody needs to know every time a file gets added to your FTP server, you can send out a notification, and all the details (the name of the file, the date and time, the server, etc.) will be included in that message. You decide which types of events to monitor and whether to use SNMP, SMTP, or both, so you have a lot of flexibility in how you're notified of events within Skybot.
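File add, change, and delete monitoring can be approximated by comparing two snapshots of a directory. The Python sketch below is a polling approach built on `os.scandir`, chosen here for portability; it is not how Skybot's file monitors are implemented, and a production monitor would also need to handle files that are still being written.

```python
import os


def snapshot(directory: str) -> dict[str, tuple[int, int]]:
    """Map each regular file name in `directory` to (size, mtime in ns)."""
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                st = entry.stat()
                result[entry.name] = (st.st_size, st.st_mtime_ns)
    return result


def diff(before: dict, after: dict) -> tuple[list[str], list[str], list[str]]:
    """Return (added, changed, deleted) file names between two snapshots."""
    added = sorted(after.keys() - before.keys())
    deleted = sorted(before.keys() - after.keys())
    changed = sorted(name for name in after.keys() & before.keys()
                     if after[name] != before[name])
    return added, changed, deleted
```

A monitor would take a snapshot, wait, take another, and raise an event for each name in the three lists, attaching details such as the file name, timestamp, and server.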

We interface with these two protocols within the system setup in Skybot. We link to your SMTP server, so you just have to tell Skybot where that server is and, if you need to authenticate to it, how to authenticate. We can also receive SNMP traps from your network: you set up Skybot as a trap manager, and there's some setup in there with the IP address of the Skybot server.
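On the SMTP side, what a sender needs is the server address, the port, and optional credentials, which matches the setup described above. Here is a minimal Python sketch using the standard `smtplib` and `email.message` modules; the host name, port 587 with STARTTLS, and addresses are placeholder assumptions, not Skybot configuration values.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_alert(sender: str, recipients: list[str], subject: str, body: str) -> EmailMessage:
    """Build a plain-text alert message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_alert(msg: EmailMessage, host: str, port: int = 587, use_tls: bool = True,
               username: Optional[str] = None, password: Optional[str] = None) -> None:
    """Deliver `msg` through the given SMTP server, authenticating if needed."""
    with smtplib.SMTP(host, port) as smtp:
        if use_tls:
            smtp.starttls()  # upgrade the connection before sending credentials
        if username:
            smtp.login(username, password)
        smtp.send_message(msg)


alert = build_alert("scheduler@example.com", ["sysadmin@example.com"],
                    "Job FAILED: nightly_billing", "Exit code 3 after 3 attempts.")
# send_alert(alert, "smtp.example.com", username="scheduler", password="...")
```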

We can also interface and send traps to your enterprise monitor. So you add Skybot as a trap manager to your monitor and we will send the traps over there. And like I said, SNMP or any of our events can trigger some type of an error recovery process. And also, we keep good history and you can report on when those traps are received or when those types of events occur.

Automatic Notification with Skybot

Along with notifying individuals, you can create lists of email addresses. They can be local or remote addresses, so if you have vendors that support some of your products, you can put those vendor addresses right in your notification list. You can put phone numbers into those lists as well as email addresses. For example, I've got my phone in there: my phone number followed by my carrier's messaging domain, so I get a text message. That can be included on the list as well.

Once you've got the notification list set up, then you can add them to a job monitor, to late starts. You can add them to a status notification of failure, for example. You can add them to the event monitors, monitoring for files. And you can add them to a trap monitor. So you've got a lot of different places where you can get notification depending on what it is that you need to know and when you need to know it.
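A notification list that mixes email addresses and phone numbers can be sketched as below, with each phone number turned into an address at the carrier's email-to-SMS gateway. The class, the list names, and the `sms.example.net` gateway domain are hypothetical; real gateway domains vary by carrier. Expanding several lists removes duplicates so that someone on two lists gets one message.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationList:
    name: str
    emails: list[str] = field(default_factory=list)
    # (phone number, carrier email-to-SMS gateway domain) pairs
    sms: list[tuple[str, str]] = field(default_factory=list)

    def addresses(self) -> list[str]:
        """Email addresses plus phone numbers rewritten as gateway addresses."""
        return self.emails + [f"{number}@{gateway}" for number, gateway in self.sms]


def expand(lists: list[NotificationList]) -> list[str]:
    """Merge several lists into one recipient list, keeping first-seen order."""
    seen: list[str] = []
    for notification_list in lists:
        for address in notification_list.addresses():
            if address not in seen:
                seen.append(address)
    return seen


ops = NotificationList("ops", ["admin@example.com"], [("5551234567", "sms.example.net")])
vendor = NotificationList("vendor", ["support@vendor.example", "admin@example.com"])
print(expand([ops, vendor]))
```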

Along with the processes and the tasks that Skybot is running in those business processes, we also have a number of system messages that you might want to be notified of. If someone stops the Skybot server, you might want to let the system administrator know that. If the agent loses communication with the server, you can send out a notification of that. So along with your business processes that are running, you can also be notified of events that occur on the server or on your agents. So you can notify a business analyst of a file arrival. So a lot of different places where you can send out notification based on who needs to know and when they need to know it.

Configuring High Availability to Meet SLAs

As far as meeting your SLAs goes, Skybot also has a high availability option that you can configure for the Skybot server. That makes sure that if something happens to your master server, you have coverage. So this is my production server, where the Skybot jobs are running. All of my agents here are reporting to it, getting their rules for when to run and what the dependencies are, and posting history back to the server.

If you set up a standby server within Skybot, you configure it as a standby and put it in standby mode, and the PostgreSQL database automatically starts replicating over to that standby server. So now you've got a real-time copy of all of the rules set up for all the jobs, all of the agent setup, and all of the history of all of the jobs that have run.

If this communication is broken, the standby server can be brought up into production mode. The agents continue to run jobs and just start reporting their statuses to the standby server. So you don't lose any time, any history, or any job setup. Everything is replicated in real time, so you don't have to miss any of those SLAs even if you lose your master server.

Navigating Skybot Scheduler's Interface

So let's take a look at Skybot and how it can help you with these SLAs. I'm going to pass control over to Keith. Oh, how I hate giving up control. And Keith is going to show us the user interface for Skybot, how to create some of the job monitors and the job statuses, and how we can use the features within Skybot to help us with Service Level Agreements. So, Keith, if you want to take it away.

All right. Just a little housekeeping before you . . . Do you still have your phone on mute, Keith? If you put your cursor up at the top of the screen, now we're looking at Keith's screen, so if you have a question for us, put your cursor up at the top of the screen. Just click on the chat icon and that'll bring up the chat window for you, and then you can ask us any questions that you have. I'm not hearing any audio, Keith. Oh, it looks like you've dropped off. It looks like you need to dial in.

Keith: Is that any better?

Pat: Oh, there you are.

Keith: Yeah, that's better. Wonderful, I'm back. Sorry about that, everybody. Can you still hear me?

Pat: We can.

Keith: Yeah, wonderful. Okay. It keeps beeping in my ear.

Pat: Uh-oh.

Keith: Technology is a wonderful thing.

Pat: It is a wonderful thing, isn't it?

Keith: Okay, everybody. So this is our home screen within Skybot. So from here, you can get a quick overview of what's happening on your system. Is everything looking good? Are we running everything happily? Across the top, we have the current status. And so here we use what we call a traffic light system. So obviously green being good, everything is running as we expect. Yellow means there could be an issue. And red means something is not working so that may need some investigation. As Pat said earlier, we don't expect you to come to this screen to look for it. We can notify you by email, text message, SNMP, if these things happen. So you're not having to come to one screen just to find out that information.

Just below that, we then have job summaries, which show what's happening on the system. Within the last six hours, I've initiated 130 jobs. I've got 14 jobs sitting on the job queue, two jobs currently running on my system, 106 have completed, and I've got 12 failures. So if this was a production environment, I'd probably need to look at that figure there, because I've got 13 jobs that could be in trouble; they could be affecting my SLA and could cause problems further down the line.

We also have Actions Required. Within here, we can see any problems that the server thinks you need to be aware of, such as pending messages. Here, we can see that two days ago I got a message saying there were 20 missed jobs. I can click on this link here, which would take me across to the relevant screen. And once I'm happy that I've dealt with it, I can come in and acknowledge that message there. To go with the SLAs, it will also tell you who acknowledged it and what time they acknowledged it. That can be used as evidence that this person was made aware of the problem and we dealt with it from there.

So once you've done that, you can see the message has been updated. It's now gone away; there's nothing outstanding for me. On this system, we do have agents that are offline, but on a production system, that may be something for you to worry about. In a short time, I'll show you how we can notify you if these agents go offline, so you're not waiting for a job to try to run and fail before you find out that the agent is having problems.

Then we've also got the list of failed jobs. Up here, I've got Failed Jobs, and here I give you the list of what those jobs are, with some quick information about what happened to them. Command Failed means there's something wrong with the job setup. Failed-Initiated means there could be a problem with the actual agent; it's not getting as far as trying to run the command.

Again, we have the clickable link at the bottom to take you to the relevant page. And then we also have Skipped Jobs so we can see that jobs have been skipped. Prerequisite Not Met, so that job was waiting for something to happen by a specific time. It didn't happen so we skipped the job saying we can't run because we haven't got a prerequisite. And again, we can notify you if that is the case. But some may be skipped for normal reasons. It may be that this job doesn't need to run today so we've skipped that job, and there's a number of reasons for that.

Key Features in Skybot Scheduler

Okay. What I'm going to do now is take you across the top of the screen and show you a couple of sections that you may be interested in. The first one is the Admin section. From here, we can do a lot of things like system setup and editing users and roles: hooking up to your Active Directory, bringing the users in, and then overriding their access, so they may be a system admin on your network, but within Skybot, you just want them to be a user. We can do all of that. We can limit what they can see very comprehensively, so they only see what you want them to see.

From here, we can also get to the Audit History so you can see what's happening on your system and see what changes are being made. So I'm just going to bring this down ever so slightly. There we go. Okay. So from here, I'm just going to go to the system settings. I've already got the page open just for a bit of fun. So within here, we’ve obviously got things like passwords so you can control how users can log on. Here, you can see I'm authenticating with my LDAP server so that's all configured within there.

Then we've got the system cleanup, so we clean up after ourselves, and these settings can also be changed per job. So these are the system settings, but they can be overridden. Then we have Job Default Forecast. For Audit History, we are tracking user changes, so that is on. And again, from the home screen, if that was switched off, you would see a red flag saying that it's not configured.

To send emails from Skybot, we've obviously hooked up to our SMTP server. This is the address you would get the emails from, and it gives you an idea of where that information is coming from, especially if you have multiple servers. It could be your production Skybot Scheduler server or your test server, so you know which server it's coming from.

We can then see SNMP Notification. If I want to send SNMP traps out, I can add the destinations in this section here, and as you can see, I can have more than one trap destination. This one is actually the system I'm working on today, so I'll show that shortly. We can also monitor for traps. As Pat said, if you've got a third-party application that wants to send us some information, we can do that. We just switch the monitor on and set a port number, then you tell your other system that port number and, obviously, the address of the system, and we will start reacting to those SNMP traps.

Again, we've got the standby server here, so if we need to make any changes to the configuration, this can be done. But setting up the standby server is normally done by running a couple of script files or batch files, and it takes a couple of seconds. We also have Diagnostics. If you do hit problems, you can get a lot more information from the Diagnostics, which helps us speed up the investigation, get to the bottom of the problem, and get you up and running again as quickly as possible. So that is that section. Then if I go to the Users section, we can have a number of users. I'm going to pick on myself today and give Pat a rest, because I normally just pick on Pat's profile.

Pat: That's so nice of you, Keith.

Keith: I'm just being nice today.

Pat: Okay.

More About Skybot's Notification Features

Keith: Within here, if I click on Edit User, the part I want to show you is this section, Notify User of System Events. This allows me to be told if, as Pat said, an agent is held, a job queue is held, an agent ends, things like that. You're notified so you don't have to wait for things to go wrong. We want to tell you proactively, which makes you a lot more confident that things are going to work when required. As you can see here, I'm notified of the events, and it's picking up my email address. That's actually brought in from the LDAP system, because that's where my user profiles come from.

Okay. So that's the Users section. Then we can come across to the agent side of things. So now, what I want to look at is . . . let me just find the right screen, that one. There we go. These are the agents I've got on my system, and again, we've got the traffic light system. Coming to this screen, you can quickly see that the green ones are active. A red one is in a failed state; there's something wrong with that agent, and it hasn't been shut down cleanly. The blue ones are inactive: they have been shut down, so they're down for a specific reason.

Setting Up Agents

What we can actually do here, again if I pick my system and do Edit Agent, is set up offline notification. Within here, I can send an SNMP trap or I can send emails. And again, we can set up a notification list, so it could go to your support center if they're remote, or it can go to a specific user.

What this means is that if the agent goes offline unexpectedly, we notify you. If you shut the agent down, we're not going to send any notifications, because someone has deliberately taken that step. But if the agent loses power or network connectivity, we can notify you before a job tries to run on that system. It's all about being a half step ahead and checking that things are happening on there for you.
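The rule described here (notify when an agent disappears unexpectedly, stay quiet when someone shut it down on purpose) can be sketched as a heartbeat check. The heartbeat mechanism, the timeout, and the function name are assumptions for illustration; the demo does not say how Skybot detects lost agents internally.

```python
from datetime import datetime, timedelta


def should_notify_offline(last_heartbeat: datetime, now: datetime,
                          timeout: timedelta, shutdown_requested: bool) -> bool:
    """Decide whether a silent agent warrants an 'agent offline' alert.

    A deliberate shutdown never alerts; otherwise alert once the agent has
    been silent for longer than `timeout`, before any job tries to run there."""
    if shutdown_requested:
        return False
    return now - last_heartbeat > timeout
```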

Okay. So that's really the setup of agents, and again, it's very visible. You can change all of the descriptions. This is the release of the software you're running. You've got job queues and active jobs; there's a lot of information on this screen for you. Okay. What I'm going to do now is go and have a look at a couple of the jobs that we've got set up.

Managing By Exception

So within here and throughout the system, I'm using a tag called SLA. That really allows me to sort and get to the information I need as quickly as possible. Within here, I'm going to have a look at this SLA example job. If I right-click on the job, choose Edit, and then Edit Job, this is the basic setup of a job: general information about the job, what agent I want it to run on, the job name, the description, and any tags I'm using for that job. Then we've got the schedule. When do I want this job to run? This is running daily at 12 o'clock, and if I edit that, you can see it actually runs every day of the week at 12 o'clock.

If there are any exceptions, for example if you only want to run during a specific time range, we can add that in as well. That allows you to set up more useful interval jobs: say you want this job to run every 15 minutes between 9 and 5. Then we have Agent Environments. This is the user we need to log on as to be able to run the job, and here you can see we're actually running it as the agent user, which is a safer way of running things. And then we come down to the Commands, and here I'm just running a wait script for 600 seconds.
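
As a rough illustration of the "every 15 minutes between 9 and 5" interval job, the sketch below lists the run times such a schedule would produce on one day. It is a hypothetical helper, not Skybot's scheduler; in particular, treating 5 o'clock as an exclusive end (so the last run is 16:45) is an assumption.

```python
from datetime import date, datetime, time, timedelta


def interval_run_times(day: date, start: time, end: time, every: timedelta) -> list:
    """Return run times on `day` from `start` (inclusive) up to `end` (exclusive)."""
    current = datetime.combine(day, start)
    stop = datetime.combine(day, end)
    runs = []
    while current < stop:
        runs.append(current)
        current += every
    return runs


# Every 15 minutes between 9 and 5 on an arbitrary example date.
runs = interval_run_times(date(2024, 1, 15), time(9, 0), time(17, 0), timedelta(minutes=15))
print(len(runs))                        # 32
print(runs[0].time(), runs[-1].time())  # 09:00:00 16:45:00
```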

And then I've got the Logging section here. I'm creating and saving a job log for when this job runs. That'll come in handy a little later on when I show you the notifications. So that's the job, all ready to be run. If I type my SLA tag back into the filter, you can see that out of a lot of jobs, I'm now just looking at the three jobs that I want to look at.

If I now go to Job Control, I'm going to run this job now. But just before I do that, you can see that just below, I've got a notifications panel. This is my SNMP trap monitor. I'm just going to quickly clear this list so we can see what's going to happen. There we go. So if I pop back to here and run this job, that job is now running. In a second, I should start getting a notification, because I've set up a job monitor on here.

So let me quickly show you that. If I go to Edit Job, then Job Monitor, I've actually got a monitor here set to a maximum duration of 1 minute. If the job runs for over a minute, which we know it will because it's running for 600 seconds, then it will send me an SNMP trap. It also sends an email to Pat Cameron, and there are some advanced email notification options as well. In here, I'm picking up the job name, a note that it usually runs in less than a minute, and then the scheduled time. So this is all information I'm getting back from the Skybot system and pushing out to the email and also to the SNMP trap.
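
A maximum-duration monitor like the one configured above boils down to comparing elapsed run time against a limit and, on overrun, sending alerts and ending the job (the job log later in the demo shows the job was cancelled by the monitor). The class below is a hypothetical sketch of that logic only; the channel names and the `note` text are illustrative, not Skybot settings.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class MaxDurationMonitor:
    """Hypothetical maximum-duration job monitor."""
    max_duration: timedelta
    channels: tuple = ("snmp_trap", "email")  # where to send the alert
    cancel_on_overrun: bool = True            # end the job once it overruns

    def actions_for(self, elapsed: timedelta) -> list:
        """Return the actions to take for a job that has run for `elapsed`."""
        if elapsed <= self.max_duration:
            return []  # still within the allowed duration: do nothing
        actions = list(self.channels)
        if self.cancel_on_overrun:
            actions.append("cancel_job")
        return actions

    def alert_text(self, job_name: str, scheduled: str, note: str) -> str:
        """Build an alert carrying job details, like the email in the demo."""
        return (f"Job {job_name} exceeded {self.max_duration}. "
                f"{note} Scheduled time: {scheduled}.")


monitor = MaxDurationMonitor(max_duration=timedelta(minutes=1))
print(monitor.actions_for(timedelta(seconds=30)))   # []
print(monitor.actions_for(timedelta(seconds=600)))  # ['snmp_trap', 'email', 'cancel_job']
```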

So I'll just cancel out of that now, and we'll have a look and see how that job is doing. If I right-click on this job and choose Job History, we can see that the job is running. If I give that a refresh, you can see my statuses coming through to my SNMP trap monitor. I can see the Job Overrun Monitor event that occurred, and that the job then went into a cancelled state, which is due to the job monitor cancelling the job. So it's giving me an update of what's happening on that job.

Pat: We do have a lot of customers that use that interface for their help desk ticketing software. So it's pretty nice.

Keith: Yeah. Using the information that we can get from the job, we can populate certain fields within your ticketing system, so you can easily track back and locate the job, the job run number, and specifically what you're looking for. Okay. So that's the SNMP traps: how they come through and what we can do with them.

The other thing we've got with the job, if I quickly go back to it here, is status notifications. As Pat said earlier, we can tell you when a job has been submitted or skipped, or is running, completed, failed, cancelled, or in a retry wait state. As you can see from this screen, I've got a Failed status notification here and also a Cancelled status notification here.

If I look at the failed status, we can see I'm sending an SNMP trap. I'm also sending an email to, obviously, Pat Cameron, with the job log attached. So we're sending that information directly to you, and you don't have to log on to the system to get it; it's in your email. If you're on call in the middle of the night, you've got that information on your phone, in front of you, so you can make a quick decision about whether you need to go downstairs and log on to your system.
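
To make the "email with the job log attached" idea concrete, here is a minimal sketch using Python's standard `email` package. The status-to-channel table, job name, and run number are hypothetical examples; actually sending the message (for instance with `smtplib`) and the SNMP side are left out.

```python
from email.message import EmailMessage

# Hypothetical rules: which job statuses trigger which notification channels.
STATUS_CHANNELS = {
    "failed": {"snmp_trap", "email"},
    "cancelled": {"email"},
}


def channels_for(status: str) -> set:
    """Return the channels to notify for a status (empty set: no notification)."""
    return STATUS_CHANNELS.get(status, set())


def build_status_email(job_name: str, run_number: int, status: str,
                       job_log: str, to_addr: str) -> EmailMessage:
    """Build an email for a job status, with the job log attached as a text file."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"Job {job_name} run {run_number}: {status}"
    msg.set_content(f"Job {job_name} (run {run_number}) ended with status '{status}'.")
    # A str payload is attached as text/plain, so it is readable on a phone.
    msg.add_attachment(job_log, filename=f"{job_name}_{run_number}.log")
    return msg


if "email" in channels_for("failed"):
    msg = build_status_email("SLA_EXAMPLE", 1234, "failed",
                             "Job started\nJob ended", "oncall@example.com")
    print(msg["Subject"])  # Job SLA_EXAMPLE run 1234: failed
```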

Pat: Yeah, it's really nice to get that log along with the error message.

Utilizing Job Logs

Keith: In a second, I'll show you a copy of one of those job logs so you can see what information it contains. Likewise, if that job is cancelled, we can be told about that as well. So if I go back to that job and have a look at its job history again, I can right-click on this job run, and you can see I can download the job log. It's in a normal text format, so it's very easy to read.

Within the job log, we get when the job was started, the job name and information showing what system it's running on, and then the individual steps of the job. So we set a date for that job, then ran that wait script, and then the job exited, so you can see when the job ended. The process was asked to terminate by the Job Monitor Overrun, so we quickly know that the job didn't end in error; it was cancelled by something that we set up within the product.

Okay. Another thing we can do is look for changes on the system itself, by setting up agent event monitors here. For this one, I've got a file event monitor that's looking for a file change. If I right-click here and edit it, we can see that, just like a job, we've got the general setup here. Again, we can put tags in here so we can sort on them, and we can also send emails and SNMP traps when this event occurs.

And then down at the bottom is the actual file event that I'm looking for. Here, I'm looking for File Changed; we can also look for files being added, removed, or hitting a certain threshold. I'm looking in a temporary directory for an SLA file, but on a completely different server, our Skybot server, and I'm looking for whether that file grows, shrinks, or is changed in any way. So if I now go into that system, this is the file I'm looking for. I'm just going to load up this screen here, which is my File Event Monitor screen. Now I'll come in here and make a quick change. I'll just come down here. There we go, I can just put in a bit of information.
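
A common way to build a file event monitor is to poll a directory and compare snapshots of each file's size and modification time. The sketch below is a generic, hypothetical example of that technique, not how Skybot's agent detects changes; it reports added, removed, and changed files, and a size-threshold check would be one more comparison on `st_size`. The 30-second default poll interval is an illustrative choice.

```python
import os
import time


def snapshot(directory: str) -> dict:
    """Map each regular file in `directory` to its (size in bytes, mtime in ns)."""
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                st = entry.stat()
                result[entry.name] = (st.st_size, st.st_mtime_ns)
    return result


def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two snapshots and report added, removed, and changed files."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    changed = sorted(name for name in before.keys() & after.keys()
                     if before[name] != after[name])
    return {"added": added, "removed": removed, "changed": changed}


def poll(directory: str, interval_seconds: float = 30, on_event=print) -> None:
    """Poll forever, calling `on_event` whenever something in the directory changes."""
    before = snapshot(directory)
    while True:
        time.sleep(interval_seconds)
        after = snapshot(directory)
        events = diff_snapshots(before, after)
        if any(events.values()):
            on_event(events)
        before = after
```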

Pat: You're so kind, Keith.

Keith: There we go. So I've now made that change. I'm just going to save. And it's not going to let me change the file. Get that saved. Then we just edit.

Pat: Maybe she's not so sunny.

Keith: Nope, it’s not really that sunny today, that’s what I was going to say.

Pat: There we go. I guess not.

Keith: Maybe I'll just put one in there and create a new file quickly. There we go. So I've actually created a new file in there, and it should pick up that change as well. You can see here that it has picked up the two changes I've made, so you can see that the file has changed. And from this screen, you can get this information. The filename can be used as a variable, for example in an FTP transfer, so we don't need to know the filename in advance. We can match on wildcards and then bring in the actual filename, and we can append a date to it, if required.

So if you have the same file coming in every day, we can take a copy of that file, move it to another location, and append a date and time to its name as well. That allows you to keep multiple copies of the file. But for this example, we can see that I am tracking that the file is being changed.
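
The "same file every day, keep dated copies" pattern can be sketched with the standard library: match incoming files on a wildcard, then copy each one to an archive folder with a date and time appended to its name. The helper names and the `YYYYMMDD_HHMMSS` stamp format are hypothetical choices for illustration, not Skybot behavior.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_with_timestamp(src, dest_dir, now=None) -> Path:
    """Copy `src` into `dest_dir`, appending a date and time to the file name."""
    now = now or datetime.now()
    src = Path(src)
    stamp = now.strftime("%Y%m%d_%H%M%S")
    target = Path(dest_dir) / f"{src.stem}_{stamp}{src.suffix}"
    shutil.copy2(src, target)  # copy2 also preserves the file's timestamps
    return target


def archive_matching(src_dir, pattern, dest_dir, now=None) -> list:
    """Archive every file in `src_dir` whose name matches a wildcard such as '*.csv'."""
    return [archive_with_timestamp(p, dest_dir, now)
            for p in sorted(Path(src_dir).glob(pattern)) if p.is_file()]
```

For example, with a fixed time of 25 March 2024 at 09:00:00, an incoming `orders.csv` is archived as `orders_20240325_090000.csv`, so each day's copy is kept alongside the others.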

Pat: It's amazing to me how many production processes are based on file changes and file adds. I think that's just huge these days.

Keith: Indeed.

Pat: I don't know how many of you out there agree but I see a lot of that.

Keith: I remember the days when, as you said earlier, people had to put in a buffer of time to allow for that file arriving slightly late. With today's tools like Skybot, we can start as soon as that file has finished arriving. We can even get to the point where we read a specific word in that file, so we know the file is complete before we continue. So there are a lot of ways we can use that file to start off another process.
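
One way to implement "read a specific word in that file to know it's complete" is a trailer convention agreed with whoever sends the file: the last line carries a fixed marker, and downstream processing waits until it appears. The marker text `END-OF-FILE` below is a hypothetical convention, and the sketch is not Skybot's own check.

```python
from pathlib import Path


def file_is_complete(path, marker: str = "END-OF-FILE") -> bool:
    """Return True once the file exists and its last non-blank line equals `marker`.

    The marker text is a hypothetical convention agreed with the sender.
    """
    p = Path(path)
    if not p.is_file():
        return False  # the file hasn't arrived yet
    lines = [line.strip() for line in p.read_text().splitlines() if line.strip()]
    return bool(lines) and lines[-1] == marker
```

A dependent job would only be released once this check returns True, instead of waiting for a fixed buffer of time.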

Pat: Right.

The Importance of Audit History

Keith: So if I just go back to there. We can also look at the Audit History here, to see what's been happening with this monitor. Is it being changed? I can see Pat's been busy with this agent over the last few days, so we can see that it's been updated. Coming into here, this is all part of the Audit History. I can see that the monitor's Event Frequency was changed from 30 to 5 seconds, so if that starts causing a problem, I know it worked with 30 and I can just go and change it back. It gives you the previous value of what was in there as well.
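
The audit idea above (record the original and the new value so a bad change can be rolled back) can be sketched as a simple field-by-field comparison. This is a hypothetical illustration with made-up field names such as `event_frequency_seconds`; it also assumes `None` means "field absent", which a real audit trail would track explicitly.

```python
def audit_changes(original: dict, updated: dict) -> list:
    """List (field, original value, new value) for every field that differs."""
    fields = sorted(original.keys() | updated.keys())
    return [(f, original.get(f), updated.get(f))
            for f in fields if original.get(f) != updated.get(f)]


def revert(current: dict, changes: list) -> dict:
    """Return a copy of `current` with the recorded changes rolled back."""
    restored = dict(current)
    for field, old_value, _new_value in changes:
        if old_value is None:
            restored.pop(field, None)  # assumed: the field did not exist before
        else:
            restored[field] = old_value
    return restored


before = {"event_frequency_seconds": 30, "description": "SLA file monitor"}
after = {"event_frequency_seconds": 5, "description": "SLA file monitor"}
changes = audit_changes(before, after)
print(changes)                           # [('event_frequency_seconds', 30, 5)]
print(revert(after, changes) == before)  # True
```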

And again, I can come to here and see that multiple values have been changed from false to true. So if that starts to cause me problems, I know what the original values were and I can change them back as well. So that's a good way of checking what's been changing. Okay. So we've tested the event monitor: we've changed a file and seen that trigger. What we didn't show you is this: if I scroll back to the screen and do Where Used on that monitor, I can see what uses it. There aren't actually any dependencies on this one, but there could have been. When that file arrived, it could have kicked off a job, and I could have watched that run through.

So if I come back here, go to my job screen, and scroll back up the page, I have this Restart Antivirus Process job. If I edit the job and look at its prerequisites, I can see that it runs off an agent event, Antivirus Occurs. That's another kind of event monitor we've got; this one is a process event monitor.

If I go back to the events and close that for a second, I've got that process monitor here. For today, I'm just going to send an agent event manually, to make the system think that this has actually occurred. So I'll push that through the system, and if I go back to the agent events, you can see that my antivirus event has picked it up, along with the information I put into that record. From there, I can go back and look at my job history. I'll close that in a second, and I can see that this job has run. Is that today's run?

Pat: It doesn't run very successfully but it does run.

Keith: So we can see that it has been triggered reactively. From there, I can pop in and look at the job log, find out what's actually wrong with that job, correct it, and then, if required, use the restart option. So rather than having to run the job again or send the file again, I can just do a restart, and the system records that I've seen the error, reacted to it, corrected the problem, and restarted the process. That adds to the Audit History, which may keep the people monitoring your SLAs happy that you've seen the problem, dealt with it, corrected it, and continued. So it's a good way of processing through the system.

Pat: Right, it's great to have that history track.

Keith: Okay. So within here we've got all the history, and you can see all of the event data: what the event type is, and any notes, such as an event falling outside of the time range. Even if a process happens outside of its time range, we can still tell you that it happened. We won't actually kick off the job afterwards, because the event was outside of that time range, but we do still track it to show you that something happened. That lets you edit your time range, because you now know the event is going to happen a bit later, and capture that instance next time.
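
The behavior just described, where every event is recorded but the dependent job only fires inside the configured time range, can be sketched as follows. This is a hypothetical model rather than Skybot code, and for simplicity it assumes the window does not cross midnight.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class WindowedEventMonitor:
    """Hypothetical event monitor that only triggers its job inside a time window."""
    window_start: time
    window_end: time
    history: list = field(default_factory=list)

    def on_event(self, event_time: time) -> bool:
        """Record every event; return True only if the dependent job should run."""
        in_window = self.window_start <= event_time <= self.window_end
        note = "" if in_window else "outside of time range"
        self.history.append((event_time, note))  # tracked even when not acted on
        return in_window


monitor = WindowedEventMonitor(time(8, 0), time(18, 0))
print(monitor.on_event(time(9, 30)))  # True: the dependent job runs
print(monitor.on_event(time(19, 0)))  # False: recorded, but no job is started
```

The history entries flagged "outside of time range" are what would tell you to widen the window.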

The Value of Skybot's Reporting

Okay. The other good thing we've got is a number of reports. For example, we've got a good morning report, if you need to know what's happened on your system over, say, the last 24 hours. We can have that in your email first thing in the morning. So rather than you having to log on to the system, we tell you all about it; it's in your email, ready to read. If you're on the bus coming to work, you can read it on the bus. If you're in the car, we don't advise reading it in the car.

Pat: Please don't read it in the car.

Keith: You can have it for when you get to work. I just have to play it safe; you know what some people are like these days. What I've done is run a copy of this report earlier today. We can see that I'm running it from 9 o'clock in the morning on the 24th up until 9 o'clock on the 25th. This report gives me a quick overview, like the home screen: my Skybot Scheduler is running, and I've got 12 agents offline, so that would be something I'd need to look into.

And then we've got the Summary List. I've run 321 jobs, and 255 of those ended normally. Brilliant, I like that figure. Hang on a minute, I've got 36 here that ended abnormally. So that means I've got some work to do this morning: I need to find out what those jobs are. If we go down to the next page, we give you that information: the job run number, the job name, and a brief indication of what happened with that job.

So again, Command Failed means there's something wrong with the job setup. Canceled due to job monitor is fine: we know the job ran too long, ended too soon, or started late. If it's held, then obviously we need to look at that. Failed-Initiated means there was a problem with the agent, so we can start by looking at the agent rather than at the job. It gives us a lot of information on that screen.

So it lists all of the abnormal jobs there. Then we come to Job Monitor, where we can see what the job monitors have fired over the last 24 hours: job overruns, late job starts, and what actions were performed, such as ending the job, sending emails, and sending an SNMP trap. That gives us a lot of information, so you know that people have been notified. There are no missed jobs, so we're happy that we haven't missed anything. And these are the offline agents, so it gives you a quick list of what's offline. It's a quick overview of how your system's running, ready for you when you come in in the morning.
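
The summary section of a morning report is essentially a count of job runs by outcome plus a detail list of the abnormal ones. The sketch below shows that aggregation with hypothetical status names loosely based on the ones read out above; it is not Skybot's report logic. In this sketch, runs with other statuses (such as still running) count toward neither group, so "normal" plus "abnormal" need not equal the total.

```python
from collections import Counter

# Hypothetical status names; a real report uses the scheduler's own status codes.
NORMAL = {"completed"}
ABNORMAL = {"command_failed", "cancelled_job_monitor", "held", "failed_initiated"}


def summarize(runs: list) -> dict:
    """Summarize (run_number, job_name, status) tuples like a morning report."""
    counts = Counter(status for _, _, status in runs)
    abnormal = [run for run in runs if run[2] in ABNORMAL]
    return {
        "total": len(runs),
        "normal": sum(counts[s] for s in NORMAL),
        "abnormal": len(abnormal),
        "abnormal_runs": abnormal,  # detail list for the next page of the report
    }


runs = [(1, "NIGHTLY_BACKUP", "completed"), (2, "SLA_EXAMPLE", "cancelled_job_monitor"),
        (3, "FTP_PULL", "running"), (4, "MONTH_END", "held")]
summary = summarize(runs)
print(summary["total"], summary["normal"], summary["abnormal"])  # 4 1 2
```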

We've also got an Audit History report. If you need to know what's happened on your system, we can run the Audit History. If I scroll up to the top of this report, I'm running it from Monday at one minute past midnight up until about five minutes before we started this demo. You can see I've got 105 pages, so I won't bore you by going through the whole lot, but I'll show you what's in here.

As we showed you on the screen, we show you the original value and also the new value. So this is something you could send automatically to your auditor to show what's been happening on your system. On a production system, you hopefully wouldn't have 105 pages of changes, because you want a production system to be as static as possible. But we give you all of the information you could want to look at here.

So that really covers what we can show you on the system. The last thing I want to show you is the Dashboard, a lovely feature that a lot of customers use. It gives you a quick graphical view of what's happening on your system, and all of these elements can be clicked on. I'm not going to go into too much of this, but if I click on the terminated jobs on the 24th, it takes me straight to the 24th and shows me the jobs I want to see. So it lets me get to the right screen on the system fairly quickly.

And again, we have the Audit History Object Count here, so I can come in and show all objects on the system. You can see that a lot of jobs have been changed over the last seven days: there are 161 jobs, so I can click on that and find out on what dates all of those changes were made. The busiest day for us was the 20th, so I can click on the 20th to find out exactly what's been happening on my system. And in here, I can see again that Pat's been a very busy person.

Pat: See how busy I am? Um-hum.

Keith: I think I'm going to have to disable her profile because she's making a lot of changes.

Pat: It's taking a lot of work.

Keith: Or I can just take away her access to make these changes.

Pat: Testing, testing, testing, testing.

Keith: But I won't do that today because I'm in a good mood, Pat.

Pat: Thanks. Thanks. So nice of you.

Conclusion

Pat: Well, thank you so much. That was a great demo of Skybot. Any questions from anyone out in the audience?

Keith: I do see one here, Pat. It says, “Is the licensing model based around the number of agents?”

Pat: It is, exactly. There is a fee for the central server and then a fee for each agent. And as you can see from here, we do have some specific interfaces to certain applications. Some of these have a fee and some do not; I'm not sure exactly which ones. But yes, it's licensed by agent. Any other questions?

Keith: I don't see anyone else. I can see . . . And you're welcome, Benjamin. Any time. If you want more detailed pricing, just contact your local sales rep; they'll be able to give you all the information you need, based on your systems, so you get a more detailed pricing structure.

Pat: Right. And we do have a 30-day trial that you can download from our website. And our support staff would love to help you get it up and running and get it tested. Well, thank you for joining us. Thanks, Keith. That was great.

Keith: Thank you, Pat. Your bit was awesome as well, as always.

Pat: That's always good. Keith and I always have fun when we do the Keith and Pat show. So thanks for joining us, everyone. Have a great day and hopefully we'll see you next time.

Keith: Yeah, thank you, everybody.

Pat: Bye-bye now.

Keith: Thank you. Bye.