Security and Auditing with Your Enterprise Scheduler

On-Demand Webinar

Windows, UNIX, Linux, AIX, Mac OSX

 

Ensuring and maintaining secure systems is a top priority in the IT industry. During this webinar, an Automate Schedule specialist will show you how to:

  • Create security roles for staff
  • Develop audit and history reports
  • Utilize schedulers without breaching security

You won't have to worry about the security of your scheduled processes anymore. Join us to learn more about the security and auditing features embedded within Automate Schedule.

 

So again, thank you for joining me. My name is Pat Cameron. And I've been with HelpSystems 15 years in September. I'm going to be having a party for that. And I've been with Skybot Software since its inception in about 2009. In my previous life, I was an operations manager at a hospital here near our Minnesota office. So I've used our HelpSystems products in the real world so I feel your pain.

And today, what we're going to be talking about are some of the ways that Skybot Scheduler can help you deal with audit requirements, help mitigate some of the risks in the data centers today, and then how we can help you deal with some of the security requirements that you have, and why you would use Skybot Scheduler to help deal with some of those issues along with your scheduling and operations.

A little bit of housekeeping over on the right hand side of the screen is a chat window. If you have any questions for me during today's presentation, feel free to send your questions in the box next to the send button and I will try to get to them. If I'm in the middle of speaking, I'll probably wait until I'm finished but we'll see how that goes. Also, I have a few slides to kind of go through an introduction of what we'll be talking about today and then we'll be going online and taking a look at Skybot Scheduler and some of the options that it has.

Difficulties with Auditing

So audits, sometimes they can be painful. How many of you have ever failed an audit? Raise your hand and nobody's going to know what your name is. So even if you put good policies in place and you have rules set up to meet all of the requirements for your auditors, you still may have a problem documenting those policies and showing the history of any exceptions. Unless you have a system in place that creates documentation as part of your automation tools, you still find yourself spending all kinds of time and energy trying to gather together all that information needed when the auditors show up. It may only be once or twice a year, but who really has time for that?

Several bits of information that our customers have said are required for their audits are documentation of job failures when we're talking about job scheduling. And that documentation needs to include the date and the time, maybe the reason for that failure, and then how it was handled. So how and where do you keep that history? Is it easy for you to get your hands on it?

The other documentation that is required is a documentation of job streams and dependencies, especially if you've got cross-system or cross-application dependencies. Are you currently keeping track of those in an updated spreadsheet? That can take a lot of your time and energy that you don't have.


The Risk of Downtime

One of the risks that we all take when we work in the IT world is downtime. What are some of the causes for unplanned downtime? You always hear people talk about downtime from disasters or floods and fires and those are awful. But my experience has been that more often, the cause of downtime is internal. And it might not be an entire system but more like maybe one of the following:

When I was an operations manager, I did live through two systems that went casters up in the days before high availability solutions. In December of one year, one of our systems went down, and another in February of the next, so it was a rough quarter. The systems were down for days, but our backup and restore was successful. In this day and age, that certainly doesn't happen anymore. Now you've got high availability environments that make the likelihood of losing an entire system much less. But let's take a look at a few of the other downtime issues that you might have.

A programming error, for example. A user considers the system down if their application is not available and if they can't access the data that they need to perform their job. The wrong version of a program getting put into production can cause downtime for one business unit or maybe for several. Because applications are so interdependent these days, that downtime could really leak into other departments as well. A delay in a job stream is also considered downtime to anyone with service level agreements. If processes don't complete on time and if they in turn affect other production jobs, that's downtime.

Failure to meet SLAs is painful and can affect a company's bottom line as well as your own. If you've ever had to rerun jobs because the dependency wasn't available when that job ran, then you know what downtime is. Scheduling jobs based on time and hoping that all of those prerequisites are finished is a pretty scary way to run your information service department.

Delays can also be caused because no one is notified of an error or because a job gets into a loop and starts running for hours. If you need to log into your systems at night or at month end to make sure that things are running okay, then you know what downtime is. Running jobs manually is also asking for problems because we're all so busy these days and stretched so thin, it's very possible that jobs will get run in the wrong order or they'll run at the wrong time.

Your operations people are only human. I know many operations areas that have checklists that they go through at each shift and check off jobs that were run and they put their initials there or maintain spreadsheets as run books. They're great to show the auditors but, in reality, you really don't have time to check as often as required or to be constantly updating those spreadsheets.

No one likes a micromanager and you don't want to have to micromanage your systems either. Poor performance can also be considered downtime if it causes delays in your job streams and stretches out that nightly batch into the morning business hours.

How Secure Is Your Schedule?

Security. How many of the users on your systems have administrator rights? Can you limit their access to jobs running on specific servers or can you limit them to certain applications? Do you have the flexibility to allow a production control area, for instance, to set up and change job schedules without having to give them administrator rights to those servers? Can you limit the type of access to specific jobs such as payroll jobs?

Maybe the operations area or the help desk can execute or restart those jobs but you don't want them to be able to create or make changes to them. Without the ability to limit the type of access, you certainly may be putting yourself in a position to have a security breach within your data center. If someone from accounting needs to run some reports, ad hoc or periodically, can you allow them access to just those reports but keep them out of other jobs that are on your system? These are the types of situations that can put you in danger of having someone breach your security policies.

Automating Processes to Improve Security and Auditing

So how can a tool like Skybot Scheduler help with those types of issues? Automation, high availability, security function, and auditing are all part of Skybot Scheduler. So let's take a look at how automation can help you manage security and audit requirements.

Automation mitigates downtime, and it does it in general because there's less chance of errors. If you've got your systems automated, you're not going to be running those jobs in the wrong order, you're not going to be running jobs before their prerequisites are met. And you also will have much less chance of missing a service level agreement. So I would say start with the simple jobs first. Reports, maybe daily or weekly reports that aren't dependent on other systems or other interfaces or other jobs, get those automated first.

Another fairly simple process to automate is file transfers. Skybot software has a built-in FTP function and it can help you move those files from one server to another and it seems like there is a lot of file movement going on out there in the IT world these days.

Automating dependent jobs will definitely prevent errors such as jobs running out of order or running before their prerequisites are met. This is going to save you from having to rerun jobs that failed and it's also going to help you meet your service level agreements.

Backups and system maintenance, purges, and archiving are all processes that must be automated and probably need to be done when production isn't active. The conditions needed for those processes are a little different than maybe a dependent job stream. You may have to check to be sure that other processes are not active before running a backup or a purge. You don't want to take a chance of a lock on some of those files. Workload automation tools like Skybot Scheduler can check for those types of dependencies as well, assuring you that your backup will complete or your purge is actually deleting records.

Once you've got the day-to-day processes automated then it's time to look at month-end. Month-end always has dependencies across applications and maybe even across different servers. It's always complicated. Lay it out on a flowchart so that you can see the jobs and what their relationships are and then get to automating it. With the experience that you've gotten automating those easier things, month-end will be easier than it seems when you first look at it.

Event-driven schedules are another piece of being able to automate jobs, to make sure that your jobs are running based on triggers or other job streams, and there can be a lot of different types of triggers. Skybot supports file and directory events such as this example.

So someone places an order on your website, you want to make sure that that order gets processed immediately. The order gets sent to the distribution center, a pick list gets created, and that order gets to the loading dock so that I can get my whatever it is that I've ordered early the next day, maybe Amazon's drones will drop it off for me. So I've got a couple different job streams that are depending on that file.

So along with the order processing, we also need to send that order over to the ERP system, update inventory, update sales, and then probably around the end of the day, we've got some daily reports that need to be distributed. So you can see that one event can trigger multiple processes. Some branches need to process immediately and some need to wait for other events or other tasks. The Skybot Scheduler supports all of these different types of events.

Monitoring By Exception

So once you've got your systems automated and all your different processes that are dependent running one after another, you also need to be notified if there's some kind of an exception. HelpSystems has always talked about managing your systems by their exceptions. If everything is running okay, so be it. I don't have to go and see all the checklists for it. But if there is some type of a problem, I need to be notified right away. So what Skybot Scheduler can do is notify you of failed jobs, send an email or text message to the help desk, interface with your help desk ticketing software and automatically open up a ticket so that somebody can have that error assigned to them, get the troubleshooting going, and get that problem solved before it turns into a bigger problem.

Skybot can also monitor jobs for late starts or overruns or underruns. So even if you don't get an error on a job, if it maybe gets in some type of a loop and just starts sitting and spinning, you need to be aware of that as well. So for your service level agreements, you can set thresholds of time. How long should this job run? What time does it need to be completed by so that I meet that SLA? So all of those different types of notifications will help you to meet those service level agreements and also make sure that you're made aware of those exceptions as soon as possible.

Sometimes people need to be made aware of other types of events besides job processing on their servers. If somebody in one of your departments needs to know when a file gets downloaded to one of your FTP servers, we've got automation built-in or notification built-in to Skybot. It can let you know of that immediately.

Also we can do some monitoring of the services and daemons that are running on your servers. So if you've got some critical processes that need to run and maybe they're kind of never ending, you can put an event monitor on it with Skybot and we can keep an eye on that. And if it does end unexpectedly, we can notify you and we can also trigger a job that will try to restart that service. So we can do some automation in addition to just monitoring your systems.

High Availability and Your Scheduler

High availability. Like I said, boy, I really missed this high availability option back in the good old days when I was in operations. Skybot has a high availability option and the way that it works is you install your Skybot server and you have it up and running. You install agent software on all of your agents and they will communicate with that production server. Then you can install the same Skybot software over on a standby server and run a command to put it in standby mode. What happens then is the Postgres database automatically starts replicating from the master to that standby.

The replication takes place in real time so that you're always up to date. You'll have all of your history, all of your rules, all of your schedules, all of your configurations will exist over on this standby server. If something should happen and that communication is broken, you will receive a message from that standby server that says, "Do you want to bring this server up in production mode?" And so there's a command that you run so that you can bring that up in production mode. You won't lose any of your history, you won't lose any of your setup. All of the agents, they automatically then start reporting over to that new central server. You don't have to do anything at the agent level.

Now we didn't make that switch automatic. You do have to acknowledge it and run a command to make that switch because there could be a lot of reasons why this communication between these two servers is broken and we didn't want to take a chance on having both of these guys up in production mode. So you do need to run a command and you can do a role swap periodically to make sure that both of those servers are all up to date.
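To make the reasoning behind the manual switch concrete, here is a minimal Python sketch. It is not Skybot's implementation: the class name, method names, and message text are all hypothetical, chosen only to illustrate the rule that a standby reports a broken connection but never promotes itself.

```python
class StandbyServer:
    """Hypothetical standby that only becomes master when an operator says so."""

    def __init__(self):
        self.mode = "standby"
        self.master_reachable = True

    def on_heartbeat(self, ok):
        """Record the link state; return a prompt for the operator if the link is down."""
        self.master_reachable = ok
        if not ok and self.mode == "standby":
            # A broken link does not prove the master is down (it could be a
            # network problem), so the standby asks instead of promoting itself.
            return "Master unreachable. Do you want to bring this server up in production mode?"
        return None

    def promote(self, operator_confirmed):
        """Switch to master mode, but only on an explicit operator decision."""
        if not operator_confirmed:
            raise PermissionError("promotion requires operator confirmation")
        self.mode = "master"


standby = StandbyServer()
prompt = standby.on_heartbeat(ok=False)
mode_after_link_loss = standby.mode      # still "standby": nothing happens automatically
standby.promote(operator_confirmed=True)
mode_after_promote = standby.mode
```

Keeping promotion behind an explicit call is what avoids having both servers up in production mode at the same time.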

Security Options In Your Scheduler

Some of the security options that you need to look for in an enterprise scheduler are the ability to have role based security. You've got different groups of people and they have different responsibilities as far as what they need to do on those systems. Maybe they only need access to specific agents. If you run jobs for multiple clients, you want clients to only be able to see the agents that are part of their environment and also the jobs that are part of their environment.

So with Skybot, you can set up roles that will set up security so that users can have change, execute, or view access, or even be excluded from agents or jobs or any of the objects within Skybot. Those roles can be based on the department, the division, their location, or the customer, whatever their requirements are for their job. It's very easy and flexible to set up those roles.

Skybot also interfaces with Active Directory or an LDAP server. So your users can just use their network log in and password. You don't have to maintain any separate users for Skybot.

Auditing and Reporting Your Scheduling Activities

Skybot also does a great job of auditing all of the changes that are made to our database. It logs every database change, a deletion, a change, a creation, the date and the time, the user. And then also in that record, you'll see the original value. If it's an update, you'll see the original value and the new value. Skybot also keeps server history. We do a nice job of logging all of the changes to the server every time something runs and you can also report on all of those logs.
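For readers who want to picture what such an audit record holds, the following Python sketch logs creates, updates, and deletes with the user, timestamp, original value, and new value. It is an illustration under assumptions, not Skybot's schema: `AuditEntry`, `AuditedStore`, and the sample job and user names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class AuditEntry:
    timestamp: datetime
    user: str
    action: str              # "create", "update", or "delete"
    object_name: str
    field: Optional[str]     # which field changed (updates only)
    original_value: Any
    new_value: Any


class AuditedStore:
    """Hypothetical object store that writes an audit entry for every change."""

    def __init__(self):
        self.objects = {}
        self.audit_log = []

    def _log(self, user, action, name, field, original, new):
        stamp = datetime.now(timezone.utc)
        self.audit_log.append(AuditEntry(stamp, user, action, name, field, original, new))

    def create(self, user, name, fields):
        self.objects[name] = dict(fields)
        self._log(user, "create", name, None, None, dict(fields))

    def update(self, user, name, field, value):
        original = self.objects[name].get(field)
        self.objects[name][field] = value
        self._log(user, "update", name, field, original, value)

    def delete(self, user, name):
        original = self.objects.pop(name)
        self._log(user, "delete", name, None, original, None)


store = AuditedStore()
store.create("pat", "demo_job_2", {"command": "report.bat"})
store.update("pat", "demo_job_2", "command", "report_v2.bat")
store.delete("pat", "demo_job_2")
```

Because every entry carries both the original and the new value, an auditor can reconstruct what changed without a separate documentation process.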

We include a number of reports within Skybot and we have commands so that you can schedule those reports periodically and have them automatically emailed directly to your auditors. Typically between the audit report, the job history report, and the security report, all the information that you need for your auditors would be on those three reports.

Security and Auditing Options in Skybot

So let's go ahead and go online and take a look at Skybot. We're going to take a look at some of the automation options, our high availability options, some of the security and then auditing functions as well. So I'm going to go ahead and bring up my desktop so that we can take a look at Skybot. Your screen should go green for a minute and then when it comes back, you'll be looking at one of my flowcharts. If you do have a question for me, if you put your cursor up at the top of the screen, just click on that chat icon and it will bring up that chat window for you and you can send a question while I'm talking as well.

So this is the interface for Skybot. As you can see, it's browser-based. For those of you that have not seen Skybot before, we've got drop down menu options for all of the different setup, all of the objects that you can create, all of the history, etc. And we'll take a look at a number of those today. There's a number of things that you can do to ease navigation. I've opened different tabs for each of the functions so I don't have to go back and drop those down. I want to make it really easy for you to navigate, especially if you're learning a new product.

Running Jobs on Multiple Servers

So this job stream runs jobs and tasks on a number of different servers. This one happens to be a Windows server and it's running a Microsoft SQL job. When it completes, I'm going to run a job over on my Oracle server. I've got an E-Business job that's going to run. And when it completes, I'm going to run another Oracle job over on this Linux server and I'm also going to run a job over on one of my IBM i servers. So it doesn't really matter where those jobs are running, they can all be related to one another. You just need to build a list of what the prerequisites are. And so this Oracle job is going to run when this SQL job completes.
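The idea of building a list of prerequisites can be sketched in a few lines of Python using the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later). This is only an illustration of dependency ordering, not how Skybot stores or evaluates prerequisites, and the job names are hypothetical stand-ins for the jobs in the flowchart.

```python
from graphlib import TopologicalSorter

# Hypothetical job names. Each job maps to the set of jobs it must wait for.
prerequisites = {
    "sql_job_windows": set(),                   # started by the file event
    "ebs_job_oracle": {"sql_job_windows"},      # runs when the SQL job completes
    "oracle_job_linux": {"ebs_job_oracle"},     # runs when the E-Business job completes
    "ibmi_job": {"ebs_job_oracle"},             # also waits on the E-Business job
}


def run_order(prereqs):
    """Return one valid order in which every job comes after its prerequisites."""
    return list(TopologicalSorter(prereqs).static_order())


order = run_order(prerequisites)
print(order)
```

Note that the server a job runs on plays no part in the ordering; only the prerequisite list does. If the list contained a circular dependency, `static_order()` would raise `graphlib.CycleError` instead of returning an order.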

Now, what's triggering this whole job stream is this file event that I've got over here. So this is a setup for an event that's going to watch for a new file to come in to one of my servers. So on the setup for this, I give it a name and it's just New File. This is the agent where that file will reside and this happens to be one of my FTP servers. So I've got people that are dropping files on to that FTP server and I'm going to monitor those directories and folders for new files. Here's one place where we've got notifications so we can send an email to a list of people or to individuals. We also have an interface to SNMP so we can send a trap maybe to an enterprise monitor or to your help desk software if you want to do that.

And then this just sits and wakes up every five seconds and looks for a new file to get added into this CC files directory. And so as you can see, it's just going to sit there and monitor. I'll just show you the history quick. So again, we're going to keep history on all of that activity as well. So this is every time this file event gets triggered and this is the date and the time. You can see most of the time, it's in the morning but sometimes it's in the evening. And this is the actual name of the file that gets created. In here, you can see a file for every date. Every time I get a file, it has a date appended to it.
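A polling file event like the one described above can be approximated with the Python sketch below. It is a simplified stand-in, not Skybot's agent code: the function names, the callback, and the sample file name are hypothetical, and the five-second interval is taken from the demo.

```python
import os
import tempfile
import time


def new_files(directory, seen):
    """Return names added to `directory` since the last check and remember them."""
    current = set(os.listdir(directory))
    added = sorted(current - seen)
    seen.update(current)
    return added


def watch(directory, on_new_file, interval=5.0, max_polls=None):
    """Wake up every `interval` seconds and call `on_new_file` for each new file."""
    seen = set(os.listdir(directory))   # files already present do not trigger
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        for name in new_files(directory, seen):
            on_new_file(os.path.join(directory, name))
        polls += 1


# Small demonstration against a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    seen = set(os.listdir(folder))
    open(os.path.join(folder, "ccfile_20240101.txt"), "w").close()
    first_poll = new_files(folder, seen)    # the new file is reported once
    second_poll = new_files(folder, seen)   # nothing new on the next poll
print(first_poll, second_poll)
```

In a scheduler, the `on_new_file` callback is where the first job in the stream would be submitted; a production monitor would typically also confirm that the file has finished being written before reacting to it.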

So as soon as that event occurs, then what I'm going to do is I'm going to run the SQL job. And if I go in and take a look at the detail of that, the first thing it's doing is using our FTP function to manage the file transfers. So you've got a list of different types of commands that you can run, anything you can run on the command line, you can run with Skybot. In addition to those kind of generic commands, we do have some interfaces to specific applications that you can use. But again keep in mind we can run anything that runs on a command line. But the first thing I'm doing is helping to automate those FTPs.

So as soon as that file comes into the server, what I'm doing is I'm pulling it down to one of my Windows servers. This is the remote file name, the local agent file name, and then I've got flags to set: whether I'm going to replace it, whether to delete the source, and then the type of file transfer. So it makes it very easy to set up all your jobs, have them run one after another, and manage your file transfers within any of those jobs. It's just another process as far as Skybot is concerned.

Using Job Monitors

Now for monitoring of these jobs, here is where you can set up those job monitors that I talked about a little bit earlier. Here's where we've got a monitor for a maximum duration. So if this job runs longer than 30 minutes, I know that there is a problem. I need to be notified about it so I'm sending an email to the help desk. I could also put a time here so if you do have an SLA and this job needs to be completed by a certain time, we can notify you of that as well. We can notify and monitor for underruns and late starts. So those typically are going to cause you problems with meeting an SLA and so because of those gaps in the schedule that you might experience or delays, we want you to be notified of those right away.
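The monitor thresholds described above (maximum duration, underrun, late start, and a must-complete-by time) can be illustrated with a short Python sketch. The rule fields and function name are hypothetical, and for simplicity the check runs against a finished job, whereas a real monitor would also raise an overrun alert while the job is still running.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MonitorRule:
    max_duration: Optional[timedelta] = None   # overrun threshold
    min_duration: Optional[timedelta] = None   # underrun threshold
    latest_start: Optional[datetime] = None    # late-start threshold
    must_end_by: Optional[datetime] = None     # SLA completion time


def check_run(rule, started, ended):
    """Return the list of monitor alerts raised by one completed job run."""
    alerts = []
    duration = ended - started
    if rule.latest_start is not None and started > rule.latest_start:
        alerts.append("late start")
    if rule.max_duration is not None and duration > rule.max_duration:
        alerts.append("overrun")
    if rule.min_duration is not None and duration < rule.min_duration:
        alerts.append("underrun")
    if rule.must_end_by is not None and ended > rule.must_end_by:
        alerts.append("missed completion time")
    return alerts


# A 30-minute maximum, as in the demo, plus an assumed 1-minute minimum.
rule = MonitorRule(max_duration=timedelta(minutes=30), min_duration=timedelta(minutes=1))
start = datetime(2024, 1, 1, 2, 0)
overrun_alerts = check_run(rule, start, start + timedelta(minutes=45))
underrun_alerts = check_run(rule, start, start + timedelta(seconds=10))
normal_alerts = check_run(rule, start, start + timedelta(minutes=10))
```

An underrun is worth monitoring because a job that finishes far too quickly often means it found no data to process, even though it reported no error.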

And then the other notification that we can do is a status notification. So these are all the different types of statuses that a job gets when it runs and we're notifying one group when it's completed. They want to know when that job gets finished. And then we're notifying another group, the help desk, if there's a problem. So if the job fails, we're going to attach the job log and we're going to let the help desk know, probably open up a ticket as well. So automating and monitoring is going to help you keep on track with your audit requirements.
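Status notifications of this kind amount to a routing table from job status to recipients. The Python sketch below shows that idea under assumptions: the group names, the message fields, and the rule that only failures carry the job log are illustrative, not Skybot's configuration format.

```python
# Hypothetical routing table: job status -> groups to notify.
ROUTES = {
    "completed": ["accounting_team"],
    "failed": ["help_desk"],
}


def notifications_for(job_name, status, job_log=None):
    """Build one notification per recipient group configured for this status."""
    messages = []
    for recipient in ROUTES.get(status, []):
        message = {"to": recipient, "subject": f"{job_name} {status}"}
        if status == "failed" and job_log is not None:
            message["attachment"] = job_log   # give the help desk the job log
        messages.append(message)
    return messages


done = notifications_for("sql_job", "completed")
failed = notifications_for("sql_job", "failed", job_log="sql_job.log")
running = notifications_for("sql_job", "running")   # no route, so no messages
```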

More on High Availability

Let's talk a little bit about high availability. So if I bring up our home screen. So this is the screen that you'll see when you log in. And I just bring this up because over here, it shows that we do have a standby server and it's up and active and that's a good thing. If this server goes down, I need to know about that as well as if my main server goes down, I need to know and bring up my standby. That standby server is I think it's on this tab. So this is the other server. So I've just got that task SQL where I'm running those SQL jobs, that's our backup Skybot Scheduler server as well.

So we've got it set up and it's replicating the data from our Skybot server. I can't log into it because Skybot is not really active. The standby to master connection is active. It tells me here I can't access that server and it also tells me down here where my replication is at. So if I did have a lot of records going through there and the replication got behind, it would show me where it was. It also shows me the version and then the hardware ID for this server. If we take a look at the system settings, so this is under Admin here. I can scroll down to the standby server and it will show me this is the IP address of that standby server and the port that we're communicating with. And then also, lucky me, I get notified if that standby server loses communication with my production server.

And we do swap that out every time we put on a product update which we just did last week. We do a role swap with our servers to make sure that everything is all good. So I did get some notification last week that that server was down. Also if you take a look at the licensing, you'll see here that we've got a primary license on our primary server. And then a standby license for the standby server. And then a trial license. So if you don't have a standby in your set up you won't see that standby. You'll just see the primary and the trial license.

So I've remoted into my Skybot server and I just wanted to show you the commands. So this is a Windows server so these are bat files and this is the master to standby. So if I did want to do a role swap, what I would do is I would run standby to master over on my standby server, bring that up in master mode and then I could swap over and change this from a master server to the standby. And I'll just show you a couple of things. I'm not going to do the swap but I'll just show you this is the command that it will run. So this is just a wizard that shows you that this is going to convert this from a master to a standby mode and start that replication. And the information that we need is the server address for the master, the host name or the IP address, the port number that we're going to communicate with, and then also for the admin user on your Skybot server, we need to use that password. So enter that information. It's going to run that command, which I'm not going to run right now, and it will swap out your servers. So it's a pretty quick and painless process.

While we're here in the system settings, I'll show you a little bit about the authentication configuration. So we are using LDAP authentication with user authorization. We've got some options. When you install Skybot, it will be Skybot authentication. But if you want to use your LDAP servers, it's a very simple matter. This is our LDAP server, our Active Directory. This is the port we're communicating with. If you need to use SSL, you would just enter that here. And then this is the LDAP administrator and we can verify here that we are connecting to that LDAP server. Put in the default context for your different groups and then under Roles, you can create roles.

So now you've connected to that LDAP server. What you'll do is you'll create groups on that server and this is the context for that group. So this is the help desk group over on my LDAP server and again, we've defined that group and then we put the users in it on the LDAP server. And now we're defining the type of access. You can see that they can change notification lists, they have view authority to all of the history functions but they are excluded from some functions, users and roles for example, the system settings and the import center.

And then we get a little more granular. So here we have secured actions. They can run reports but they are excluded from creating any type of objects on the system. And again, it's very easy to change any of those and then even down to the individual object level. So here's an example. So these are all the different objects that can be secured within Skybot. Jobs is one of those objects. And so here you can see at the highest level, the help desk can view jobs that are on the server but they can't make any changes.

They do have some jobs that they can execute. So here if they needed to restart or hold jobs, that would be included in this execute function. So by clicking on this little green plus sign, it will bring up a list of all the jobs that are out there and here is where I can set the access for those individual jobs. It's going to default to this top level, let's say they can change those. So it can be very specific in the type of security and the type of access that groups have to objects within Skybot.
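The layering shown in the demo, a default access level per object type with overrides for individual objects, can be sketched in Python as follows. This is an assumption-laden illustration rather than Skybot's security model: the `Access` ordering (exclude, view, execute, change), the `Role` class, and the object names are all hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    # Assumed ordering: each level includes the ones below it.
    EXCLUDE = 0
    VIEW = 1
    EXECUTE = 2
    CHANGE = 3


class Role:
    """Hypothetical role with per-type defaults and per-object overrides."""

    def __init__(self, name, defaults, overrides=None):
        self.name = name
        self.defaults = defaults              # object type -> Access
        self.overrides = overrides or {}      # (object type, object name) -> Access

    def access_to(self, object_type, object_name):
        """An object-level setting wins; otherwise fall back to the type default."""
        key = (object_type, object_name)
        if key in self.overrides:
            return self.overrides[key]
        return self.defaults.get(object_type, Access.EXCLUDE)

    def allows(self, object_type, object_name, needed):
        return self.access_to(object_type, object_name) >= needed


help_desk = Role(
    "help_desk",
    defaults={
        "jobs": Access.VIEW,
        "history": Access.VIEW,
        "notification_lists": Access.CHANGE,
        "users_and_roles": Access.EXCLUDE,
    },
    overrides={("jobs", "nightly_report"): Access.EXECUTE},
)

can_view_payroll = help_desk.allows("jobs", "payroll", Access.VIEW)
can_run_payroll = help_desk.allows("jobs", "payroll", Access.EXECUTE)
can_run_report = help_desk.allows("jobs", "nightly_report", Access.EXECUTE)
```

Defaulting unknown object types to `EXCLUDE` is a deliberate choice in this sketch: access has to be granted explicitly rather than assumed.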

Auditing with Skybot

So for auditing, a couple of things. So let me show you. So if you need to move jobs or any other objects from a test environment, for example, to a production environment, you've got a different server. What we can do is an export. So these are a couple of jobs that I've got that are set up and if I right click on them I can export these jobs.

So what Skybot will do it will go and it will look at all the different objects that those jobs are using. It will put them in a file. And then what I would do is I would move that file over to my production server and you can do that with a Skybot job that will monitor for that file and then pick it up and move it. And on this production server then if I go to the import center, we've got a couple of kinds of imports. But what you would use for moving from test to production is the Skybot Import. What I could do here is I would see a list of all of the files that contain the jobs that I want to move and then I can just click here import to Skybot.

Now I've got a bunch of duplicates out there because I've imported these jobs a number of times. But we'll import them on hold so that they don't start running right away, or you can import them not on hold, whatever you want, that's kind of our default. And it will tell you, "I've got a duplicate here for the agent event monitor and the job," but I'm going to go ahead and import these anyway and just say overwrite the duplicate objects. And Skybot will go out, it will read those jobs in that file, and it will create new ones. And if I go ahead and take a look at them, it's going to add a tag up here to those jobs that they were imported and this is the date and the time and it will put those new jobs in here for me, and they're on hold. So if you're managing your test and your production environment, you want kind of a simple way to export and import jobs.

And then also when we do that import, let me show you what history looks like right now, audit history. So because Skybot monitors all of the changes to all of the jobs and all of the objects on the server, you can see here I deleted and then recreated jobs because I had duplicates out there. So it deleted this command. And so here's the job name, demo job two, the command that was deleted, and this is all the information about that command. And then it created one. And so this is the command that it created. Take a look at the job that it updated. So here's the demo job, this is the other one, this is the date and time. I did it. And here we have the original value and the new value.

With that, so you can view that audit information online or you can use one of our reports. And so for your audit history report, all of our reports have commands that you can use to schedule them. You can report on whatever relative duration that you want to report on. You can also automatically send the report to certain individuals or to a list of people and you can also save that report and archive it. I've got a couple of jobs, let me go back to my jobs here that run those reports and I'll just show you. Let's go over this guy here. So here's my report job. I'll just show you a couple of those commands and then I'll show you a couple of reports.

So this guy runs everyday at 5:00 in the morning and it runs eight different commands and these are all different Skybot report commands. So this first one is the good morning report, it's a bat file because I'm on a Windows agent. I've got the end duration, the user name that I want to run this under and that's me, and I want to email it to the current user, yes. So all of those commands are in the Skybot directories and also if you hit any of these question marks, it will bring up our user guide. And under reports, we've got a section here for scheduling reports and this shows you what the commands are and what those different parameters are. So the help is a good thing.

Also if I go back to the screen, if I go up here I can go to My Files and this is where all my reports are going to be stored. So these are the reports that I ran earlier and if we take a look at that audit history report from earlier today, it will show me the time period and then it will show me all the details. So this is an agent environment that I changed, it was updated, I did it. I changed the tags and so this is the original value and then this is the new value. So all of this audit information will be available for your auditors and again, it's not a whole other documentation process that you need to keep. Job history is another report that typically is used for auditors. And this is going to show me, let's see, in here I just want to see the canceled and the error and the failed within that time period.

And so it will show me the run number for that job, the job name, the agent that it's running on, the date and the time, and then the status for that job. And then the last one that I want to show you is the security report because that's another one that auditors like. And I printed it just for the help desk and so it will show me this is the role. I only have one member in it and then these are all the different areas that are secured and it will show me the type of access that that user has. So again, with these reports you should be able to meet all of your audit requirements.
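The job history report described above is essentially a filter over run records by status and time period. A minimal Python sketch, assuming hypothetical field names and sample rows (none of this is Skybot's report engine or data):

```python
from datetime import datetime

# Hypothetical history rows with the columns mentioned in the demo.
history = [
    {"run_number": 101, "job": "nightly_backup", "agent": "linux01",
     "ended": datetime(2024, 1, 1, 2, 0), "status": "completed"},
    {"run_number": 102, "job": "payroll_extract", "agent": "win01",
     "ended": datetime(2024, 1, 1, 3, 15), "status": "failed"},
    {"run_number": 103, "job": "ftp_pull", "agent": "win01",
     "ended": datetime(2024, 1, 2, 1, 0), "status": "canceled"},
    {"run_number": 104, "job": "month_end", "agent": "ibmi01",
     "ended": datetime(2024, 2, 1, 0, 30), "status": "error"},
]


def exceptions_report(rows, start, end, statuses=("canceled", "error", "failed")):
    """Keep only runs in [start, end) whose status is one of the exception statuses."""
    return [r for r in rows if r["status"] in statuses and start <= r["ended"] < end]


january = exceptions_report(history, datetime(2024, 1, 1), datetime(2024, 2, 1))
for row in january:
    print(row["run_number"], row["job"], row["agent"], row["ended"], row["status"])
```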

Conclusion

So that is Skybot, and some of the areas where it can help you with your security and audit requirements. An enterprise scheduler helps you to automate so that you can reduce the amount of downtime and reduce the number of errors. It allows you to centrally monitor across all of your different servers. You've got automatic notification put into a number of places so that you can be notified of any kind of exceptions. The auditing and reporting is going to help you meet those audit requirements. Role-based security with our interface to Active Directory or LDAP makes it easy to set up those groups and manage that security. And then our file transfer function is kind of just an extra that we threw in there just to kind of make your life a little bit easier.

So if anyone has any questions, you can put them in the chat message. If not, that's all I had to talk about today. I will hang around here for just a minute and see if any questions come in but if not, I want to thank you for joining us today. And you are free to go and enjoy the rest of your day. Thank you again.