When an IBM i environment experiences rapid CPU consumption or other conditions that start to impact memory, there are some jobs that will make the list of usual suspects more often than not. While QZDASOINIT jobs almost always make the lineup, they are rarely solely to blame.
Often, they are simply guilty of keeping the wrong sort of company: it's usually the poorly written SQL running inside them that causes these issues, provoking QZDASOINIT jobs to gorge themselves on as much CPU as they can until they are identified and stopped!
So, what is the true nature of QZDASOINIT jobs, and how can we keep them from running wild in the system? QZDASOINIT is the job name for the database SQL server jobs. These jobs serve SQL requests from JDBC and ODBC client applications and normally run in subsystem QUSRWRK. System i Navigator jobs also use this job name when running a query through the SQL window.
When CPU spikes on the system, it can be very difficult to determine which job or series of jobs are contributing to the problem since they all share the same name. Potentially, there could be hundreds of QZDASOINIT jobs that are collectively creating a big impact on CPU rather than a lone runaway culprit.
Get Visibility into QZDASOINIT Jobs
Administrators need visibility into the issue as a starting point. This can be achieved by running the WRKACTJOB command and then manually investigating and resolving at the job level (repeating the process on each system). The information this command returns still leaves important questions unanswered: Who is running these jobs? What proportion of overall CPU is being consumed?
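For a quick, scripted view beyond WRKACTJOB, recent IBM i releases expose active-job data through SQL. The sketch below uses the QSYS2.ACTIVE_JOB_INFO table function to rank QZDASOINIT jobs by CPU and show the current user behind each one; the filter parameters and column names are taken from IBM's SQL services documentation, so verify they exist at your release level before relying on them.

```sql
-- Sketch, assuming IBM i 7.2+ with the QSYS2 SQL services available.
-- Rank QZDASOINIT jobs by CPU consumed and show who is running each one.
SELECT JOB_NAME,
       AUTHORIZATION_NAME,          -- current user profile of the job
       CPU_TIME,                    -- total CPU used, in milliseconds
       FUNCTION                     -- what the job is doing right now
  FROM TABLE(QSYS2.ACTIVE_JOB_INFO(
         SUBSYSTEM_LIST_FILTER => 'QUSRWRK',
         JOB_NAME_FILTER       => 'QZDASOINIT'))
 ORDER BY CPU_TIME DESC
 FETCH FIRST 20 ROWS ONLY;
```

Grouping the same result by AUTHORIZATION_NAME and summing CPU_TIME answers the "who" question directly, though unlike a real-time monitor this is still a point-in-time snapshot you must re-run yourself.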
Answering these questions requires a greater degree of insight to give more meaningful understanding and context to any issues for faster problem resolution. After all, the longer the investigation process takes, the more CPU is consumed. It’s clearly in everyone’s financial interests to resolve this type of problem quickly. To see just how expensive rogue jobs can be, read our job monitoring white paper.
With the appropriate real-time monitoring solution in place, administrators will have the ability to answer these questions. Consider the value a screen like the one below from Robot Monitor provides administrators dealing with QZDASOINIT job issues.
In this case, administrators have real-time visibility of QZDASOINIT CPU consumption in a dedicated view, plus immediate access to the offending jobs for resolution. To accommodate the particulars of their environment and resources, they can also set threshold levels for early detection, effectively forewarning them of a potentially escalating situation before it ever has an opportunity to take hold.
Administrators set up this type of monitor by first creating a data definition that will be qualified by any or all systems, and by a particular job name. They can then choose to add custom thresholds to each monitor and issue proactive alerts when jobs exceed these thresholds. Within the data definitions, administrators can also select groups to add to these monitoring parameters.
An example of this would be for a group of programmers. This level of granular monitoring can be extended to include subsystem, accounting code, user, current user, job, and function. With this monitor in place, you gain proactive visibility into this group and its threshold in the context of total CPU being consumed by QZDASOINIT jobs and total system CPU consumption.
QZDASOINIT and Memory Issues
QZDASOINIT jobs can also be problematic where memory issues are concerned. In a typical example of this type of scenario, a batch job's memory is flushed by interactive jobs, leaving the batch process to perpetually try to access pages in memory that are no longer there. This troublesome process is known as "thrashing". The key challenge to resolving it lies in its identification, as the runaway process is most visible through its symptom: an increase in page faults.
As with our previous CPU example, the necessary investigation to determine which subsystem or jobs are being impacted by non-database page faults could be both lengthy and expensive without a real-time monitor. Administrators could first access the System Status screen to show the number of page faults in each memory pool, but would still be left wondering which jobs are responsible for causing problems in these memory pools, and which subsystem(s) are using these memory pools.
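As a command-line complement to the System Status screen, recent IBM i releases also expose memory-pool statistics through SQL. The sketch below queries the QSYS2.MEMORY_POOL_INFO view to list database and non-database faults per pool; this view and its column names are an assumption based on IBM's SQL services documentation and may not exist on older releases, so check availability on your system first.

```sql
-- Sketch, assuming a recent IBM i release where QSYS2.MEMORY_POOL_INFO exists.
-- Show which memory pools are taking the most non-database page faults.
SELECT POOL_NAME,
       ELAPSED_DATABASE_FAULTS,
       ELAPSED_NON_DATABASE_FAULTS
  FROM QSYS2.MEMORY_POOL_INFO
 ORDER BY ELAPSED_NON_DATABASE_FAULTS DESC;
```

This narrows the search to a pool, but it still leaves the manual work of tying that pool back to the subsystems and individual jobs that are faulting in it.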
In tackling the issue, Robot Monitor employs the same data definition qualifications to create an appropriate monitor. It provides real-time visibility via a dedicated non-database (NDB) fault bar showing the overall system faults per second and gives immediate access to the offending job for resolution. This monitor has all the same threshold and alert capabilities that will keep proactive administrators one step ahead of escalating resource issues.
The practice of managing by exception means jobs that are guilty of misbehaving have no opportunity to hide under a generic name shared by thousands like them and remain under the radar until they are discovered. Instead, at the first sign of trouble, an alert is sent directly to the administrator, who has all the information needed to resolve the issue and help their business avoid a million-dollar mistake.