It's Friday, so I can (re)tell my war story. Shortly after z/OS R13 hit our first prod system, I noticed one morning that the system had been IPLed around 05:00. Everyone denied having done it. Then I discovered a fresh SAD taken around the same time. Sent it off to IBM. A day or two later, the same thing happened. The system was wait-stating after running clean out of storage frames! It made no sense. I posted the problem here.
Jim Mulder saw the thread and rang me up with a few questions. The failing system, unlike the sandbox, was being mirrored to the DR data center. All of it. Every single volume. Jim suspected, and then confirmed, that because of a change in R13, a failing page-in caused an I/O redrive that lost track of the failing page request, so the request never got put back on the queue. With XRC active, some percentage of page-ins got tangled up with SDM I/O. More lost frames. Eventually MVS ran completely out of frames, and the system wait-stated. Auto SAD. Auto IPL. It was Development, so at 05:00 there were no user calls. Ops never noticed.
Jim fixed the problem immediately. I believe in auto IPL.
Southern California Edison Company
Electric Dragon Team Paddler
SHARE MVS Program Co-Manager
626-543-6132 Office
From: IBM Mainframe Discussion List [mailto:IBM-***@LISTSERV.UA.EDU] On Behalf Of Mark Zelden
Sent: Friday, May 12, 2017 12:02 PM
Subject: (External):Re: AUTOIPL SADUMP LOADPARM flag value
Post by Jesse 1 Robinson
I'm curious as to why you do not want automatic reIPL after SADMP. Your
system is in a non-restartable wait state, after all. I view that as the
ultimate performance degradation. ;-) You have an SAD. If you want to look at
it or at OPERLOG, you need at least one system in the sysplex up and
running. Why not this one?
IBM has recommended auto IPL for many years based on decades of problem
analysis. Nothing will ever get better on a dead system. ReIPL might fail,
but it's worth a try. You can also speed up SAD such that no operator
intervention is required. It's possible for a system to die, take an SAD,
and reIPL before the operator gets back from coffee break. I've seen it.
If IBM-MAIN had a like button or thumbs up, you would have it. Haven't actually had
a crash in a long time, but the last time my client had one, that was basically the scenario.
By the time I was getting instant messages and automated alerts / pages were going out to everyone, the system was already back up and had 100% application availability. It was something like a 10 minute outage total. The client wasn't happy, but it sure beats the heck out of "the old days" of initiating a stand alone dump manually and re-IPLing after it completed. That's assuming an operator, or even a sysprog, could find the doc and knew how to take an SADUMP correctly!
Mark Zelden - Zelden Consulting Services - z/OS, OS/390 and MVS ITIL v3 Foundation Certified mailto:***@mzelden.com Mark's MVS Utilities: http://www.mzelden.com/mvsutil.html