Whacking a Job, or Getting rid of an Address Space
Sam Golob
2017-05-14 14:37:47 UTC
Hi Folks,

Hopefully this info will help get you out of a jam sometime......

Sam

GETTING RID OF AN ADDRESS SPACE (or WHACKING A JOB)

In my career as a system doctor, I've had trouble, more than
once, in getting rid of an address space that was malfunctioning, and
starting over. Sometimes the address space was marked "NON-CANCELABLE"
and I've even seen address spaces marked "NON-FORCIBLE".

Mentioning this problem to fellow sysprogs, I've gotten answers
like: "You've got to learn how to use FORCE correctly." Or they'd say
some similar nonsense. Sometimes they're right. But a bunch of times,
there are a couple of bits in the way. And if you can't get past them,
you can't get rid of the job or other address space. I've seen this
situation force an IPL in the middle of the day. (NO GOOD....!!!)

So what do you do? There are two free APF-authorized TSO commands
which can help you.

One is called CSCF, and it is on CBT File 954. The other is
called CNCLPG, and it is on CBT File 826 (Updates Page). CSCF can
get rid of the main offending bits. CNCLPG (with the KILL option)
can do that, and then whack the job or address space.

Both of these commands do multiple functions. But to get rid of
a job or system task, you first need to change its status to CANCELABLE
or FORCIBLE, and then you need to CANCEL it or FORCE it. Sometimes,
you can just "whack it". To do so, use the KILL subcommand of the
CNCLPG command (Updates page of www.cbttape.org).

The KILL subcommand of CNCLPG will do a CALLRTM TYPE=MEMTERM
operation on the address space, but before it does so, it turns off the
ASCBNOMT and ASCBNOMD bits in the ASCB. ASCBNOMT is what makes a job
"NON-FORCIBLE", and turning ASCBNOMD off makes it FORCIBLE even if the
error was a DAT error. THEN the KILL subcommand does the CALLRTM MEMTERM.
In that way, KILL makes sure that nothing will get in the way of the
"FORCE" operation, and the address space will be duly "whacked". Then
you can start it over.
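
The ordering described above can be pictured with a toy model. This is only a sketch in Python, not the CNCLPG source: the bit values, the AddressSpace class, and the memterm() and kill() helpers are all made up for illustration, and memterm() is simply assumed to refuse when the NOMT bit is on. The real bit definitions are in the ASCB mapping macro (IHAASCB).

```python
from dataclasses import dataclass

# Hypothetical bit masks, for illustration only. The real ASCBNOMT and
# ASCBNOMD definitions are in the IHAASCB mapping macro.
NOMT = 0x01  # stands in for ASCBNOMT ("non-forcible")
NOMD = 0x02  # stands in for ASCBNOMD


@dataclass
class AddressSpace:
    jobname: str
    asid: int
    flags: int = 0
    terminated: bool = False


def memterm(space: AddressSpace) -> bool:
    """Toy stand-in for CALLRTM TYPE=MEMTERM.

    Assumption of this model: the request is refused while NOMT is on.
    """
    if space.flags & NOMT:
        return False
    space.terminated = True
    return True


def kill(space: AddressSpace) -> bool:
    """Clear both protection bits FIRST, then request termination."""
    space.flags &= ~(NOMT | NOMD)
    return memterm(space)
```

In this model, calling memterm() on a space with NOMT set does nothing, while kill() on the same space succeeds because the bits are cleared before the termination request is made.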

One note of caution: You have to whack or alter the correct
address space. If you don't, you can cause havoc.

WHY? Both CNCLPG and CSCF have to run the CSCB chain. This is
a chain representing all the active jobs, system tasks, and TSU's in
the system. Sometimes there are many address spaces with the SAME
name. And there can be more than one address space with the SAME
ASID (I bet you didn't know that). So in order to make sure you are
altering the correct address space, you have to specify BOTH the ASID
and the JOBNAME when you run CNCLPG.

How do you get that information in the first place?

Run CNCLPG with the DISPLAY command.

The DISPLAY command will show all matches and all occurrences.

So if you run CNCLPG jobname DISP, you will see all the CSCB
entries matching your jobname, and you can specify the one with the
correct ASID by using the ASID(hex) parameter together with the
jobname parameter.

Do this first, and you won't be sorry later. Do DISP several
times, until you see only one entry--the entry that you want to alter.
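
The "narrow it down until only one entry is left" rule can be sketched as follows. This is a hypothetical Python model, not how CNCLPG itself is coded: Entry, matches() and pick_target() are invented names, and the chain is just a list of (jobname, ASID) pairs standing in for the CSCB chain.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical stand-in for one CSCB chain entry."""
    jobname: str
    asid: int


def matches(chain: List[Entry], jobname: str,
            asid: Optional[int] = None) -> List[Entry]:
    """Like a DISPLAY: list every entry matching jobname (and ASID, if given)."""
    return [e for e in chain
            if e.jobname == jobname and (asid is None or e.asid == asid)]


def pick_target(chain: List[Entry], jobname: str, asid: int) -> Entry:
    """Return the single matching entry, or refuse to act at all."""
    found = matches(chain, jobname, asid)
    if len(found) != 1:
        raise LookupError(
            f"{len(found)} entries match {jobname} ASID({asid:X}); not acting")
    return found[0]
```

With two entries named PAYROLL in the chain, matches(chain, "PAYROLL") returns both, and only adding the ASID brings the list down to the one entry that is safe to alter.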

Best of everything. Use this in good health......

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to ***@listserv.ua.edu with the message: INFO IBM-MAIN
Peter Relson
2017-05-16 11:59:32 UTC
Maybe it's me, but I found this post kind of inappropriate since it came
without caveats. One might think/hope that whoever defined a space as
non-cancelable or non-memtermable had a legitimate reason for doing so.
That likely isn't of course always true, but isn't that what you really
need to assume?

Unless you are willing to risk your system and its data by assuming that
it is OK to cancel something that is non-cancelable or memterm something
that is non-memtermable then the action taken by this tool is
inappropriate.

And by including "and its data" I mean to include that you could
conceivably break some data that you won't be able to fix by re-IPL. Not
likely, but conceivable.

Peter Relson
z/OS Core Technology Design


Paul Gilmartin
2017-05-16 14:46:19 UTC
Post by Peter Relson
Maybe it's me, but I found this post kind of inappropriate since it came
without caveats. One might think/hope that whoever defined a space as
non-cancelable or non-memtermable had a legitimate reason for doing so.
That likely isn't of course always true, but isn't that what you really
need to assume?
I sense a gradual escalation here.

Long ago, there was the CANCEL command so operators could
terminate troublesome jobs.

But a designer felt that sometimes the programmer knows better,
and provided the non-cancellable attribute.

Then a designer felt that sometimes the operator knows even better
and provided the FORCE command.

Then a designer felt that sometimes the programmer knows even better
and provided the non-forcible attribute.

Now someone feels that operators know better and is providing
a WHACK facility.
...

Perhaps there should be a numeric attribute and a CANCEL command
argument, such that if the value supplied by the operator exceeds the
program's attribute, the CANCEL just works.

Floating point, of course. Decimal floating point.

The operator will always have the nuclear option.

-- gil

Elardus Engelbrecht
2017-05-16 15:31:42 UTC
Post by Paul Gilmartin
I sense a gradual escalation here.
Long ago, there was the CANCEL command so operators could terminate troublesome jobs.
But a designer felt that sometimes the programmer knows better, and provided the non-cancellable attribute.
Then a designer felt that sometimes the operator knows even better and provided the FORCE command.
Then a designer felt that sometimes the programmer knows even better and provided the non-forcible attribute.
Now someone feels that operators know better and is providing a WHACK facility.
Not good. All involved must look at why you need to WHACK it.

For example, your job itself is waiting for a mount or is waiting for HSM to recall something, but there is a mount problem. Solve that, and you don't need all those fancy measures including a 222 abend.

Ok, that is just one sample reason why 'WHACKING' or all those 'x who knows better' are not suitable.

One example we got a few weeks ago: a session was holding a CICS region. CPU% and region consumed climbed. Response times dropped on all CICS regions. Instead of having to WHACK down the troublesome CICS region, the network people simply VARYed that session offline and all things returned to normal. No STCs were stopped at all.

So, first check all the normal avenues, then escalate using more and more extreme measures, as per Paul's suggestion.

Groete / Greetings
Elardus Engelbrecht

Sam Golob
2017-05-16 16:25:54 UTC
Hi Folks,

I just want to tell you that I very much appreciate this
discussion. Very much depends on POINT OF VIEW, and when all the points
of view get together, real progress is made, and everybody becomes wiser.

There are at least three separate points of view here on this forum:

1. The systems programmers who have to set up and run the system
software in data centers.

2. The professional "system level" programmers who usually work for
vendors.

3. The IBM programmers who design and build the system software.

There may be other people here also, such as application
programmers and "programmer toolmakers" and more types, as well.

Everybody has a separate point of view. In summary, here they are:

People who run data centers have to make sure everything runs
smoothly, and they have to deal with "the problems of the non-ideal
world". Something breaks--fix it. Keep the system up. Make sure the
system levels are correctly set for what we are doing, and for what we need.

Professional "system level" programmers dig deep into the system.
"Authority" is not what is usually on their mind, unless they are
dealing with a security-related product. For example, doing
cross-memory programming is usually "a piece of cake" for them. But
changing some fields in another user's control blocks, which might be
easy for THEM to do, is a nightmare from the system administrator's
point of view, so you already see a difference in point of view between
these two groups.

Finally, the IBM designers and programmers have a big
responsibility of delivering a consistent and reliable system, but they
may tend (depending on the individual person's actual experience) to be
a bit separated from the system programmer's "real world" problems, and
the things that actually come up in a real data center, day by day.

I am glad that my post is bringing these 3 points of view together,
in a productive and fruitful way. If I did not write about this topic,
then some sysprog might be without a necessary tool in his/her toolbox.
When the emergency came up, they would be as helpless as I was, many
years ago. On the other hand, we know that the tool can be used
improperly, either by the right people or the wrong people. So I had a
quandary: "To say, or NOT to say. That was the question."

I opted to "say". I remember the pain in my heart, when JES2
couldn't be removed, and we had to IPL in the middle of the day. It was
easy to fix if we could just cancel JES2, and restart it. I had already
proven that to myself, at that time. But I was helpless and adrift. We
had to IPL. NEVER AGAIN! I won't let that happen to someone! NEVER....!!!

So there. I trust we've all been helpful.......

All the best of everything to everyone.

Sincerely, Sam

Sam Golob
2017-05-16 18:58:31 UTC
Hi Folks,

I want to point out that not all versions of CNCLPG on CBT File 826
have BURN or KILL capability for JOBs or STCs. The earlier versions of
the program (included in the file) have less power.

If you only want to make jobs non-cancelable or non-swappable
(1.11 and 1.20), you have the choice of installing one of the earlier
versions of the program (1.10, 1.11, or 1.20) instead.

My point is to let you know that in an emergency, the power is there.

All the best of everything to all of you.

Sincerely, Sam
