Discussion:
Eternal WAIT on un-waited ECB
Steve Smith
2018-06-22 19:14:44 UTC
I have a bad situation where a program is hanging forever in a wait. The
ECB shows x'30ABFC50'. The lower 3 bytes are the address of my PRB, and I
read somewhere that x'30' means it was un-waited (something like undead, I
guess). This happens after some turmoil, and it's probable, not yet
certain, the ECB looked like that when I waited on it.

This is actually a VSAM CHECK after an asynchronous POINT. Looking at the
S122 dump, the RPLACTIV flag is on, RPLFDBK is all 0.

My question is, would clearing the ECB before the asynch. request fix
this? I believe I ought to anyway, but could not clearing it cause
the hang?

I have no way to re-create the situation. Unfortunately, my customers
evidently do.
--
sas

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to ***@listserv.ua.edu with the message: INFO IBM-MAIN
Christopher Y. Blaicher
2018-06-22 20:06:52 UTC
At the start of everything, you should clear the ECB. Immediately after the WAIT you should clear it. OK, first pick up the value in the ECB in case it has significance and then clear it.
You should also look at the FAST POST and FAST WAIT examples in Appendix A of the POP manual. They can save substantial CPU time if you do this often.

Chris Blaicher
Technical Architect
Syncsort, Inc.

Charles Mills
2018-06-22 21:16:17 UTC
Examples are actually in Authorized Assembler Services Guide.

The fast POST example is unfortunately based on System 370 instructions.

I would say "clear the ECB sometime before either the WAIT or the POST is possible." Keep in mind that in some situations the POST might happen before the WAIT.
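The ordering point can be illustrated with a small model. This is a Python sketch, not z/OS assembler: the `Ecb` class and its methods are hypothetical stand-ins for an ECB word, POST, and WAIT, and it says nothing about how VSAM itself behaves. It only shows why the clear has to come before the POST is possible.

```python
POST_BIT = 0x40000000  # second-highest bit of the ECB word: event complete


class Ecb:
    """Toy stand-in for a 4-byte event control block (hypothetical model)."""

    def __init__(self, word=0):
        self.word = word

    def clear(self):
        self.word = 0

    def post(self, code=0):
        # Mark the event complete and store a completion code.
        self.word = POST_BIT | (code & 0x3FFFFFFF)

    def would_wait_block(self):
        # A WAIT returns immediately when the post bit is already on.
        return not (self.word & POST_BIT)


def clear_then_request():
    ecb = Ecb(word=POST_BIT)   # stale value left over from an earlier use
    ecb.clear()                # 1. clear before the POST is possible
    ecb.post()                 # 2. request completes, possibly before the WAIT
    return ecb.would_wait_block()


def request_then_clear():
    ecb = Ecb(word=POST_BIT)
    ecb.post()                 # the request completes quickly...
    ecb.clear()                # ...and the late clear wipes out the POST
    return ecb.would_wait_block()
```

In this model `clear_then_request()` leaves the POST visible, so the WAIT falls through, while `request_then_clear()` erases it and the WAIT would block forever.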

Charles


Seymour J Metz
2018-06-25 17:45:23 UTC
Post by Charles Mills
The fast POST example is unfortunately based on System 370 instructions.

Why is that a problem? Wouldn't something like PLO be overkill?


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3

Charles Mills
2018-06-25 18:22:59 UTC
Well, it's not a "problem" (FSVO "problem") but in an example that is
supposed to show the fast way of doing things, one might avoid slower
instructions, such as storage literal references, when alternatives like
TMLH and LLILF are now available.

Agreed, PLO would be a poor choice.

Charles


Jim Mulder
2018-06-22 21:17:16 UTC
The x'30' means that the PRB was waiting on that ECB, but then there was
a Post-without-ECB that unwaited the ECB. RTM would do that in order to
ABTERM the TCB (possibly for the CANCEL with DUMP 122 abend).

If the TCB is not running now, it should not be because it is waiting on
that ECB.
Not much more I can tell you without seeing the dump. Is the TCB set
nondispatchable? Has parallel detach gotten its fingers in there?
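To make the dump value easier to read, here is a small decoding sketch in Python (a model, not z/OS code; the function name `decode_ecb` is made up for illustration). It assumes only the usual ECB layout, with the wait bit as x'80' and the post bit as x'40' of the first byte. The meaning of the remaining x'30' flags is taken from the explanation above, not computed.

```python
WAIT_BIT = 0x80  # high-order bit of the first ECB byte
POST_BIT = 0x40  # next bit of the first ECB byte


def decode_ecb(word):
    """Split a 32-bit ECB value into its flag byte and low-order 24 bits."""
    flags = (word >> 24) & 0xFF
    return {
        "flags": flags,
        "waiting": bool(flags & WAIT_BIT),
        "posted": bool(flags & POST_BIT),
        "low24": word & 0x00FFFFFF,
    }


info = decode_ecb(0x30ABFC50)
print(f"flags=x'{info['flags']:02X}' waiting={info['waiting']} "
      f"posted={info['posted']} low24=x'{info['low24']:06X}'")
# prints: flags=x'30' waiting=False posted=False low24=x'ABFC50'
```

Neither the wait bit nor the post bit is on in x'30ABFC50', which fits the description of an ECB that was un-waited rather than posted.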

Jim Mulder z/OS Diagnosis, Design, Development, Test IBM Corp.
Poughkeepsie NY
Steve Smith
2018-06-22 22:50:47 UTC
Duh... the cancel almost certainly did the un-wait. It's easy to forget
that the train wreckage isn't exactly where the track broke.

That brings me back to trying to figure out why VSAM never POSTed me
(besides the fact it hates me). But clearing the ECB is a good start, and
maybe a STIMERM to stop the madness if that doesn't work.
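The timer idea amounts to a bounded wait. The sketch below models it in Python with `threading.Event` standing in for the ECB; it is not z/OS code, and the helper name `wait_with_timeout` is hypothetical. On z/OS the equivalent would presumably be a WAIT on two ECBs, one of them posted by the timer exit, but that mapping is an assumption.

```python
import threading


def wait_with_timeout(io_done, seconds):
    """Wait for the I/O event, but give up after `seconds`.

    Returns True if the event was posted, False if the timer won.
    """
    return io_done.wait(timeout=seconds)


never_posted = threading.Event()   # models the ECB that never gets POSTed
posted = threading.Event()
posted.set()                       # models a normal completion

timed_out = not wait_with_timeout(never_posted, 0.05)
completed = wait_with_timeout(posted, 0.05)
```

The caller can then distinguish "request finished" from "gave up waiting" and take a diagnostic dump instead of hanging.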

sas
Walt Farrell
2018-06-25 21:25:49 UTC
Post by Charles Mills
Well, it's not a "problem" (FSVO "problem") but in an example that is
supposed to show the fast way of doing things, one might avoid slower
instructions, such as storage literal references, when alternatives like
TMLH and LLILF are now available.

Even with the older instructions it will be faster than doing a real POST :)

(And if an application's performance characteristics are such that the difference in a few instructions in a fast-POST implementation is going to matter, should it really be using WAIT/POST, or something else altogether?)
--
Walt

Charles Mills
2018-06-25 21:49:11 UTC
Of course, but if I were going to give an example of the fast way of doing things I would give an example of the fastest way of doing things.

The example, of course, was written in the s/370 or s/390 era, so the use of storage instructions is understandable.

Sometimes also you do not have a choice of whether to use something faster: your "partner" in WAIT/POST might be fixed in stone.

Charles


John McKown
2018-06-26 13:21:40 UTC
Post by Walt Farrell
(And if an application's performance characteristics are such that the
difference in a few instructions in a fast-POST implementation is going to
matter, should it really be using WAIT/POST, or something else altogether?)

Isn't IBM "pushing" SUSPEND/RESUME over WAIT/POST for application (not
system interface) code?
--
There is no such thing as the Cloud. It is just somebody else’s computer.

Maranatha! <><
John McKown

Don Poitras
2018-06-26 14:32:09 UTC
Post by John McKown
Isn't IBM "pushing" SUSPEND/RESUME over WAIT/POST for application (not
system interface) code?

I think you're thinking of PAUSE/RELEASE. SUSPEND/RESUME requires supervisor
state/key 0. We haven't found PAUSE/RELEASE to be faster than WAIT/POST,
and there are many APIs that still use ECBs, so WAIT/POST is probably going
to be around forever.
--
Don Poitras - SAS Development - SAS Institute Inc. - SAS Campus Drive
***@sas.com (919) 531-5637 Cary, NC 27513

John McKown
2018-06-26 14:43:33 UTC
Post by Don Poitras
I think you're thinking of PAUSE/RELEASE. SUSPEND/RESUME requires supervisor
state/key 0. We haven't found PAUSE/RELEASE to be faster than WAIT/POST,
and there are many APIs that still use ECBs, so WAIT/POST is probably going
to be around forever.

Yes, thanks for the correction. I must admit that I'm still a bit "fuzzy"
this morning. Likely because I went back to sleep around 04:30 instead of
getting up (05:25 alarm). That always messes me up.
--
There is no such thing as the Cloud. It is just somebody else's computer.

Maranatha! <><
John McKown

Steve Smith
2018-06-26 15:24:57 UTC
Randomish thoughts:

1. PAUSE/RELEASE is, imho, a rather overly complex interface. My current
policy is to only use it when both SRB and TCB code have to mingle.
Otherwise, WAIT/POST (although you should look at EVENTS), or
SUSPEND/RESUME in homogeneous environments.
2. It is a long-standing tradition for IBM's programming examples to be, um,
primitive. But considering their purpose, using more modern instructions
may well confuse those who most need the examples. Once/if someone learns
NILH and TMLH, they know what to replace.

sas

Ed Jaffe
2018-06-26 16:01:58 UTC
Post by John McKown
Isn't IBM "pushing" SUSPEND/RESUME over WAIT/POST for application (not
system interface) code?

Absolutely not! You *might* be thinking of Pause/Release...
--
Phoenix Software International
Edward E. Jaffe
831 Parkview Drive North
El Segundo, CA 90245
https://www.phoenixsoftware.com/

Peter Relson
2018-06-26 12:30:33 UTC
Post by Charles Mills
PLO would be a poor choice

It would be worse than "poor". It would be "wrong". PLO does not serialize
against uses of CS. The system uses CS to update the ECB.
If you were to use PLO, you would have no serialization against the system
update.

This is one of the significant drawbacks of PLO.
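For readers who have not seen the fast-POST idea, the following is a rough Python model of the logic, not the IBM example and not z/OS code. The `EcbWord` class, `compare_and_swap`, and `quick_post` are hypothetical names, and a lock stands in for the atomicity of the CS instruction. The point it tries to show is that every updater of the word goes through the same compare-and-swap, which is the serialization PLO would not give you against the system.

```python
import threading

WAIT_BIT = 0x80000000
POST_BIT = 0x40000000


class EcbWord:
    """Toy 32-bit word with an atomic compare-and-swap (hypothetical model)."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self.value != expected:
                return False
            self.value = new
            return True


def quick_post(ecb, code=0):
    """Return True if posted without the system service, False if a real
    POST would be needed because a waiter is already recorded."""
    while True:
        old = ecb.value
        if old & WAIT_BIT:
            return False            # waiter present: fall back to POST
        new = POST_BIT | (code & 0x3FFFFFFF)
        if ecb.compare_and_swap(old, new):
            return True             # nobody changed the word in between
        # The word changed under us (for example, the wait bit was set); retry.


idle = EcbWord(0)
posted_fast = quick_post(idle, 4)

waiting = EcbWord(WAIT_BIT | 0x00ABFC50)
needs_real_post = not quick_post(waiting)
```

When no one is waiting, the post bit and completion code go in with a single compare-and-swap; when the wait bit is already on, the model leaves the word alone and reports that the real POST service is needed.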

Now, if you were to simply test and update it within a constrained
transaction (TBEGINC), that's a different story.

Peter Relson
z/OS Core Technology Design


Peter Relson
2018-06-27 12:23:25 UTC
You would not use pause/release for its individual-call performance
characteristics, but for other things.

For example, pause/release is not exposed to the cross-memory post problem.
That interface identifies the target by "ASCB address", which becomes
questionable if the target terminates and a new space then gets that same
ASCB address. (Hence, see the IEAMSXMP macro: that service wrapper
identifies the target by stoken/ttoken rather than ASCB address and, after
validating the target, basically schedules an SRB to do a non-XM post.)

Also, pause/release does not use the local lock, so it could help avoid
local lock contention which can significantly degrade performance. But it
is surely true that (with enough concurrent use) pause/release could run
into its own lock contention problems on the lock(s) that it does use.

It is certainly true that wait/post will be around for the life of the
operating system, regardless of what other options are provided.

Peter Relson
z/OS Core Technology Design

