Discussion:
SQA shortage
Jeff Williams
2004-01-28 21:14:50 UTC
One of our systems failed yesterday due to ECSA exhaustion brought on by ESQA overflowing. It's a dev system, so ops wasn't watching it, and it hung before anyone noticed. It's been pumping IRA103 messages for a while now, as ESQA is expanding at about 60 pages per hour.

I've spent the last few hours poking around trying to determine who/what is causing the expansion, and I'm not having much luck. Any suggestions as to specific things to look for or at?
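
For scale, the reported rate can be turned into bytes per hour and a rough time-to-exhaustion figure. This is a minimal Python sketch, assuming 4 KB pages. The 100 MB of remaining ECSA is a made-up number for illustration only, not a value from this system, and the function names are hypothetical.

```python
PAGE_BYTES = 4096  # assumes 4 KB pages


def growth_bytes_per_hour(pages_per_hour: float) -> float:
    """Convert an expansion rate in pages/hour to bytes/hour."""
    return pages_per_hour * PAGE_BYTES


def hours_until_exhausted(free_bytes: float, pages_per_hour: float) -> float:
    """Estimate hours until free_bytes is consumed at a constant rate."""
    if pages_per_hour <= 0:
        raise ValueError("expansion rate must be positive")
    return free_bytes / growth_bytes_per_hour(pages_per_hour)


if __name__ == "__main__":
    rate = growth_bytes_per_hour(60)  # rate reported in the post
    print(f"{rate / 1024:.0f} KB/hour")  # prints "240 KB/hour"

    hypothetical_free_ecsa = 100 * 1024 * 1024  # illustrative only
    hours = hours_until_exhausted(hypothetical_free_ecsa, 60)
    print(f"{hours:.1f} hours")  # prints "426.7 hours"
```

A slow, steady rate like this explains why the system could run for a long time before hanging.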

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to ***@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
Edward E. Jaffe
2004-01-28 21:27:31 UTC
Do you have common storage tracking enabled in DIAGxx? To monitor this,
you need something like:

VSM TRACK CSA(ON) SQA(ON)

--
-----------------------------------------------------------------
| Edward E. Jaffe | |
| Mgr, Research & Development | ***@phoenixsoftware.com |
| Phoenix Software International | Tel: (310) 338-0400 x318 |
| 5200 W Century Blvd, Suite 800 | Fax: (310) 338-0801 |
| Los Angeles, CA 90045 | http://www.phoenixsoftware.com |
-----------------------------------------------------------------

Imbriale, Donald , Exchange
2004-01-28 21:22:36 UTC
If you have Rob Scott's MXI, go to CS and SORT E-SQA to find out who's
using it, then go to CSR and SORT E-SQA to find out who's not freeing
it.

Similar information can be obtained via RMF Monitor III.

Don Imbriale
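
Whichever monitor supplies the numbers, the idea is the same: take two snapshots of per-job ESQA usage some time apart and rank the jobs by how much they grew. This is a minimal Python sketch of that comparison. The job names and byte counts are invented, and getting the data out of MXI or RMF into dictionaries is left out (no export format is assumed here).

```python
from typing import Dict, List, Tuple


def rank_esqa_growth(before: Dict[str, int],
                     after: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (job, bytes grown) pairs, largest growth first.

    Jobs that did not grow are omitted; a job missing from the
    first snapshot is treated as having started at zero.
    """
    growth = []
    for job, used_after in after.items():
        delta = used_after - before.get(job, 0)
        if delta > 0:
            growth.append((job, delta))
    growth.sort(key=lambda item: item[1], reverse=True)
    return growth


if __name__ == "__main__":
    # Hypothetical snapshots of ESQA bytes held per job, taken an hour apart.
    snapshot_1 = {"JOBA": 400_000, "JOBB": 120_000, "JOBC": 64_000}
    snapshot_2 = {"JOBA": 400_000, "JOBB": 365_760, "JOBC": 68_096}
    for job, delta in rank_esqa_growth(snapshot_1, snapshot_2):
        print(f"{job}: +{delta} bytes")
```

A job that stays at the top of this list across several intervals is the likeliest candidate for "not freeing it".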
P***@ibm-main.aon.com
2004-01-28 21:31:12 UTC
Or the Mainview z/OS JCSA screen will show you, by task, who holds what
SQA/ESQA.

Shane Ginnane
2004-01-28 21:41:38 UTC
*EVERYBODY* has this as their default.
Ummm - unless of course you have an old member in the concatenation ....
Nah, that wouldn't happen.
You *NEED* this on - if it ain't, make it so.
Get a dump, get into IPCS and have a look at VSMDATA.
Barbara and others have posted about how to interrogate this previously.
Also IBMLink - info APAR II05506 will no doubt be useful.

Shane ...

--- Ed ---
Do you have common storage tracking enabled in DIAGxx? To monitor this,
you need something like:

VSM TRACK CSA(ON) SQA(ON)

Bruce Hewson
2004-01-29 07:19:06 UTC
Hello Jeff,

Do you run "concurrent copy" backup jobs?

How much common storage has been consumed by ANTMAIN?

We are still in discussions with IBM regarding such an incident that
occurred during Year-End processing.

The high-level explanation: too many updates were made to tracks being
backed up by concurrent copy. The modified tracks are held in memory, with
the pointers to those tracks held in common storage control blocks.

At the same time (a parallel problem) we also had ANTMAIN use a very high
number of local page slots, but that didn't destroy us. The SQA shortage
did.

You can look at your EREP/LOGREC records.

Regards
Bruce Hewson

Bruce Black
2004-01-29 14:57:37 UTC
Post by Bruce Hewson
We are still in discussions with IBM regarding such an incident that
occurred during Year-End processing.
The high level explanation: too many updates to tracks being backed up by
concurrent copy, the modified tracks are held in memory, with the pointers
to those tracks being held in common storage control blocks.
At the same time (parallel problem) we also had ANTMAIN use very high
numbers of local page slots, but that didn't destroy us. The SQA shortage
did.
Bruce, since ANTMAIN is an address space, and it spawns data spaces to
hold the updated tracks that have not yet been requested by the dump, it
doesn't seem that it would need to put pointers to each such track in
SQA. It is possible, but I can't imagine why they would need to be in
common storage. I am sure that the address/data spaces consume some SQA
but probably not for that.

Searching IBMLINK, I find APAR OW42577 (03/2000) that added diagnostics
to the SDM for SQA shortages, but says that they are due to many
ATTENTION interrupts from the control unit when many tracks are being
updated (CC doesn't generate an attention per track, but high update
activity can cause many attentions). Then I found OW48788 (07/2001)
that fixes DASD attention processing so that many attentions will not
flood SQA. Even without it, the attention SQA is freed when the
interrupt has been processed, so the SQA would not increase and increase
over time, as Jeff reported.

Do you have OW48788 applied? What has IBM said about the problem?

--
Bruce A. Black
Senior Software Developer for FDR
Innovation Data Processing 973-890-7300
personal: ***@fdrinnovation.com
sales info: ***@fdrinnovation.com
tech support: ***@fdrinnovation.com
web: www.innovationdp.fdr.com

Bruce Hewson
2004-01-30 06:14:54 UTC
Hello Bruce,

Nope, ANTMAIN has lots of control blocks in common. I believe used in
communications with the DFDSS jobs doing the backups.

"The review of the dump sent showed many 8k blocks (5000) in the CSA which
were aligned to the copying of tracks that had been changed during a
concurrent copy operation"

IBM change team have given us a SLIP to try to gain more diagnostic data.

I believe they have opened two APARs in relation to our problem.....

Lots of S878 abends were recorded in LOGREC before the ANTMAIN address
space died.

The restart of ANTMAIN did not release/reuse previously allocated common
storage and left us walking the edge of common storage filling up totally.

We scheduled an IPL for the following weekend to reset the common areas.

Regards
Bruce Hewson

Edward A. Gould
2004-01-30 14:43:11 UTC
Post by Bruce Hewson
Hello Bruce,
Nope, ANTMAIN has lots of control blocks in common. I believe used in
communications with the DFDSS jobs doing the backups.
"The review of the dump sent showed many 8k blocks (5000) in the CSA which
were aligned to the copying of tracks that had been changed during a
concurrent copy operation"
The next reasonable question: is there proper documentation saying that CSA
would have to be increased by X (and a way to calculate X)?

Ed

Jeff Williams
2004-01-30 15:13:56 UTC
Thanks to those who replied. The culprit turned out to be the ftp server. A client had implemented a new program that connected to the server, issued a few commands and then disconnected. It was doing this every two seconds. It's not clear yet why the ftp server wasn't handling the disconnects, but in the interim I have disconnected the user instead.
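
As a rough sanity check, the numbers in this thread can be combined: one connection every two seconds is 1800 connections per hour, and 60 pages per hour is 245,760 bytes per hour with 4 KB pages. The Python sketch below assumes that all of the ESQA growth came from these ftp connections and that it was spread evenly over them. Neither assumption is confirmed in the thread, and the function names are hypothetical.

```python
PAGE_BYTES = 4096  # assumes 4 KB pages


def connections_per_hour(interval_seconds: float) -> float:
    """Number of connections per hour at one connection per interval."""
    return 3600 / interval_seconds


def bytes_per_connection(pages_per_hour: float,
                         interval_seconds: float) -> float:
    """Average bytes left behind per connection.

    Assumes every byte of growth is attributable to the connections.
    """
    bytes_per_hour = pages_per_hour * PAGE_BYTES
    return bytes_per_hour / connections_per_hour(interval_seconds)


if __name__ == "__main__":
    print(connections_per_hour(2))  # prints 1800.0
    leaked = bytes_per_connection(60, 2)
    print(f"about {leaked:.1f} bytes per connection")  # about 136.5 bytes
```

Under those assumptions the estimate works out to roughly 137 bytes per connection, which would suggest a small per-session control block that is never freed.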
