Discussion:
Long execution & high CPU usage due to decimal overflow (PGM 00A) and large system trace tables
Allan Kielstra
2017-05-16 16:37:21 UTC
Hi Peter

I have a suspicion that this is a mixed language program. That is, it consists of COBOL and, say C. (Or it uses COBOL features with a C run time implementation such as XML or OO COBOL.) Can you confirm this is the case?

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to ***@listserv.ua.edu with the message: INFO IBM-MAIN
Peter Hunkeler
2017-05-16 18:52:23 UTC
Post by Allan Kielstra
I have a suspicion that this is a mixed language program. That is, it consists of COBOL and, say C. (Or it uses COBOL features with a C run time implementation such as XML or OO COBOL.) Can you confirm this is the case?
Allan, I cannot confirm this because I have not investigated this case yet, but it may well be that XML is being used. As far as I know, we do not use OO COBOL.

I can, however, confirm that we are having a lot of trouble with programs after they are recompiled with COBOL V5.2 instead of COBOL V4.2. The reason is that the different instruction stream that COBOL V5.2 (and newer) generates increases the likelihood that a program runs into decimal overflow conditions.

A simple MOVE statement that moves A to B, where A is declared as PIC 9(7)V9(4) and B as PIC V9(18), never caused a decimal overflow with COBOL V4, but may well do so with COBOL V5, because the latter uses decimal instructions that recognize decimal overflow.
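
To make that MOVE concrete: B has no integer positions, so a numeric MOVE aligns the two items on the implied decimal point and drops all seven integer digits of A, which COBOL permits. The Python sketch below models only those language semantics, not the code either compiler generates; the function name `numeric_move` is made up for illustration, and the sign is ignored because both PICTUREs are unsigned. It also flags when nonzero high-order digits are lost, which is roughly the situation a hardware decimal instruction reports as decimal overflow.

```python
from decimal import Decimal
from typing import Tuple


def numeric_move(sender: Decimal, int_digits: int, frac_digits: int) -> Tuple[Decimal, bool]:
    """Model a MOVE into an unsigned numeric field with `int_digits` integer
    and `frac_digits` fraction positions (hypothetical helper).

    Returns the value the receiver ends up holding and whether any nonzero
    high-order digits were lost.
    """
    # Shift so the receiver's last fraction position becomes the units digit;
    # int() then drops low-order digits the receiver has no room for.
    units = int(abs(sender).scaleb(frac_digits))
    capacity = 10 ** (int_digits + frac_digits)
    lost_high_order = units >= capacity
    kept = units % capacity  # high-order truncation of the excess integer digits
    return Decimal(kept).scaleb(-frac_digits), lost_high_order


# A = 1234567.8901 as PIC 9(7)V9(4), moved to B as PIC V9(18)
b, lost = numeric_move(Decimal("1234567.8901"), 0, 18)
print(b, lost)  # 0.890100000000000000 True
```

The result is a perfectly legal COBOL outcome; the question raised in this thread is only how expensive it becomes when the instruction doing the truncation signals overflow.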

We also see that the decimal overflow mask bit in the PSW is turned on much more often when COBOL V5 is used. We know that using XML is one reason, and running the program under the IBM Debugger is another, but we also see other cases for which we have not found out what causes the mask to be set.

The result of all this is that decimal overflow happens more often, and the chances are much higher that the overflow raises a 00A program check interruption that LE error handling has to deal with, only to find that the COBOL program can continue.
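
Whether an overflow is cheap or expensive depends on the decimal-overflow mask, bit 21 of the PSW program mask. For a packed-decimal instruction such as ZAP, the truncated result is stored and condition code 3 is set; only when the mask bit is one does a program interruption with code 00A follow. The sketch below is a simplified Python model of that rule, not an emulator: the `Psw` class and `zap` function are hypothetical stand-ins, field lengths are given in digits rather than bytes, and negative zero is ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DECIMAL_OVERFLOW = 0x00A  # program-interruption code for decimal overflow


@dataclass
class Psw:
    decimal_overflow_mask: bool  # models program-mask bit 21
    condition_code: int = 0


def zap(psw: Psw, value: int, digits: int) -> Tuple[int, Optional[int]]:
    """Toy model of ZERO AND ADD into a packed field holding `digits` digits.

    Returns the stored result and the interruption code presented, if any.
    """
    capacity = 10 ** digits
    magnitude = abs(value) % capacity  # overflow digits are simply ignored
    stored = -magnitude if value < 0 else magnitude
    if abs(value) >= capacity:
        # Decimal overflow: the operation still completes with the truncated
        # result; the mask only decides whether an interruption follows.
        psw.condition_code = 3
        return stored, DECIMAL_OVERFLOW if psw.decimal_overflow_mask else None
    psw.condition_code = 0 if stored == 0 else (1 if stored < 0 else 2)
    return stored, None


quiet = Psw(decimal_overflow_mask=False)
print(zap(quiet, 12345, 3), quiet.condition_code)  # (345, None) 3
loud = Psw(decimal_overflow_mask=True)
print(zap(loud, 12345, 3), loud.condition_code)    # (345, 10) 3
```

With the mask off, an overflowing move costs no more than a condition-code setting; with it on, every occurrence becomes a program check that Language Environment has to field before the COBOL program can continue.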


Late this afternoon I found another job that has taken between 1 and 2 hours per run since the program was recompiled, whereas it ran for only a few minutes before.

--
Peter Hunkeler

Allan Kielstra
2017-05-16 19:59:12 UTC
It sounds like it's time to open a PMR. Do you want to do that and can you do that?

Peter Hunkeler
2017-05-16 20:22:37 UTC
Post by Allan Kielstra
It sounds like it's time to open a PMR. Do you want to do that and can you do that?
I'm not allowed to open a PMR myself, but asking my colleagues to open one is not a problem. It is only that I don't see what to report as an error in this case. From a long investigation, started in November 2016, into a Smart/Restart problem that began to pop up more often for exactly the same reason, I understand that the decimal overflow mask bit in the PSW, and Language Environment's duty to support the different expectations of COBOL on the one side and C and PL/I on the other, work as documented.

We had a PMR open with LE during the analysis of the Smart/Restart problem, and IBM LE did not seem to see a problem in the current behaviour.


What would you recommend we report in the PMR? Maybe we could continue the discussion offline and report back here when there is interesting news?

Regards, Peter


--
Peter Hunkeler



Mike Schwab
2017-05-16 23:35:43 UTC
Maybe IBM can publish the info or an II apar to document the
application coding techniques causing the problem.
--
Mike A Schwab, Springfield IL USA
Where do Forest Rangers go to get away from it all?

Paul Gilmartin
2017-05-16 23:54:28 UTC
Post by Mike Schwab
Maybe IBM can publish the info or an II apar to document the
application coding techniques causing the problem.
If move-with-truncation is a conventional COBOL operation,
the resolution should not be, "Don't do that!"

I detest quiet truncation. It's hardly better if it takes an
inordinately long time.

Another possible resolution, with its own performance consequence,
is for the compiler-generated code to test (every time) whether
overflow is about to occur and handle it as a special case. It
might be better just to abandon the use of DFP.

-- gil

Peter Hunkeler
2017-05-17 11:42:59 UTC
Post by Paul Gilmartin
Post by Mike Schwab
Maybe IBM can publish the info or an II apar to document the
application coding techniques causing the problem.
If move-with-truncation is a conventional COBOL operation,
the resolution should not be, "Don't do that!"
I agree.
Post by Paul Gilmartin
I detest quiet truncation. It's hardly better if it takes an
inordinately long time.
I agree.
Post by Paul Gilmartin
Another possible resolution, with its own performance consequence,
is for the compiler-generated code to test (every time) whether
overflow is about to occur and handle it as a special case. It
might be better just to abandon the use of DFP.
Well, that is what COBOL up to V4 did: it did not make use of modern, fast, cache-protecting instructions. COBOL V5 changed this, which I would assume is a benefit in general. It is only that truncation may cause pain. BTW, it is not only DFP instructions; other decimal instructions such as SRP can raise decimal overflow as well.
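
SRP (Shift and Round Decimal) is a natural fit for aligning decimal points: moving a PIC 9(7)V9(4) value into a PIC V9(18) field amounts to shifting its 11 digits left by 14 positions, and the 7 integer digits fall off the left end. Whether COBOL V5 actually uses SRP for that particular MOVE is not stated in the thread; the Python model below only illustrates how a left shift produces decimal overflow. The function name `srp` and its parameters are illustrative, and the field capacity is given directly in digits (real packed fields hold an odd number of digits, a detail the sketch ignores).

```python
from typing import Tuple


def srp(value: int, digits: int, shift: int, rounding_digit: int = 0) -> Tuple[int, bool]:
    """Simplified model of SHIFT AND ROUND DECIMAL on a field of `digits` digits.

    shift > 0 moves digits left; shift < 0 moves them right, adding
    `rounding_digit` to the leftmost digit shifted out. Returns the stored
    result and whether a decimal overflow occurred.
    """
    magnitude = abs(value)
    if shift >= 0:
        magnitude *= 10 ** shift
    else:
        # Keep one extra digit, add the rounding digit to it, then drop it.
        magnitude = (magnitude // 10 ** (-shift - 1) + rounding_digit) // 10
    capacity = 10 ** digits
    overflow = magnitude >= capacity  # nonzero digits were shifted out on the left
    magnitude %= capacity
    return (-magnitude if value < 0 else magnitude), overflow


# 1234567.8901 held as the 11 digits 12345678901, shifted left 14 places
print(srp(12345678901, 18, 14))  # (890100000000000000, True)
print(srp(12355, 5, -2, 5))      # (124, False)
```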


--
Peter Hunkeler



Tom Marchant
2017-05-17 11:49:23 UTC
Post by Paul Gilmartin
It
might be better just to abandon the use of DFP.
I don't believe that this has anything to do with DFP.
The decimal overflow exception that Peter reports occurs with Packed
Decimal operations, not with Decimal Floating-point operations.
--
Tom Marchant

Peter Hunkeler
2017-05-17 12:58:37 UTC
Post by Tom Marchant
I don't believe that this has anything to do with DFP.
The decimal overflow exception that Peter reports occurs with Packed
Decimal operations, not with Decimal Floating-point operations.
Have a look at the DFP instructions. Some, such as CZDT and CZXT (Convert to Zoned), can indeed raise a decimal overflow exception. There probably are more.


--
Peter Hunkeler

Tom Marchant
2017-05-17 13:36:26 UTC
Post by Peter Hunkeler
Post by Tom Marchant
I don't believe that this has anything to do with DFP.
The decimal overflow exception that Peter reports occurs with Packed
Decimal operations, not with Decimal Floating-point operations.
Have a look at the DFP instructions. Some such as CZDT, and CZXT (Convert To Zoned)
can indeed raise a decimal overflow exception. There probably are more.
Yes, the Convert to Packed (CPDT and CPXT) and Convert to Zoned (CZDT and
CZXT) instructions can cause a decimal overflow exception. These are the result
of the packed or zoned decimal target and not a result of DFP operations.

These are the only DFP instructions that can cause a decimal overflow exception.
As you had noted, decimal overflow can also occur with packed decimal arithmetic.

The point I was trying to make is that abandoning the use of DFP would not solve
the problem that you experienced.
--
Tom Marchant

Peter Hunkeler
2017-05-17 14:02:48 UTC
Post by Tom Marchant
The point I was trying to make is that abandoning the use of DFP would not solve
the problem that you experienced.
Sorry for not getting that point.


--
Peter Hunkeler



Dale R. Smith
2017-05-17 16:27:41 UTC
Post by Paul Gilmartin
Post by Mike Schwab
Maybe IBM can publish the info or an II apar to document the
application coding techniques causing the problem.
If move-with-truncation is a conventional COBOL operation,
the resolution should not be, "Don't do that!"
I detest quiet truncation. It's hardly better if it takes an
inordinately long time.
Another possible resolution, with its own performance consequence,
is for the compiler-generated code to test (every time) whether
overflow is about to occur and handle it as a special case. It
might be better just to abandon the use of DFP.
-- gil
There is a COBOL compiler option, DIAGTRUNC, that does the following:
DIAGTRUNC issues a warning message for MOVE statements to numeric fields when the receiving field has fewer integer positions than the sending field or literal. In statements that have multiple receiving fields, the message is issued separately for each field that could be truncated. The message is also issued for moves to numeric fields from alphanumeric fields or literals, except when the sending field is reference modified. It finds cases of ‘hidden’ loss of data when statements truncate numeric data items.

This is a compile-time option only; it has no effect on program execution.
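
The numeric-to-numeric rule quoted above can be approximated in a few lines. The Python sketch below is not how the compiler implements DIAGTRUNC: it handles only simple PICTURE strings built from S, 9, 9(n) and V, ignores the alphanumeric-sender case, and the function names are hypothetical. Applied to the MOVE from earlier in the thread (PIC 9(7)V9(4) to PIC V9(18)), it flags the statement because the receiver has no integer positions.

```python
import re


def integer_positions(picture: str) -> int:
    """Count digit positions left of the implied decimal point in a simple
    numeric PICTURE string such as 'S9(7)V9(4)' or 'V9(18)'."""
    integer_part = picture.upper().split("V", 1)[0]
    # Each match is a '9', optionally followed by a repeat count in parentheses.
    return sum(int(repeat) if repeat else 1
               for repeat in re.findall(r"9(?:\((\d+)\))?", integer_part))


def would_diagnose_truncation(sending: str, receiving: str) -> bool:
    """True when the receiver has fewer integer positions than the sender."""
    return integer_positions(receiving) < integer_positions(sending)


print(would_diagnose_truncation("9(7)V9(4)", "V9(18)"))   # True
print(would_diagnose_truncation("9(7)V9(4)", "9(9)V99"))  # False
```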
--
Dale R. Smith
