TRSMAIN question

McKown, John

2007-01-30 21:35:20 UTC

Is there any documentation on the compression algorithm used by TRSMAIN,
or on how effective it is? I.e., if I have 21 MEDIA2 (3490E) tapes' worth of
printable data, can I estimate how many compressed tapes that will take?
And is there any way to ftp the result to an ASCII-based server and uncompress
it there? Yes - this relates to my previous question about RACF IRRADU00
reformatted records.

--
John McKown
Senior Systems Programmer
HealthMarkets
Keeping the Promise of Affordable Coverage
Administrative Services Group
Information Technology

The information contained in this e-mail message may be privileged
and/or confidential. It is for intended addressee(s) only. If you are
not the intended recipient, you are hereby notified that any disclosure,
reproduction, distribution or other use of this communication is
strictly prohibited and could, in certain circumstances, be a criminal
offense. If you have received this e-mail in error, please notify the
sender by reply and delete this message without copying or disclosing
it.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to ***@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html

Barry Merrill

2007-01-30 22:12:47 UTC

1. We create both Windows Zipped and z/OS Tersed distribution files for
MXG Software, which is a single sequential pure-text file, currently
2,119,181 lines of text. The lines are FB 80 on z/OS but are not
numbered, so the file is smaller as a variable-length ASCII file.

Our current version's stored sizes are:

Size of FB 80 EBCDIC file, z/OS 169,534,800 bytes
Size of PC ASCII variable length 104,353,987 bytes

Zipped PC file 17,589,006 bytes
Tersed FB 80 21,653,504 bytes

Terse reduced the z/OS file by a factor of 7.82.
Zip reduced the ASCII file by a factor of 5.93.

But the 8-bit z/OS file is 62% larger than the ASCII file: not
only is there the 8-bit EBCDIC vs. 7-bit ASCII difference, but the ASCII
file's lines are the actual length of the text, while each line of the z/OS
file is 80 bytes long.

But the 169:21 reduction (almost 8:1) of the 80-byte EBCDIC text to its
TERSEd equivalent is very consistent with my experience, not only with text
files but also with z/OS customers' SMF data files.
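The ratios above can be checked with a few lines of arithmetic. A minimal Python sketch using only the byte counts quoted in this post (the variable names are illustrative). Computed exactly, the Terse factor is about 7.829, so the 7.82 above looks truncated rather than rounded:

```python
# Sizes in bytes, as reported in the post above.
FB80_EBCDIC = 169_534_800
ASCII_VB = 104_353_987
ZIPPED = 17_589_006
TERSED = 21_653_504


def ratio(original: int, compressed: int) -> float:
    """Compression factor: original size divided by compressed size."""
    return original / compressed


terse_factor = ratio(FB80_EBCDIC, TERSED)       # FB 80 EBCDIC -> Tersed
zip_factor = ratio(ASCII_VB, ZIPPED)            # variable-length ASCII -> Zipped
padding_overhead = FB80_EBCDIC / ASCII_VB - 1   # how much larger the FB 80 file is

print(f"Terse factor: {terse_factor:.2f}")
print(f"Zip factor:   {zip_factor:.2f}")
print(f"FB 80 file is {padding_overhead:.0%} larger than the ASCII file")
```

Note that the Tersed file is still larger in bytes than the Zipped one despite its higher factor, because Terse started from the padded 80-byte records.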

2. For Windows-to-Windows ftp with compression, we use Serv-U as our ftp server
and Voyager ftp clients, and consistently see the same 8:1 compression,
i.e., transfer time reduced to 1/8th. It sure would be nice if z/OS ftp programs
supported Serv-U's compression for our customers' ftp of our product.

But even without compression, over a single T1 it takes only 15 minutes to
download the 160MB file, which is a whole lot faster than even overnight shipment.
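The 15-minute figure agrees with simple arithmetic. A hedged Python sketch, assuming the nominal T1 line rate of 1.544 Mbit/s, no protocol overhead, and that "the 160MB file" is the 169,534,800-byte FB 80 file (about 162 MiB); the helper name is hypothetical:

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate; usable payload is slightly less


def transfer_minutes(size_bytes: int, bits_per_second: float, efficiency: float = 1.0) -> float:
    """Idealized transfer time in minutes; efficiency < 1 models protocol overhead."""
    seconds = size_bytes * 8 / (bits_per_second * efficiency)
    return seconds / 60


uncompressed = transfer_minutes(169_534_800, T1_BITS_PER_SECOND)
with_8_to_1 = uncompressed / 8  # the 8:1 compression seen with Serv-U/Voyager

print(f"{uncompressed:.1f} min uncompressed, {with_8_to_1:.1f} min at 8:1")
```

Real ftp throughput will be somewhat lower than the line rate; passing an `efficiency` below 1.0 gives a more pessimistic estimate.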

3. Unfortunately, after writing paragraph 2, I realize you want to unterse
on the PC, so that information is of no use to you (but, having written it,
I think it's still worth sharing that experience with this august group).
I'm not aware of any un-Terse for ASCII platforms, so I still have to ship
customers' Tersed data to a z/OS box for un-Tersing.

Barry Merrill

McKown, John

2007-01-31 14:10:07 UTC

-----Original Message-----
From: IBM Mainframe Discussion List
Sent: Wednesday, January 31, 2007 3:36 AM
Subject: Re: TRSMAIN question

On Tue, 30 Jan 2007 15:35:07 -0600, McKown, John

Post by McKown, John
And is there any way to ftp that to an ASCII based server and uncompress
it? Yes - this relates to my previous question about RACF IRRADU00
reformatted records.

John,
I suggest you convert the tapes to AWS, ftp them to a PC, and zip them there.
You may use GZIP on the mainframe to reduce the network load, which I guess
was your original intention. If you are not bothered about the network but
instead about storage costs, go with WinZip and copy the images onto a DVD.
Better yet, convert to HET and send them to a PC running MVS 3.8 under
Hercules to be printed.
Dave
(Super post from Barry Merrill, as usual.)

The reason to compress on the mainframe was to reduce the time needed to
ftp. Trying to ftp 21 MEDIA2 (3490E) tapes' worth of data to my PC (over
100Mb Ethernet) scares me. I was hoping that, since IRRADU00 data is all
character, it would compress very effectively and that I would, overall,
save time. Likely a vain hope.
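A rough estimate can show whether the hope is vain. A back-of-the-envelope Python sketch in which every constant is an assumption, not a measurement: about 800 MB native capacity per 3490E MEDIA2 cartridge (full tapes, ignoring IDRC compaction, which would put more data on each tape), 50% of the 100 Mb/s line rate actually achieved, and Barry's roughly 8:1 ratio for character data:

```python
import math

# All constants are assumptions for illustration, not measurements.
TAPES = 21
MB_PER_TAPE = 800                  # assumed native MEDIA2 capacity, no compaction
LINK_BITS_PER_SECOND = 100_000_000
EFFICIENCY = 0.5                   # hypothetical fraction of line rate achieved
COMPRESSION_RATIO = 8.0            # Barry's ~8:1 for text; IRRADU00 data may differ


def minutes_to_send(megabytes: float) -> float:
    """Transfer time in minutes at the assumed effective link rate."""
    bits = megabytes * 1_000_000 * 8
    return bits / (LINK_BITS_PER_SECOND * EFFICIENCY) / 60


raw_mb = TAPES * MB_PER_TAPE
compressed_mb = raw_mb / COMPRESSION_RATIO
tapes_after_compression = math.ceil(compressed_mb / MB_PER_TAPE)

print(f"raw:        {raw_mb:,} MB, {minutes_to_send(raw_mb):.1f} min")
print(f"compressed: {compressed_mb:,.0f} MB, {minutes_to_send(compressed_mb):.1f} min, "
      f"{tapes_after_compression} tapes")
```

The sketch ignores tape-read time and the mainframe CPU cost of compressing, either of which may dominate in practice. It answers the "how many compressed tapes" question only as far as the 8:1 assumption holds.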

Tim Hare

2007-01-31 14:49:53 UTC

I believe TRSMAIN uses an LZ (Lempel-Ziv) or LZW (Lempel-Ziv-Welch) algorithm
of sorts, but of course the algorithm matters less than the archive format
in your case.
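To make the LZ/LZW idea concrete, here is a minimal textbook LZW round trip in Python. It is a generic illustration of the algorithm family only: it emits unbounded integer codes rather than packed bits, and it is not the TRSMAIN/Terse format, which this sketch makes no attempt to read or write:

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit dictionary codes for the longest known prefixes."""
    table = {bytes([i]): i for i in range(256)}  # one code per single byte
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            codes.append(table[current])   # emit the longest match found
            table[candidate] = next_code   # learn the new string
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly while decoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Special case: the code refers to the string being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


sample = b"TOBEORNOTTOBEORTOBEORNOT" * 4
codes = lzw_compress(sample)
assert lzw_decompress(codes) == sample
print(len(sample), "bytes ->", len(codes), "codes")
```

Repetitive character data such as fixed-format records compresses well under this scheme because long repeated strings collapse into single codes once the dictionary has learned them.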

IBM's Unix Tools & Toys page (I believe) has GZIP ported for UNIX System
Services. I got this to work for me:

cat "//'dataset_name'" | gzip -c > archive_name.gz

Yes, that is double quotes around the double-slash entity and single
quotes around the fully qualified dataset name. I am sure there may be
better syntax, but it worked.

So, if your tape data is cataloged and you have mount authority, you might
be able to issue a command like that for your tapes, sit back, and wait.

Once it's in gzip format, you can decompress it on a PC, I'm sure.

Oh - you might have to pipe it through iconv to get it into ASCII /
Unicode before gzipping it.
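The translate-then-compress step can also be sketched in Python for a file of raw fixed-length records. This is a minimal sketch under stated assumptions: the input is RECFM=FB, LRECL=80 EBCDIC with no line ends (for example, after a binary transfer), the code page is cp037, and the function name fb_ebcdic_to_gzip is hypothetical. It also strips trailing blanks, the FB padding that Barry's size comparison points to:

```python
import gzip
import os
import tempfile

LRECL = 80  # assumption: RECFM=FB, LRECL=80, as in Barry's MXG example


def fb_ebcdic_to_gzip(raw: bytes, out_path: str, codepage: str = "cp037") -> int:
    """Split fixed-length EBCDIC records, translate them, strip the blank
    padding, and write gzip-compressed text with one line per record.

    cp037 (US EBCDIC) is an assumed default; use whatever code page the
    data set was actually written in. Returns the number of records.
    """
    if len(raw) % LRECL:
        raise ValueError("input is not a whole number of fixed-length records")
    count = 0
    # UTF-8 output is plain ASCII for ordinary text and can still
    # represent any other character the code page decodes to.
    with gzip.open(out_path, "wt", encoding="utf-8", newline="\n") as out:
        for start in range(0, len(raw), LRECL):
            record = raw[start:start + LRECL].decode(codepage)
            out.write(record.rstrip(" ") + "\n")
            count += 1
    return count


# Usage: three hypothetical 80-byte EBCDIC records, round-tripped through gzip.
raw = "HELLO".ljust(LRECL).encode("cp037") * 3
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt.gz")
    n = fb_ebcdic_to_gzip(raw, path)
    with gzip.open(path, "rt", encoding="utf-8") as f:
        text = f.read()
print(n, repr(text))
```

The resulting .gz file is ordinary gzip, so any standard gunzip on the PC side can read it.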

Tim Hare
Senior Systems Programmer
Florida Department of Transportation
(850) 414-4209

McKown, John

2007-01-31 14:52:17 UTC

Thanks for the idea. It may be easier than trying to reverse engineer
TRSMAIN, assuming that I had the talent to do so and that it is not forbidden
by IBM.

Tony Harminc

2007-02-03 02:26:30 UTC

Post by McKown, John
Is there any documentation on the compression algorithm used by TRSMAIN?

The terse algorithm is explained in IBM's US patent 4814746 from 1989,
easily viewable at http://www.google.com/patents?vid=USPAT4814746 . The
patent contains a PL/I program that is claimed to implement the "invention",
though I doubt that it would directly interoperate with implementations like
TRSMAIN. And since, of course, this is a patent, presumably in the US and
perhaps other countries where software patents are granted, there would be
constraints if you did write something using the information therein. It
isn't clear to me, however, that the patent has anything to say about
decompression, so perhaps implementing that would be OK. But I'm not a
patent attorney, etc. etc. so please don't bet your company on this...

There are also claims out there on the net that the terse algorithm is
really "the same as" LZW, which is also patented, and which I understand is
the algorithm that got the GIF graphics format into much trouble some years
ago.

There are implementations of terse for all sorts of platforms, but IBM
doesn't seem to distribute them for much beyond MVS and VM. If you do some
severe Googling, you may find, of all things, an OS/2 16-bit version (that
will run under Windows), and perhaps even a 32-bit Windows version.

There appear to be several flavours of terse, which do or don't handle
things like ASCII-EBCDIC and codepage issues, mainframe-only things like
PDSs, and various other options.

Post by McKown, John
And is there anyway to ftp that to an ASCII based server and uncompress
it? Yes - this relates to my previous question about RACF IRRADU00
reformatted records.

If you can find an implementation for your platform of choice that
understands the mainframe version's options, then sure. But then you still
have to process those mainframe records on that platform, which may well be
the harder part.
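On processing the records: suppose they reach the PC still in variable-length mainframe form, with each record's 4-byte Record Descriptor Word (RDW) preserved and no block descriptor words. Whether IRRADU00 output arrives that way depends on how it is written and transferred, so treat this as an assumption. In that case the framing itself is simple to undo. A minimal Python sketch; iter_vb_records and make_rdw_record are hypothetical helper names, and interpreting the fields inside each record is the part this does not attempt:

```python
import struct
from typing import Iterator


def iter_vb_records(data: bytes) -> Iterator[bytes]:
    """Yield record bodies from a stream of RDW-prefixed variable-length records.

    Each record starts with a 4-byte RDW: a big-endian halfword length that
    includes the RDW itself, followed by two bytes that are normally zero.
    """
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError(f"truncated RDW at offset {offset}")
        (length,) = struct.unpack_from(">H", data, offset)
        if length < 4 or offset + length > len(data):
            raise ValueError(f"bad record length {length} at offset {offset}")
        yield data[offset + 4 : offset + length]
        offset += length


def make_rdw_record(body: bytes) -> bytes:
    """Hypothetical helper: build one RDW-prefixed record for testing."""
    return struct.pack(">HH", len(body) + 4, 0) + body


# Usage: two records (one empty), EBCDIC text decoded with an assumed cp037.
stream = make_rdw_record("ABC".encode("cp037")) + make_rdw_record(b"")
records = [r.decode("cp037") for r in iter_vb_records(stream)]
print(records)  # ['ABC', '']
```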

Tony H.
