In a message dated 7/14/2005 1:25:33 A.M. Central Daylight Time,
However, even though it is not of much value, it is certainly of
interest.
If you really want to know how to speed instructions up, you must be
prepared to read lots of highly arcane technical papers on instruction
processing units, pipelines, instruction caches, translation lookaside
buffers, data caches, bus width, look-ahead instruction preprocessing,
multiple processor serialization effects, instruction predecessor
relationships, et alia.
...
Or you could use a little assembler program, using STCK or TIMEUSED, and
execute contemplated code several hundred to several thousand times
each, and compare the results. No reading of papers, no head scratching,
just numbers for your environment.....
Right. If I wanted to know how long instruction op code XYZ takes to
execute, I would certainly do it the way you suggested. Reading papers and
scratching my head would still be interesting to me, since I am interested in
learning how instruction processing takes place at a low level in general. But
for any one particular op code I would perform the experiment you described. I
once put a STCK immediately in front of and immediately behind an instruction
that I wanted to learn about - Store SCHIB - and found it took something
like 60 microseconds, which was a huge amount of time compared to all other
instructions. After I saw that, I removed the Store SCHIB since it wasn't
necessary. The Principles of Operation even warns that using this instruction
a lot can cause performance problems; it must be doing some serialization in
the channel subsystem. To be really, really accurate, you must also first find
out how much overhead you are imposing on your experiment by using STCK and any
looping instructions, so you have to test each of them several thousand times
and get averages.
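
The overhead calibration I just described can be sketched roughly as follows.
This is only an analogy in Python, not assembler: time.perf_counter_ns stands
in for STCK, and the names time_loop, empty_body, candidate and measure are
made up for illustration. The idea is the same, though: time an empty loop
to learn what the clock reads and the looping cost, then subtract that from
the timing of the code you care about.

```python
import time

def time_loop(body, iterations):
    """Average nanoseconds per iteration spent calling body()."""
    start = time.perf_counter_ns()        # plays the role of the first STCK
    for _ in range(iterations):
        body()
    end = time.perf_counter_ns()          # plays the role of the second STCK
    return (end - start) / iterations

def empty_body():
    """Does nothing; timing it measures only loop and call overhead."""
    pass

def candidate():
    """Hypothetical stand-in for the code being studied."""
    bytes(1024)                           # allocate and clear a 1K buffer

def measure(body, iterations=10000):
    """Return (raw, overhead, net) averages in nanoseconds per iteration."""
    overhead = time_loop(empty_body, iterations)
    raw = time_loop(body, iterations)
    net = max(raw - overhead, 0.0)        # never report a negative cost
    return raw, overhead, net

if __name__ == "__main__":
    raw, overhead, net = measure(candidate)
    print(f"raw {raw:.1f} ns, overhead {overhead:.1f} ns, net {net:.1f} ns")
```

The numbers will differ from run to run and from machine to machine, so
they only mean something for the environment in which you measured them.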
One interesting result was that one MVCL for 1K takes about as long as
four MVCs of 256; below that MVCs are faster on every processor I
tested. Another surprise (?) was that two STs were faster than an STM
for two registers.
These are surprising and interesting results. But I would still not be
motivated to perform a timing experiment unless the code I was thinking about
optimizing was going to be executed a very large number of times per second in
some critical path or perhaps in a tight loop. If I were building a compiler,
however, I would be concerned about trying to optimize code execution as
much as possible in a generalized way, since I would not know what machine
the code was going to run on or which individual machine peculiarities to
consider. But I don't build compilers.
Bill Fairchild