Penn State EMS Environment Institute
Environmental Computing Facility
Mathematical Acceleration SubSystem (MASS)

MASS Version 2.7

Contents

  o Introduction
  o Performance and Accuracy Information for the MASS Libraries
  o Using the MASS Libraries

For questions, contact mass@austin.ibm.com or write to:

         MASS Program Manager (Mail Stop 9444)
         IBM Austin Laboratory
         11400 Burnet Road
         Austin, TX 78758-3406

Introduction

MASS consists of libraries of tuned mathematical intrinsic functions. Each
new MASS release includes the unchanged material from previous releases, so
only the latest release needs to be kept to stay current.

Version 2.6
  Version 2.6 makes the scalar library, libmass.a, safe for multi-threaded
  applications where MASS functions might be called by two threads
  simultaneously (threadsafe). The scalar library shows only minor changes in
  performance.
Version 2.7
  Version 2.7 replaces twelve functions (forms of exp, log, sin, and cos) in
  the general vector library, libmassv.a, with versions that have been
  rewritten for better short-precision performance and better performance for
  the POWER3 architecture as implemented on the 630 processor. The
  architecture-specific vector libraries, libmassvp2.a and libmassvp3.a, also
  include the new functions.

Scalar Library

The MASS Scalar Library libmass.a contains accelerated versions of frequently
used math intrinsic functions from the AIX system library libm.a (now called
libxlf90.a in the IBM XL Fortran manual):

  o sqrt, rsqrt, exp, log, sin, cos, tan, atan, atan2, sinh, cosh, tanh,
    dnint, x**y

The libmass.a library can be used with either Fortran or C applications and
will run under AIX on all of the IBM RS/6000 processors. Because MASS does
not check its environment, it must be called with the IEEE rounding mode set
to nearest and with exceptions masked off (the default XLF environment). MASS
may not work properly with other settings. In some cases MASS is not as
accurate as the system library, and it may handle edge cases differently from
libm.a (sqrt(inf), for example). The trig functions (sin, cos, tan) return
NaN (Not-a-Number) for large arguments (abs(x)>2**50*pi). In release 2.4 the
x**y function was revised to accept negative x arguments with integer y
arguments in accordance with the C standard. (The Fortran standard prohibits
such arguments and the function previously returned NaNs for them.)

Vector Libraries

The general vector library, libmassv.a, contains vector function subroutines
that will run on the entire IBM RS/6000 family. The second library,
libmassvp2.a, contains the subroutines of libmassv.a and adds a set that is
tuned for and based upon the POWER2 architecture. The POWER2 processors
include the 590 and follow-on servers and related desktop systems and the
POWERParallel SP2 using the P2SC processor. The third library libmassvp3.a
contains the routines of libmassv.a, some of which have been tuned for the
POWER3 architecture used by the 630 processor. See the performance tables in
the following sections to see the contents of these libraries. Please note
that in Version 2.3, the vdint and vdnint functions were rewritten so that
the dependency on POWER2 architecture was removed. Thus, they were added to
libmassv.a and remain in libmassvp2.a by the inclusion of libmassv.a in
libmassvp2.a. In response to user requests, a vector atan2 function and
short-precision vector functions were also added in Version 2.3.

The vector libraries libmassv.a, libmassvp2.a and libmassvp3.a can be used
with either Fortran or C applications. However, when calling the library
functions from C, only call by reference is supported. The vector functions
were developed under similar assumptions to those for libmass.a: IEEE
round-to-nearest mode and exceptions masked off. Accuracy is comparable to
that of libmass.a scalar counterparts, though results may not be bit-wise
identical.

The MASS Vector Fortran/C Source Library enables application developers to
write portable vector codes. For this purpose one Fortran source library has
been provided: libmassv.f. It corresponds to libmassvp2.a, and thus contains
all of the vector functions of libmassv.a as well. An examination of the
following performance tables shows that there can be a performance
improvement even when the MASS scalar functions are used in the vector loops
of libmassv.f.

Performance and Accuracy Information for the MASS Libraries

Sample scalar library performance data is provided for the 604E (PowerPC),
630 (POWER3), and P2SC (POWER2) processors. The data should be considered
approximate. It was obtained by timing many repetitions of a loop over 1,000
random arguments and includes all overhead. Timing in this way will bring the
input and output vectors into the on-chip cache (the loop is short enough for
them to fit in cache). Performance may deteriorate seriously when the input
and output vectors are not in cache. Performance may also deteriorate for
arguments at or near the end-points of the valid argument ranges. The P2SC
and 630 processors have a hardware sqrt which is not timed here. The
libxlf90.a measurements were made with the versions of the library available
on the respective test systems. They may vary from the versions timed for
previous MASS releases. The user may experience performance which varies from
that found in this table.



  Math function performance (cycles per call, length 1000 loop)


                  libxlf90.a       libmass.a         ratio
Function Range  604E  630 P2SC  604E  630 P2SC   604E  630 P2SC
  sqrt     A      68   59*  41*   50   45*  27*  1.36 1.31 1.52
 rsqrt     A      80   71*  56*   52   53*  28*  1.54 1.34 2.00
   exp     D      87   64   53    42   27   22   2.07 2.37 2.41
   log     C     100   87   67    55   53   33   1.82 1.64 2.03
   sin     B      51   36   34    27   15   19   1.89 2.40 1.79
   sin     D      79   60   49    60   42   37   1.32 1.43 1.32
   cos     B      52   37   35    26   15   19   2.00 2.47 1.84
   cos     D      76   58   50    59   42   36   1.29 1.38 1.39
   tan     D     137  113   90    76   42   36   1.80 2.69 2.50
  atan     B      60   52   40    53   45   36   1.13 1.16 1.11
  atan     D      97   70   61    86   58   57   1.13 1.21 1.07
  sinh     D     218  186  178    61   44   31   3.57 4.23 5.74
  cosh     D     154  120  129    49   34   26   3.14 3.53 4.96
  tanh     D     217  206  185    78   53   43   2.78 3.89 4.30
 dnint     D      36   24   23    22   12   13   1.64 2.00 1.77
 atan2     B     538  410  557   106   88   71   5.08 4.66 7.85
  x**y     C     287  228  187   114   97   63   2.52 2.35 2.97

 *  When this function is compiled specifically for this processor, inline
code using the optional sqrt instruction will be generated.  This
is not what is being timed here.

   Range Key    Processor     Cycle time      Dcache
 A =    0,  1      604E     3.0 nanoseconds     32k
 B =   -1,  1       630     5.0 nanoseconds     64k
 C =    0,100      P2SC     7.4 nanoseconds    128k
 D = -100,100
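
As a worked reading of the table, the ratio columns are the libxlf90.a cycle
count divided by the libmass.a cycle count, and a cycle count can be turned
into time per call with the cycle time from the processor key. For exp on the
604E this gives 87/42, or about 2.07, and 42 cycles x 3.0 ns = 126 ns per
call. The C helpers below are a hypothetical sketch of that arithmetic, not
part of MASS.

```c
/* Speedup as shown in the "ratio" columns: system-library cycles per
   call divided by libmass.a cycles per call. */
double speedup(double lib_cycles, double mass_cycles)
{
    return lib_cycles / mass_cycles;
}

/* Convert cycles per call to nanoseconds per call, using the cycle
   time listed for the processor in the key above. */
double cycles_to_ns(double cycles, double cycle_time_ns)
{
    return cycles * cycle_time_ns;
}
```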

The following table gives estimated processor clock cycles per vector element
evaluation for the various MASS vector libraries. These results are obtained
for vectors of length 1000 so that the caches contain all vectors. Results
using the functions in libxlf90.a are shown under the columns labeled libm.
They were computed using the code compiled from the MASS Fortran source code
library libmassv.f (see The Vector Libraries) using the IBM XLF compiler with
the -O option without linking to libmass.a. Results obtained by repeating the
previous process with linkage to libmass.a are shown under the columns
labeled mass. Results obtained using libmassv.a, libmassvp3.a, or
libmassvp2.a are shown in the columns labeled massv, vp3, or massvp2,
respectively. Times are not given for functions in libmassvp2.a and
libmassvp3.a that have been carried over from libmassv.a.
As before, results were computed on PowerPC, POWER2 and POWER3 systems.

Users should expect results to vary with vector length. Items in the table
where the indicated library function does not exist or the measurement was
not done are left blank.



    Math Library Performance (cycles per evaluation, length 1000 loop)


                     604E                630               P2SC
  function range libm mass massv libm mass massv vp3  libm mass massv
     vrec    D     32*       10     9*        6    4     8*        4
    vsrec    D     18*        8     7*        5    3     8*        4
     vdiv    D     32*       12     9*        7    5     9*        5
    vsdiv    D     18*       10     7*        6    3     9*        4
    vsqrt    C     67   48   16    11*        9    6    13*        7
   vssqrt    C     70   48   10     7*        8    5    13*        5
   vrsqrt    C     79   49   16    22*        9    6    22*        7
  vsrsqrt    C     83   51    9    16*        7    4    22*        5
     vexp    D     83   45   16    64   33    6         53   21    7
    vsexp    E     85   44   13    68   36    5         58   21    6
     vlog    C     99   56   20    83   53    8         67   35    8
    vslog    C    102   56   17    86   57    7         66   37    7
     vsin    B     50   29   11    36   16    5         34   17    5
     vsin    D     79   59   27    60   43   12         50   37   12
    vssin    B     51   26    8    39   18    4         40   16    4
    vssin    D     79   58   20    62   46    9         56   38    9
     vcos    B     51   26    9    37   16    4         34   17    4
     vcos    D     75   59   27    58   43   12         51   36   11
    vscos    B     52   26    7    39   18    3         40   16    3
    vscos    D     76   59   20    61   46    9         56   37    9
  vsincos    B    100   53   19    80   33    8         80   38    8
  vsincos    D    151  116   29   123   92   12        111   81   12
 vssincos    B    107   55   15    79   38    6         78   36    7
 vssincos    D    159  118   24   125   98   10        110   80   10
 vcosisin    B    104   55   19    78   34    8         79   37    8
 vcosisin    D    156  118   29   123   93   12        111   81   12
vscosisin    B    108   55   15    79   36    6         78   36    6
vscosisin    D    160  119   23   125   95    9        110   79   10
     vtan    D    136   74   32   111   52   19         90   38   13
    vstan    D    136   74   32   113   56   19         95   39   12
   vatan2    D    545  104   40   413   87   25        555   73   17
  vsatan2    D    545  104   40   418   89   25        558   71   17
   vdnint    D     37   22    7    24   12  3.4         23   13  2.7
    vdint    D     36         6    22       2.8         21       2.6
                                                                massvp2
   vidint    D                                          4.0      2.7
    vasin    B                                           48       17
    vacos    B                                           49       17
  vdfloat    D                                          3.0      1.8
   vdsign    D                                            9      3.5

  * indicates inline instructions timed (not a subroutine call)

  Range Key     Processor     Cycle time         Dcache size
  A =    0,  1     604E    3.0 nanoseconds      32 kilobytes
  B =   -1,  1      630    5.0 nanoseconds      64 kilobytes
  C =    0,100     P2SC    7.4 nanoseconds     128 kilobytes
  D = -100,100
  E =  -10, 10

The performance of the POWER2-only vasin and vacos functions in libmassvp2.a
is argument dependent. The results were obtained for a large set of arguments
uniformly distributed between -1 and 1.

Short-precision versions of the vector functions vexp through vatan2 are now
included in libmassv.a. They are obtained when the prefix is vs rather than
just v.

The following table provides sample accuracy data for the libm, libmass,
libmassv, and libmassvp3 libraries. The numbers are based on the results for
10,000 random arguments chosen in the specified ranges. Real*16 functions
were used to compute the errors. There may be portions of the valid input
argument range for which accuracy is not as good as illustrated in the table.
Also, the user may experience accuracy which varies from the table when
argument values are used which are not represented in the table.

The Percent Correctly Rounded (PCR) column elements are obtained by counting
the number of correctly rounded results out of the 10,000 random argument
cases. A result is correctly rounded if the function returns the IEEE 64-bit
value which is closest to the infinite-precision exact result.
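
The error measure can be sketched in C as follows. This is an illustration
only, not the procedure used to produce the table: the function names are
hypothetical, long double stands in for the Real*16 reference (on some
platforms long double is no more precise than double, in which case the
sketch is of no use), and the reference is assumed to be finite, nonzero, and
in the normal range.

```c
#include <math.h>

/* Size of one unit in the last place (ulp) of a double near ref.
   Assumes ref is finite, nonzero, and in the normal range. */
static double ulp_of(long double ref)
{
    int e;
    (void)frexp((double)ref, &e);  /* |ref| = m * 2^e with 0.5 <= m < 1 */
    return ldexp(1.0, e - 53);     /* a double has 53 significand bits */
}

/* Error of a computed double, in ulps, against a higher-precision
   reference value. */
double error_in_ulps(double computed, long double ref)
{
    return (double)(fabsl((long double)computed - ref) / ulp_of(ref));
}

/* A result is counted as correctly rounded when it lies within half
   an ulp of the reference. */
int correctly_rounded(double computed, long double ref)
{
    return error_in_ulps(computed, ref) <= 0.5;
}
```

PCR is then 100 times the number of correctly rounded results divided by the
number of arguments tested, and MaxE is the largest value of error_in_ulps
seen.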



                        Math Library Accuracy


                       libm         libmass       libmassv       libmassvp3
function range     PCR     MaxE   PCR     MaxE   PCR     MaxE   PCR     MaxE
   rec     D      100.00*   .50*                100.00    .50  100.00    .50
  srec     D      100.00*   .50*                 92.47    .66   99.97    .50
   div     D      100.00*   .50*                 74.21   1.28   74.21   1.28
  sdiv     D      100.00*   .50*                100.00    .50   74.49   1.31
  sqrt     A      100.00    .50   96.59    .58   96.42    .60   63.14   2.16
 ssqrt     A      100.00    .50  100.00    .50   87.64    .79   87.05    .83
 rsqrt     A       88.52    .98   98.60    .54   97.32    .62   82.00   1.22
srsqrt     A      100.00    .50  100.00    .50   86.39    .82   89.66    .86
   exp     D       99.95    .50   96.55    .63   96.58    .63
  sexp     E      100.00    .50  100.00    .50   98.87    .52
   log     C       99.99    .50   99.69    .53   99.69    .53
  slog     C      100.00    .50  100.00    .50   99.91    .51
   sin     B       81.31    .91   96.88    .80   97.28    .72
   sin     D       86.03    .94   83.88   1.36   83.85   1.27
  ssin     B      100.00    .50  100.00    .50   99.95    .50
  ssin     D      100.00    .50  100.00    .50   99.73    .51
   cos     B       92.95   1.02   92.20   1.00   93.19    .88
   cos     D       86.86    .93   84.19   1.33   84.37   1.33
  scos     B      100.00    .50  100.00    .50   99.35    .51
  scos     D      100.00    .50  100.00    .50   99.82    .51
   tan     D       99.58    .53   64.51   2.35   50.48   3.19
  stan     D      100.00    .50  100.00    .50  100.00    .50
 atan2     D       74.66   1.59   88.02   1.69   84.01   1.67
satan2     D      100.00    .50  100.00    .50  100.00    .50
  atan     B       99.82    .51   92.58   1.78
  atan     D       99.98    .50   98.86   1.72
  sinh     D       94.78   1.47   89.54   1.45
  cosh     D       95.64    .97   92.73   1.04
  tanh     E       94.08   2.95   83.33   1.79
  X**Y     C       99.95    .50   96.87    .62

         * indicates hardware instruction was used

  Range Key        PCR  = Percentage correctly rounded
  A =    0,  1     MaxE = Maximum observed error in ulps
  B =   -1,  1
  C =    0,100
  D = -100,100
  E =  -10, 10


Using the MASS Libraries

The Scalar Library

To use libmass.a, specify -lmass ahead of libm.a or libxlf90.a on the linker
command line. The following examples use libm.a. On the EMSEI SP, the MASS
libraries are installed in /usr/local/lib and the include files in
/usr/local/include, so the -L option must be added to the examples given
below. If an example uses the command lines:

 xlf progf.f -o progf -lmass
 cc progc.c -o progc -lmass -lm

Then on our SP, the command lines for Fortran and C would be:

 xlf progf.f -o progf -L/usr/local/lib -lmass
 cc progc.c -o progc -L/usr/local/lib -lmass -lm

(Fortran links with libm.a automatically, so only -lmass needs to be specified
on the command line. For C code, you must link both libmass.a and libm.a,
since libmass.a includes only a subset of the functions in libm.a.) The
library uses some global names for shared tables. These names have the form
%...$.

When called from C code, the functions in libmass.a will not set the global
variable errno to indicate range, domain, or loss of precision errors. For
example, with libm.a, sqrt(-1) returns NaN (not a number) and sets errno to
33 (EDOM, a domain error); with libmass.a, sqrt(-1) returns NaN but does not
set errno.

The user should recall that the rsqrt function is handled in a different way
from the other intrinsic functions by the XLF compiler. This is discussed in
the XLF manuals, and it is suggested that the user review this material.

Selective Use of libmass.a

If you wish to use libmass.a for some functions and the normal libm.a for the
remainder, you can use an export list with the ld command. For example, to
select only the fast tangent routine from libmass.a for use with the C
program sample.c:

  1. Create an export list containing the names of the desired functions. In
     this case, the file export.list will contain only two lines:

     #!
     .tan

     (Remember that Fortran names start with "._", while C names start with
     ".")

  2. Pull the exported routines into an object file using the ld command
     with libmass.a.

      ld -o fast_tan.o  -bE:export.list -lmass

      (or, if libmass.a is not in /usr/lib)

      ld -o fast_tan.o -bE:export.list -L/some/other/path -lmass

  3. Create the final executable using cc, specifying fast_tan.o before the
     standard math library, libm.a. This will link only the tan routine from
     MASS (now in fast_tan.o) and the remainder of the math subroutines from
     the standard system library:

       cc -o sample fast_tan.o -lm

(Note: this scheme will work for all routines in libmass.a except sin, cos,
atan, and atan2. These routines are coded together, so selecting fast sin
will also link in fast cos; selecting atan will also link atan2.)

The Vector Libraries

Successful use of the MASS vector libraries depends on the user making the
effort to vectorize the code. To assist in that effort, the Fortran source
library libmassv.f has been provided for use on non-IBM systems where the
MASS libraries are not available. The syntax for the vector functions is
visible in libmassv.f, and the user can write code using these functions to
obtain code that may port to systems other than the RS/6000. The user can
then use the faster MASS vector libraries with that same code when running on
an RS/6000. See the section on Vector Code Portability for more details.

To use the faster MASS libraries in a code that has been vectorized as
indicated, simply use the corresponding library name(s) in the linker command
line. For example, if the library is installed in the customary location in
directory /usr/lib, then the command lines for Fortran and C would be:

  xlf progf.f -o progf -lmassv
  cc progc.c -o progc -lmassv -lm

If libmassv.a is installed in a directory other than in /usr/lib, for example
in /home/somebody/mass, use the -L option to add that to the search path:

 xlf progf.f -o progf -L/home/somebody/mass -lmassv
 cc progc.c -o progc -L/home/somebody/mass -lmassv -lm


The vector subroutines are called like any other Fortran subroutine, via a
CALL statement, using the same syntax as the routines in libmassv.f. Except
for vdiv, vsincos, vcosisin, vatan2, vdfloat, vidint, and vdsign, the
routines in libmassv.a and libmassvp2.a are all of the form
function_name (y,x,n), where x is the source vector, y is the target vector,
and n is the vector length. The arguments y and x are assumed to be
long-precision (real*8) for functions whose prefix is v, and short-precision
(real*4) for functions with prefix vs. The three-argument subroutines are all
used in the same way. For example:


  .....
  DIMENSION X(500),Y(500)
  .....
  CALL VEXP(Y,X,500)
  .....

returns a vector Y of length 500 whose elements are exp(X(I)), I=1,500.

The functions vdiv, vsincos, vatan2, and vdsign are of the form
function_name(x,y,z,n). Vdiv returns a vector x whose elements are y(I)/z(I),
I=1,n. Vsincos returns two vectors, x and y, whose elements are sin(z(I)) and
cos(z(I)), respectively. Vatan2 returns a vector x whose elements are
atan2(y(I),z(I)), the arctangent of y(I)/z(I). Vdsign returns a vector x
whose elements are dsign(y(I),z(I)). Arguments follow the same conventions as
given previously.

In vcosisin(y,x,n), x is a vector of n real*8 elements and the function
returns a vector y of n complex*16 elements of the form
(cos(x(I)),sin(x(I))).

In vdfloat(y,l,n) and vidint(l,x,n), x and y are vectors of n real*8 elements
and l is a vector of n integer*4 elements. Vdfloat returns a vector y of
elements of the form dfloat(l(I)). Vidint returns a vector l of elements of
the form idint(x(I)).
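
In C terms the two conversion routines behave like the loops below. The ref_
names are hypothetical stand-ins, not MASS entry points, and the sketch
assumes that a C int is 32 bits wide so that it matches integer*4.

```c
/* y(i) = dfloat(l(i)): convert each integer to double precision. */
void ref_vdfloat(double *y, const int *l, const int *n)
{
    for (int i = 0; i < *n; i++)
        y[i] = (double)l[i];
}

/* l(i) = idint(x(i)): convert each double to an integer, truncating
   toward zero. Values outside the int range are not handled. */
void ref_vidint(int *l, const double *x, const int *n)
{
    for (int i = 0; i < *n; i++)
        l[i] = (int)x[i];
}
```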

When calling the vector functions from C, the user is reminded that only call
by reference is supported.

Vector Code Portability

The recommended procedure for writing a portable code that is vectorized for
using the fast MASS vector libraries is to write in ANSI standard language
and use the vector functions defined by libmassv.f. Then, to prepare to run
on a system other than an IBM RS/6000, compile the application source code
together with the libmassv.f source. The vector syntax to be used is visible
in the libmassv.f source. It may be necessary to comment out one line of the
vrsqrt subroutine, which is a directive to the IBM XLF compiler, for full
portability.

When running the application on an IBM RS/6000, the faster MASS vector
libraries can be linked as described previously.

WARNING: Do not use libmassv.f on IBM RS/6000 systems. Use the -lmassv flag
instead. The libmassv.f Fortran source vector library should be used as a
portable substitute for the MASS vector libraries only on non-IBM systems.


Fine Print

This document is an excerpt from the README included in the MASS distribution,
which falls under the same license as that distribution:

MASS Version 2.7 is licensed to you under the terms and conditions of
your AIX license with IBM, and the following additional provisions:
Notwithstanding anything to the contrary contained in your AIX
license, MASS is provided to you AS IS. IBM MAKES NO WARRANTIES,
EXPRESSED OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IBM has no
obligation to defend or indemnify against any claim of infringement,
including, but not limited to, patents, copyright, trade secret,
or intellectual property rights of any kind.



