Serial spAF on Blue Gene/L

This code was developed by Veeramani in C. It is entirely serial and does not require PETSc.

Here we use it as a test code to compare different compilers. The test case is a single particle sedimenting in a box, with 151,959 second-order nodes and 100 time steps.
As a reference, on a 2.4 GHz Intel Core 2 workstation running Linux (kalmar.eche.ualberta.ca), the run takes 645.52 seconds to finish (see the last line of the profile.txt output file).

On the Cert Blue Gene/L, spAF_serial was built with several different compilers, and the results are compared here. All builds were first confirmed to produce identical output. The files are located at

/gpfs/bglscratch/pi/sjin/test/spAF_Serial/CateSingleParticleSedimentation6TetsPerBox

Machine       Compiler and options                     Time (s)
kalmar        gcc                                        645.52
Blue Gene/L   mpicc (gcc)                               1841.91
Blue Gene/L   mpixlc                                    1624.02
Blue Gene/L   blrts_xlc -qtune=440 -qarch=440d -O3      1622.78

XLC compiler options explained

-qtune=440: Optimizes object code for the 440 family of processors.
    Other suboptions include:
    auto
        Generates object code optimized for the hardware platform on
        which the program is compiled.
-qarch=440d: Specifies the processor architecture for which code is generated.
    For any given -qarch setting, the compiler defaults to a specific,
    matching -qtune setting, which can provide additional performance
    improvements. The suboptions are:
    auto
        Generates instructions that will run on the hardware platform on
        which the program is compiled.
    440
        Generates code for a single floating-point unit (FPU) only.
    440d
        Generates parallel instructions for the 440d Double Hummer dual
        FPU. Note that if you encounter problems with code generation,
        try resetting this option to -qarch=440.
    The Blue Gene/L processors do have the dual FPU, so we use 440d.

Note that to get the IBM compilers (mpixlc and blrts_xlc) working, the following line had to be removed from the spAF_incl.h file:

#include <stdlib.h>

This line causes no problem for GCC, and the reason XLC rejects it is not clear. With the line present, XLC reports:

"/opt/ibmcmp/vac/bg/8.0/include/stdlib.h", line 21.8: 1506-046 (S) Syntax error.
"/opt/ibmcmp/vac/bg/8.0/include/stdlib.h", line 22.3: 1506-119 (E) Duplicate storage class specifier extern ignored.
"/opt/ibmcmp/vac/bg/8.0/include/stdlib.h", line 33.1: 1506-046 (S) Syntax error.
make: *** [spAF_main.o] Error 1

Conclusions

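  • GCC is about 13% slower than IBM XLC on Blue Gene/L (1841.91 s vs. 1624.02 s), which is expected, since XLC is IBM's own compiler for this hardware.
  • The XLC optimization options make little difference: blrts_xlc with -qtune=440 -qarch=440d -O3 is less than 0.1% faster than mpixlc.
  • Blue Gene/L is NOT a good place to run sequential jobs such as spAF: even the fastest build takes about 2.5 times as long as on the 2.4 GHz kalmar workstation.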
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License