This code was developed by Veeramani in C. It is entirely serial and does not require PETSc.
Here we use it as a test code for comparing compilers. The test case is a single-particle sedimentation in a box, with 151,959 second-order nodes and 100 time steps.
As a reference, on a 2.4 GHz Intel(R) Core(TM)2 CPU workstation running Linux (kalmar.eche.ualberta.ca), the run takes 645.52 seconds to finish (see the last line of the profile.txt output file).
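How spAF itself produces the timing line in profile.txt is not shown here. As a minimal sketch of how a serial C code can time a run and write a total-time line, assuming the standard clock() call is used, something like the following would do. The function names and the line format are hypothetical and are not taken from spAF.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Convert two clock() readings into elapsed CPU seconds. */
double elapsed_seconds(clock_t start, clock_t end)
{
    return (double)(end - start) / (double)CLOCKS_PER_SEC;
}

/* Write a total-time line to an open file.
 * The line format is hypothetical, not spAF's actual profile.txt format.
 * Returns the number of characters written, or a negative value on error. */
int write_profile_line(FILE *fp, double seconds)
{
    assert(fp != NULL);
    return fprintf(fp, "total time: %.2f seconds\n", seconds);
}
```

A caller would read clock() once before the time-step loop and once after it, then pass both readings to elapsed_seconds. Note that clock() reports CPU time, which for a serial, compute-bound run is usually close to the wall-clock time.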
On the Cert Blue Gene/L, spAF_serial was then built with different compilers, and the timings are compared here. All builds were first confirmed to produce identical output. The files are located at
/gpfs/bglscratch/pi/sjin/test/spAF_Serial/CateSingleParticleSedimentation6TetsPerBox
machine | compiler and options | time (seconds) |
---|---|---|
kalmar | gcc | 645.52 |
Blue Gene | mpicc (gcc) | 1841.91 |
Blue Gene | mpixlc | 1624.02 |
Blue Gene | blrts_xlc -qtune=440 -qarch=440d -O3 | 1622.78 |
xlc Compiler options explained
-qtune=440: Optimizes object code for the 440 family of processors.
auto
Generates object code optimized for the hardware platform on
which the program is compiled.
-qarch=440d:
For any given -qarch setting, the compiler defaults to a
specific, matching -qtune setting, which can provide additional
performance improvements. The suboptions are:
auto
Generates instructions that will run on the hardware
platform on which the program is compiled.
440
Generates code for a single floating-point unit (FPU) only.
440d
Generates parallel instructions for the 440d Double Hummer
dual FPU. Note that if you encounter problems with code
generation, try resetting this option to -qarch=440.
Blue Gene/L does have the dual FPU, so we use 440d.
Note that to get the IBM compilers (mpixlc and blrts_xlc) working, the following line has to be removed from the spAF_incl.h file:
#include <stdlib.h>
This line causes no problem for GCC. Why it causes trouble for XLC is not clear. With the line present, the error messages are:
"/opt/ibmcmp/vac/bg/8.0/include/stdlib.h", line 21.8: 1506-046 (S) Syntax error.
"/opt/ibmcmp/vac/bg/8.0/include/stdlib.h", line 22.3: 1506-119 (E) Duplicate storage class specifier extern ignored.
"/opt/ibmcmp/vac/bg/8.0/include/stdlib.h", line 33.1: 1506-046 (S) Syntax error.
make: *** [spAF_main.o] Error 1
Conclusions
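- GCC is about 13% slower than IBM XLC on Blue Gene/L (1841.91 s vs. 1624.02 s), which makes sense given that XLC is IBM's own compiler for this platform.
- The optimization options for XLC do not matter much: plain mpixlc (1624.02 s) and blrts_xlc -qtune=440 -qarch=440d -O3 (1622.78 s) differ by less than 0.1%.
- Blue Gene/L is NOT a good place to run sequential jobs such as spAF: even the fastest build takes about 2.5 times as long as on the workstation.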
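
The comparisons behind these conclusions can be recomputed from the timing table with a few lines of C. This is only a sketch: the macro and function names below are made up for illustration and are not part of spAF, and the numbers are copied from the table.

```c
#include <assert.h>
#include <stdio.h>

/* Run times in seconds, copied from the timing table. */
#define T_KALMAR_GCC   645.52
#define T_BGL_GCC     1841.91
#define T_BGL_MPIXLC  1624.02
#define T_BGL_BLRTS   1622.78

/* Extra run time of `slow` relative to `fast`, in percent. */
double percent_slower(double slow, double fast)
{
    assert(fast > 0.0);
    return 100.0 * (slow - fast) / fast;
}

/* Print the three comparisons used in the conclusions. */
void print_comparisons(void)
{
    printf("gcc vs mpixlc on Blue Gene/L: %.1f%% slower\n",
           percent_slower(T_BGL_GCC, T_BGL_MPIXLC));
    printf("mpixlc vs tuned blrts_xlc:    %.2f%% slower\n",
           percent_slower(T_BGL_MPIXLC, T_BGL_BLRTS));
    printf("Blue Gene/L vs kalmar:        %.2fx the run time\n",
           T_BGL_BLRTS / T_KALMAR_GCC);
}
```

Calling print_comparisons() from main gives a gcc penalty of roughly 13%, a difference of under 0.1% between the two XLC builds, and a Blue Gene/L run time of about 2.5 times that of the workstation.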