Run Jobs On Blue Gene

There are several ways to run jobs on Blue Gene:

  • LoadLeveler
  • mpirun
  • mmcs

On our system, however, LoadLeveler is not working properly. mpirun is the one to use most of the time; mmcs is much harder to use but much more powerful.

MPIRUN

A typical mpirun invocation looks like this:

mpirun -partition <partition> -n <np> -cwd $PWD -exe <file to run>
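
As a concrete illustration, the sketch below fills in the template above and prints the resulting command instead of executing it, so it can be checked before a job is started. The helper name build_mpirun_cmd, the process count 32 and the executable ./hello_mpi are placeholders made up for this example; R001-N0-32 is one of the partitions listed on this page.

```shell
#!/bin/bash
# Sketch: assemble an mpirun command line from the template above.
# build_mpirun_cmd is a hypothetical helper, not part of the Blue Gene software.
build_mpirun_cmd() {
    local partition="$1"   # partition name, e.g. R001-N0-32
    local np="$2"          # number of MPI processes
    local exe="$3"         # executable to run (placeholder name below)
    echo "mpirun -partition $partition -n $np -cwd $PWD -exe $exe"
}

# Print the command for a 32-process run in the current directory.
build_mpirun_cmd R001-N0-32 32 ./hello_mpi
```

Once the printed line looks right, it can be pasted into the shell on a front end node.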

On our system, the existing partitions are:

Partition     Number of Nodes   Comment
R00           1024              all nodes
R000          512               not working due to hardware issues
R001          512               the second half under the same high speed network
R001-N0-32    32                small partition within R001, there is a similar one under R000
R001-N1-32    32                similar partitions exist from N0 to N7
R001-N8-128   128
R001-NC-128   128

Note: as of January 8, 2008, the faulty node belongs to partitions R00, R000 and R000-N2-32, which makes all three unavailable.

Other useful mpirun options are:

-mode <CO or VN> Execution mode: COprocessor or VirtualNode mode. Default is CO.

In Coprocessor mode (the default), one of the two processors on each node runs the user application
and the other is dedicated to the Message Passing Interface (MPI). In Virtual Node mode,
both processors run their own copy of the user application. In short: CO runs one MPI process per node, VN runs two.
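
The arithmetic behind the two modes can be sketched as follows: a partition hosts at most one MPI process per node in CO mode and two per node in VN mode. The helper name max_procs is made up for this example.

```shell
#!/bin/bash
# Sketch: maximum number of MPI processes a partition can host in each mode.
# max_procs is a hypothetical helper written for this page.
max_procs() {
    local nodes="$1"       # number of nodes in the partition
    local mode="${2:-CO}"  # CO (default) or VN
    case "$mode" in
        CO) echo "$nodes" ;;           # one MPI process per node
        VN) echo $(( nodes * 2 )) ;;   # two MPI processes per node
        *)  echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

max_procs 32 CO    # prints 32
max_procs 32 VN    # prints 64
max_procs 128 VN   # prints 256
```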

-nofree Do not deallocate the partition if mpirun had allocated it.
-verbose <0 - 4> Verbosity level of mpirun. Default is 0.
-trace <tracelevel> Create a detailed trace as mpirun executes. Output goes to a file in the current directory.

MMCS

Start MMCS_DB_CONSOLE

Follow this procedure as a normal user (root is not required):

ssh sn
cd ~bgdb2cli
cd sqllib/
. ./db2profile
cd /bgl/BlueLight/ppcfloor/bglsys/bin/ 
./mmcs_db_console

Note that mmcs_db_console can only be started on the service node (sn), because a library it needs is missing on the front end nodes (fen1 and fen2).
To make this easier, I created the following script and placed it in my path, so that I can start mmcs with a single command:
sjin@fen1 bin $ pwd
/nfshome/sjin/software/bin
sjin@fen1 bin $ cat mymmcs
#!/bin/bash
cd /dbhome/bgdb2cli/sqllib
. ./db2profile
cd /bgl/BlueLight/ppcfloor/bglsys/bin/
./mmcs_db_console

sjin@fen1 bin $

Useful Commands in MMCS

  • help: get a list of commands
  • list_blocks: get a list of blocks in use
  • list_jobs: get a list of running jobs
  • list bglblock: list all blocks (partitions)
  • list bglblock <partition>: list the information for a specific partition

The _status information is especially useful. For example, I means Initialized and F means Free.
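
A small lookup can make these one-letter codes easier to read in scripts or notes. The sketch below maps only the two codes named above; the helper name describe_status is made up, and any other code is reported as unknown rather than guessed.

```shell
#!/bin/bash
# Sketch: translate a one-letter block status into a readable name.
# describe_status is a hypothetical helper; only the two codes named
# on this page (I and F) are mapped.
describe_status() {
    case "$1" in
        I) echo "Initialized" ;;
        F) echo "Free" ;;
        *) echo "unknown status '$1'" ;;
    esac
}

describe_status F   # prints Free
describe_status I   # prints Initialized
```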

More on Partitions

Partitions are central to how Blue Gene runs jobs. A partition is used exclusively: once it is in use, no other job can use it. It is therefore extremely important to use the smallest partition that can hold the job. For example, using the R001 partition is a very bad idea unless the job needs more than 128 MPI processes.
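
The rule of thumb above can be sketched as a small helper that returns the smallest partition size able to hold a job. The helper name smallest_partition_size is made up for this example; the sizes 32, 128 and 512 are the ones available under R001 in the partition table earlier on this page, and the VN case assumes two MPI processes per node.

```shell
#!/bin/bash
# Sketch: choose the smallest partition size that can hold a job.
# smallest_partition_size is a hypothetical helper; 32, 128 and 512 are
# the partition sizes available under R001 on our system.
smallest_partition_size() {
    local np="$1"          # number of MPI processes
    local mode="${2:-CO}"  # CO: one process per node, VN: two per node
    local per_node=1
    if [ "$mode" = "VN" ]; then
        per_node=2
    fi
    local size
    for size in 32 128 512; do
        if [ "$np" -le $(( size * per_node )) ]; then
            echo "$size"
            return 0
        fi
    done
    echo "no partition under R001 can hold $np processes in $mode mode" >&2
    return 1
}

smallest_partition_size 100      # prints 128
smallest_partition_size 60 VN    # prints 32
smallest_partition_size 200      # prints 512
```

The returned size still has to be matched to a partition name that is currently free, for example R001-N8-128 for a 128-node job.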

If -nofree is passed to mpirun, mpirun does not release the partition. To release it manually, run:

ssh sn freepartition -partition <partition>
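
One possible workflow is sketched below: two runs on the same partition with -nofree, followed by a manual release. This assumes the second run can reuse the partition left allocated by the first, which has not been verified here. The run helper, the DRY_RUN switch and the executables ./step1 and ./step2 are placeholders made up for this example; with DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/bash
# Sketch: back-to-back runs on one partition, then a manual release.
# DRY_RUN, run, ./step1 and ./step2 are placeholders for this example.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them
PARTITION="R001-N0-32"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# -nofree asks mpirun not to deallocate the partition after the run
# (assumption: the next run can then reuse the same partition).
run mpirun -partition "$PARTITION" -n 32 -nofree -cwd "$PWD" -exe ./step1
run mpirun -partition "$PARTITION" -n 32 -nofree -cwd "$PWD" -exe ./step2

# Release the partition by hand once the last run has finished.
run ssh sn freepartition -partition "$PARTITION"
```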

You can check the status of a particular partition with the MMCS commands list_blocks and list bglblock, as described above in the MMCS section.

References

IBM Red Book For Administrator
