This tutorial shows, step by step, how to run the FCPU code on the AICT cluster at the University of Alberta (cluster.srv.ualberta.ca), using the existing petsc-dev build of sjin1 (the account of Shi Jin).
Check Read Access
Log in to cluster.srv.ualberta.ca and make sure you have read access to the files of sjin1 by running the following command:
ls -l /home/sjin1/software/src
The output should include
total 909840
-rw-r--r-- 1 sjin1 uofa 183140 May 13 1999 bench.tar.gz
drwxr-xr-x 9 sjin1 uofa 4096 Feb 2 2007 bm
-rw-r--r-- 1 sjin1 uofa 35168366 Jun 22 2007 paraview-3.0.2-Linux-x86.tar.gz
drwxr-xr-x 11 sjin1 uofa 4096 Feb 2 2007 petsc-2.3.2-p8
drwxr-xr-x 13 sjin1 uofa 4096 Oct 27 14:41 petsc-dev-g++
-rw-r--r-- 1 sjin1 uofa 799969280 Jun 22 11:11 petsc-dev-kalmar.tar
drwxr-xr-x 13 sjin1 uofa 4096 Mar 2 2008 petsc-dev-old
drwxr-xr-x 14 sjin1 uofa 4096 Dec 4 20:27 petsc-dev-opt-g++
drwxr-xr-x 15 sjin1 uofa 4096 Jun 22 11:36 petsc-dev-sieve
-rw-r--r-- 1 sjin1 uofa 81582080 Jun 20 23:36 petsc-dev.tar
-rw-r--r-- 1 sjin1 uofa 13798421 Oct 2 19:23 petsc-dev.tar.gz
drwxr-xr-x 2 sjin1 uofa 4096 Feb 4 2007 stream
If the line for petsc-dev-opt-g++ is not there, there is no need to continue with this tutorial: you do not have read access to my directory and should therefore build your own petsc-dev.
Set environment variables
Edit your ~/.bashrc file to have the following lines:
module load mpi/openmpi-1.2.5
export PETSC_ARCH=linux-gnu-cxx-opt
export PETSC_DIR=/home/sjin1/software/src/petsc-dev-opt-g++
Once set, log out and log in again, or run ". ~/.bashrc" to source it.
Note that your default shell on the AICT cluster may be csh, while the above procedure works for bash. Here is a quote from AICT support:
At the moment, permanently changing
your login shell on the cluster is
not possible.
As a workaround, edit your .login
file to contain only one line
exec bash -i -l
This is what I do. It seems to work ok.
When submitting jobs to the cluster,
be sure to include the following
directive in your scripts
#PBS -S /bin/bash
Edmund Sumbar
Research Support, Univ of Alberta
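To confirm that the environment is set up correctly, a quick sanity check is to look at the variables and the PETSc build directory (this assumes the usual petsc-dev layout, where the build lives in a $PETSC_ARCH subdirectory):
echo $PETSC_DIR
echo $PETSC_ARCH
ls $PETSC_DIR/$PETSC_ARCH
which mpicc
If the directory listing and the mpicc path show up, the module and the PETSc variables are in effect.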
Grab the code
To retrieve the latest fcpu source code using subversion, run the following:
svn checkout https://svn.xp-dev.com/svn/jinzishuai_fcpu/ new-fcpu-tutorial
Here new-fcpu-tutorial is the name of the new directory to be created, which will contain the code.
Refer to the FAQ if you need more help using subversion to get the code.
If you already have the subversion directories checked out, then you should run
svn update
in that directory to make sure the code is up to date.
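For example, assuming the working copy lives in new-fcpu-tutorial, the update looks roughly like this:
cd new-fcpu-tutorial
svn update
svn info    # shows which revision you are now at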
Build the code
Run the following commands
cd new-fcpu-tutorial/trunk/
make
You will see many compilation messages. Once the build is done, you should have an executable called fcpu in the current directory.
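As a quick sanity check that the build succeeded (nothing FCPU-specific, just confirming the executable exists):
ls -l fcpu
file fcpu    # should report an executable binary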
Generate a simple mesh
Build the box mesh generator
Type
make Tet1stOrderBox
in the source directory and an executable named Tet1stOrderBox should be generated.
Create input files
Use your favorite text editor (vi, Emacs, etc.) to create three files: xs.dis, ys.dis and zs.dis. In this example, their contents are identical and are shown below:
sjin1@opteron-cluster tutorial $ cat xs.dis
0
0.2
0.4
0.6
0.8
1
sjin1@opteron-cluster tutorial $
The above is a simple example. You can use a short Matlab script to generate this kind of mesh file automatically; its code looks like this:
clear;
dx=0.4;
x=[-50:dx:50]';
y=[-1.2:dx:1.2]';
z=[0:dx:20]';
save -ascii xs.dis x;
save -ascii ys.dis y;
save -ascii zs.dis z;
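If Matlab is not handy, a uniform spacing file like the ones above can also be generated directly from the shell. A minimal sketch for the 0-to-1 example with dx=0.2 (the floating-point formatting produced by seq may differ slightly from the hand-written file):
seq 0 0.2 1 > xs.dis
cp xs.dis ys.dis
cp xs.dis zs.dis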
Generate mesh
Now run the Tet1stOrderBox code in the same directory as the *.dis files. For example:
sjin1@opteron-cluster tutorial $ ../Tet1stOrderBox
m = 125 cubes, nm = 216, n = 216 1st-nodes, so there are 750 elements
sucessfully allocated 1736 Bytes for map
node = 216
elem = 750
sjin1@opteron-cluster tutorial $
Here Tet1stOrderBox is located in the parent directory; you may want to use its absolute path instead.
A number of files named 3d.* have now been generated in the current directory. To see them:
sjin1@opteron-cluster tutorial $ ls
3d.aux 3d.bc 3d.lcon 3d.mybc 3d.nodes genDis.m xs.dis ys.dis zs.dis
Create the particle input file
A sample particle input file is given below
sjin1@opteron-cluster tutorial $ cat 3d.particles
#nSp=1
#rho_f=1, t=0
#1-id 2-radius 3-rho_p 4-fixed 5-x 6-y 7-z 8-u 9-v 10-z 11-w1 12-w2 13-w3 14-theta 15-phi
0 0.3 1.2 0 0 0 0.5 0 0 0 0 0 0 0 0
sjin1@opteron-cluster tutorial $
This file is fairly self-explanatory. Note that the first three (header) lines are required.
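If you prefer the shell over a text editor, the same file can be written with a here-document, for example:
cat > 3d.particles <<'EOF'
#nSp=1
#rho_f=1, t=0
#1-id 2-radius 3-rho_p 4-fixed 5-x 6-y 7-z 8-u 9-v 10-z 11-w1 12-w2 13-w3 14-theta 15-phi
0 0.3 1.2 0 0 0 0.5 0 0 0 0 0 0 0 0
EOF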
Test run
To see if the fcpu code works, simply run it without any arguments, like this:
../fcpu
Note again that you might want to use the absolute path of the fcpu executable.
My output looks like this:
sjin1@opteron-cluster tutorial $ ../fcpu
Loading a box of xa=0,xb=1,ya=0,yb=1,za=0,zb=1; dx=0.2
[0]:before mesh loading from files:mem= 9.20 MB
[0]:after mesh loaded from files:mem=14.58 MB
0.523063 seconds spent in loading mesh from 3d.*
[0]:not-distributed mesh:mem=14.59 MB
############## Brief Local Mesh Report: ###############
rank 0: Mesh has 750 elements, 1650 faces, 1115 edges, 216 vertexes
[0]: Number of 1st order nodes: 216
[0]: Number of 2nd order nodes: 1331
[0]: localH=1,Zmin=0,Zmax=1
############## Brief Global Mesh Report: ###############
As a whole mesh: 750 elements, 1650 faces, 1115 edges and 216 vertexes
Number of 1st order nodes: 216
Number of 2nd order nodes: 1331
[0]:before setupDiscretization():mem=14.64 MB
########## setup BC ############
########### BC::applyBoundaryConditions()############
Velocity bc markers: 1 2 3 4 5 6
Pressure bc markers: 7
All bc markers: 1 2 3 4 5 6 7
CellMarker=8
[0]:before setupField:mem=14.78 MB
rank 0: ############# USG::setupFiled(): ##########
bc pmarker:2 points
bc vmarker:588 points
[0]:setup velocity fibration:mem=14.84 MB
[0]:setup velocity fibration:mem=14.84 MB
[0]:setup velocity fibration:mem=14.84 MB
[0]:setup pressure fibration:mem=14.84 MB
[0]:after setupField & setupUVWP. before getGlobalOrders:mem=14.84 MB
[0]:after getGlobalOrders-p:mem=14.96 MB
[0]:after getGlobalOrders-x:mem=14.96 MB
[0]:after getGlobalOrders-y:mem=14.96 MB
[0]:after getGlobalOrders-z:mem=14.96 MB
[0]:after getGlobalOrders:mem=14.96 MB
[0]:before caching():mem=14.96 MB
[0]: cached Cell Indices cost 0.04 MB
Time spent in pre-computing and caching indices: 0.0123799 sec
[0]: cached Cell Connectivity cost 0.03 MB
[0]:after caching():mem=14.97 MB
0.674958 seconds spent in USG::USG()
[0]:after USG created:mem=14.98 MB
SaveFieldsToFile(initBCzeroField) cost time: 0.04795
SaveFieldsToFile(initBCzeroField) cost time: 0.0559249
Save2ndOrderFieldsToGmshFile(curvedGmsh) cost time: 0.0801589
Save2ndOrderFieldsToGmshFile(initBCzeroField) cost time: 0.00761294
Save2ndOrderFieldsToGmshFile(initBCzeroField) cost time: 0.0137892
total second order nodes (vertexes+edges): 1331
[0]:before matrices created:mem=15.13 MB
[0]:start of CFD::CFD():mem=15.14 MB
[0]:before pressure Vecs created:mem=15.20 MB
[0]:pressure Vecs created:mem=15.21 MB
[0]:CFD::CFD(): Vecs created, before Mats:mem=15.21 MB
[0]:before allocating preS:mem=15.21 MB
[0]:before allocation:mem=15.22 MB
[0]:after allocation:mem=15.22 MB
[0]:after delete o/dEntries[i]:mem=15.22 MB
rank 0: allocate 2437 entries for Matrix=> 0.030108 MB memory
[0]:allocate matrix:mem=15.30 MB
PetscAllocateMatrix spent time: 0.00153804 secs
[0]:after cleanup:mem=15.30 MB
The pressure vector/matrix has global size=215, local(rank 0):215x215. time cost=0.00165296
PetscFillStiffnessMatrix() for Pressure takes time 0.0276151 secs
Creating Velocity Matrices...
[0]:before allocation:mem=15.36 MB
[0]:after allocation:mem=15.36 MB
[0]:after delete o/dEntries[i]:mem=15.36 MB
rank 0: allocate 14887 entries for Matrix=> 0.181564 MB memory
[0]:allocate matrix:mem=15.36 MB
PetscAllocateMatrix spent time: 0.00539088 secs
[0]:after cleanup:mem=15.36 MB
S[0] created, time=0.00545001
S[0] filled, time=0.103224
[0]:before M=S:mem=15.38 MB
M[0] created & filled, time=0.02038
The velocity-0 vector/matrix has global size=729, local (rank 0): 729 x 729
[0]:before allocation:mem=15.41 MB
[0]:after allocation:mem=15.41 MB
[0]:after delete o/dEntries[i]:mem=15.41 MB
rank 0: allocate 5679 entries for Matrix=> 0.069012 MB memory
[0]:allocate matrix:mem=15.41 MB
PetscAllocateMatrix spent time: 0.002599 secs
[0]:after cleanup:mem=15.41 MB
[0]:before allocation:mem=15.41 MB
[0]:after allocation:mem=15.41 MB
[0]:after delete o/dEntries[i]:mem=15.41 MB
rank 0: allocate 5679 entries for Matrix=> 0.071068 MB memory
[0]:allocate matrix:mem=15.41 MB
PetscAllocateMatrix spent time: 0.00240493 secs
[0]:after cleanup:mem=15.41 MB
[0]:before allocation:mem=15.42 MB
[0]:after allocation:mem=15.42 MB
[0]:after delete o/dEntries[i]:mem=15.42 MB
rank 0: allocate 5679 entries for Matrix=> 0.069012 MB memory
[0]:allocate matrix:mem=15.42 MB
PetscAllocateMatrix spent time: 0.00255799 secs
[0]:after cleanup:mem=15.42 MB
[0]:before allocation:mem=15.42 MB
[0]:after allocation:mem=15.42 MB
[0]:after delete o/dEntries[i]:mem=15.42 MB
rank 0: allocate 5679 entries for Matrix=> 0.071068 MB memory
[0]:allocate matrix:mem=15.42 MB
PetscAllocateMatrix spent time: 0.00235105 secs
[0]:after cleanup:mem=15.42 MB
[0]:before allocation:mem=15.42 MB
[0]:after allocation:mem=15.42 MB
[0]:after delete o/dEntries[i]:mem=15.42 MB
rank 0: allocate 5679 entries for Matrix=> 0.069012 MB memory
[0]:allocate matrix:mem=15.42 MB
PetscAllocateMatrix spent time: 0.00252509 secs
[0]:after cleanup:mem=15.42 MB
[0]:before allocation:mem=15.42 MB
[0]:after allocation:mem=15.42 MB
[0]:after delete o/dEntries[i]:mem=15.42 MB
rank 0: allocate 5679 entries for Matrix=> 0.071068 MB memory
[0]:allocate matrix:mem=15.42 MB
PetscAllocateMatrix spent time: 0.00233912 secs
[0]:after cleanup:mem=15.42 MB
PetscFillPVMatrices() takes time 0.101421 secs
All Matrices/Vectors Created.
[0]:after Mat, before Sol:mem=15.42 MB
KSP Solvers Created.
Pressure integration used 24 quadrature points.
Velocity integration used 24 quadrature points.
Done bith CFD() constructor!
[0]:after Sol, done with CFD():mem=15.49 MB
Done bith NSSolver() constructor!
[0]:Start with ParticulateFlow::ParticulateFlow():mem=15.49 MB
Fictious Domain Particle integration used 24 quadrature points.
Sub-grid lubrication force is applied at minDist=0.1.
update order =3
[0]:done with ParticulateFlow::ParticulateFlow():mem=15.49 MB
[0]:after matrices created:mem=15.49 MB
Startup Time: 1.2067 seconds.
collisionBufferLength=0.7
SaveFieldsToFile(init) cost time: 0.076571
SaveFieldsToFile(init) cost time: 0.0182772
Save2ndOrderFieldsToGmshFile(init) cost time: 0.00788379
Save2ndOrderFieldsToGmshFile(init) cost time: 0.0325999
[0]:after initializeField():mem=15.53 MB
####### Re=1, Fr=1, NSteps=0 #####
####### Start with AB-1 update, time=0, dt=0.1 #####
[0]:before 1st update:mem=15.54 MB
[0]: minD between particles is 100
0: deal with Collision at the beginning of 1st order update.
advaceParticle takes 3.40939e-05 seconds
[0] particle 0: 9 elements inside, 166 possibly intersecting
scanElementsForIntersection() takes 0.000429153 seconds
[0]: id 0, ParticulateFlow:integrated volume=0.112598,VInside=0.012
mutal+setup+scan+VolInt takes 0.000662088 seconds
computeVelGrad() cost 0.0134761 seconds, inside loop=0.00793719
time for Jacobian=0.00133085, for Indices=0.00128317, for Velocity=0.00147271 seconds
computeConvectionTermFromSection() takes 0.0136962 seconds
1st order: Number of PETSc iterations for S[0,1,2] = 0,0,10
order1UpdateDiffusionWithParticleGravity() takes 0.0363901 seconds
1st order: Number of PETSc iterations for preS = 13
1st order: Number of PETSc iterations for M[0,1,2] = 1,1,1
order1UpdateProjectionDiffusion() takes 0.00177097 seconds
AB3UpdateParticlesVelocitiesLocal() cost 0.00152993 seconds
computeParticlesOmegas-1 takes 0.00167704 seconds
Fictious Domain Correction, Number of PETSc iterations for M[0,1,2] = 1,1,1
updateFluidVelocityInParticlesWeak() takes 0.00127196 seconds
computeParticlesOmegas-2 takes 0.0016911 seconds
FD takes 0.00482917 seconds
[0]: minD between particles is 100
0: deal with Collision at the end of 1st order update.
particlePositionCorrection takes 2.09808e-05 seconds
output to particle files takes 9.98974e-05 seconds
[0]:after 1st update:mem=15.74 MB
SaveFieldsToFile(firstUpdate) cost time: 0.081192
SaveFieldsToFile(firstUpdate) cost time: 0.0199361
Save2ndOrderFieldsToGmshFile(firstUpdate) cost time: 0.00801992
Save2ndOrderFieldsToGmshFile(firstUpdate) cost time: 0.0139341
walltime cost for startup AB-1 step 0.08535
##### AB-2 update time=0.1 ######
[0]: minD between particles is 100
0: deal with Collision at the beginning.
advaceParticle takes 0.000127792 seconds
[0] particle 0: 9 elements inside, 166 possibly intersecting
scanElementsForIntersection() takes 0.000463009 seconds
[0]: id 0, ParticulateFlow:integrated volume=0.112663,VInside=0.012
mutal+setup+scan+VolInt takes 0.000745058 seconds
computeVelGrad() cost 0.013494 seconds, inside loop=0.00800395
time for Jacobian=0.0013268, for Indices=0.00125837, for Velocity=0.00141168 seconds
computeConvectionTermFromSection() takes 0.013705 seconds
1st order: Number of PETSc iterations for S[0,1,2] = 10,10,10
order1UpdateDiffusionWithParticleGravity() takes 0.0513711 seconds
1st order: Number of PETSc iterations for preS = 13
1st order: Number of PETSc iterations for M[0,1,2] = 1,1,1
order1UpdateProjectionDiffusion() takes 0.00139713 seconds
AB3UpdateParticlesVelocitiesLocal() cost 0.0015738 seconds
computeParticlesOmegas-1 takes 0.00169897 seconds
Fictious Domain Correction, Number of PETSc iterations for M[0,1,2] = 1,1,1
updateFluidVelocityInParticlesWeak() takes 0.00121593 seconds
computeParticlesOmegas-2 takes 0.00169706 seconds
FD takes 0.00478792 seconds
[0]: minD between particles is 100
0: deal with Collision at the end.
particlePositionCorrection takes 5.98431e-05 seconds
output to particle files takes 5.31673e-05 seconds
[0]:after 2nd update:mem=15.80 MB
walltime cost for startup AB2 step 0.203506
SaveFieldsToFile(final) cost time: 0.0656512
SaveFieldsToFile(final) cost time: 0.0200369
Save2ndOrderFieldsToGmshFile(final) cost time: 0.00801206
Save2ndOrderFieldsToGmshFile(final) cost time: 0.0136909
saving CFD restart files into directory restart_0
saving ParticulateFlow restart files into directory restart_0
run() Time: 0.675462 seconds.
Total Time: 1.88216 seconds.
[0]:Finished:mem=15.81 MB
Cleanup in ~USG()
PetscFinalize()
Summary of Memory Usage in PETSc
[0]Run with -malloc to get statistics on PetscMalloc() calls
[0]Current process memory 1.58188e+07 max process memory 1.58188e+07
Bye!
If you see similar output, congratulations! Your code has been built successfully.
PBS script
To run a job through the PBS scheduler on the cluster, try the following script:
sjin1@opteron-cluster tutorial $ cat submit.pbs
#!/bin/bash -l
#PBS -N test
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -l pvmem=2000mb
#PBS -M shi.jin@ualberta.ca
cd $PBS_O_WORKDIR
cp $PBS_NODEFILE .
echo "Starting run at: `date`"
CODE=/home/sjin1/research/new-fcpu-tutorial/trunk/fcpu
mpiexec -np 1 $CODE -Re 1.0 -Fr 0.0098 -dt 0.05 -NSteps 10 \
&>out1
echo "Job finished at: `date`"
Once it has run, there will be a lot of output in the out1 file. Many other files are also generated.
For example,
- .pvtu and .vtu files contain the pressure and velocity fields at a given step. They can be opened with ParaView.
- .msh files are GMSH files describing the mesh alone.
- .pos files are GMSH field files to be used together with the .msh file.
- The particles/ directory stores all particle-related files. There are two kinds of particle files:
  - particle?.dat describes the trajectory of a given particle.
  - parT????.dat has all particle information at a given time snapshot.
- The restart_*/ directory stores the files needed for a restart.
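A quick way to look at what was produced after the run (the exact file names depend on the output step numbers, but they follow the patterns listed above):
ls *.pvtu *.vtu *.msh *.pos
ls particles/ restart_*/
tail out1    # the end of the solver log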
Parallel jobs
As shown in the PBS script, fcpu can be invoked in parallel with, for example,
mpiexec -n 2 fcpu
Note that for this example to work in parallel, you need to change the zs.dis file so that the length in z is only 0.8.
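A minimal sketch of the corresponding changes to the PBS script for a two-process run (assuming both processes fit on one node, and writing to out2 to avoid overwriting out1):
#PBS -l nodes=1:ppn=2
mpiexec -np 2 $CODE -Re 1.0 -Fr 0.0098 -dt 0.05 -NSteps 10 \
&>out2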
Strictly speaking, the example given above is not a very good one, since the grid size is comparable to the sphere size, and in parallel we require each zone to be larger than the sphere. For a better parallel example, try a better-resolved problem, such as the single-particle sedimentation experiment of ten Cate.
However, there are issues with problem size due to memory limitations; refer to the tutorial on parallel simulation for details.