User Information

Allowed Use of Nodes

Writing Parallel Programs

Compiling and Linking Parallel Programs

Running Programs

Debugging Programs

Profiling Programs

More on Petsc installation

More on ScaLAPACK installation

The Mini Cluster

Help and Documentation

 

Allowed Use of Nodes

1. master: compilation, editing. No computations.
2. node1 and node2: manually started jobs, including debugging
3. node3 -- node35: jobs submitted to PBS only. No manually started jobs.
 
 

Writing Parallel Programs

Programs written using any of the following software tools will be portable to current supercomputers.
 
 

Compiling and Linking Parallel Programs

ScaMPI (with GNU):

GNU C: gcc -D_REENTRANT -I/opt/scali/include -L/opt/scali/lib -lmpi
GNU C++: g++ -D_REENTRANT -I/opt/scali/include -L/opt/scali/lib -lmpi
GNU Fortran: g77 -D_REENTRANT -I/opt/scali/include -L/opt/scali/lib -lfmpi -lmpi

MPICH (with GNU):

GNU C: gcc -D_REENTRANT -I/opt/mpich/include -o a.out source.c -L/opt/mpich/lib -lmpich
GNU C++: g++ -D_REENTRANT -I/opt/mpich/include -o a.out source.c++ -L/opt/mpich/lib -lmpich
GNU Fortran: coming soon

ScaMPI (with PGI):
There are two versions of PGI on Beowulf. The PGI variable is set to /opt/pgi when the session is started. Use PGI v3.2 with the Totalview Debugger. To do this setenv PGI /opt/pgi-3.2 and make sure the selected version comes first in the path, for example set path=($PGI/linux86/bin $path).

PGI C: pgcc -D_REENTRANT -I/opt/scali/include -L/opt/scali/lib -lmpi
PGI C++: pgCC -D_REENTRANT -I/opt/scali/include -L/opt/scali/lib -lmpi
PGI HPF: pghpf -Mmpi -L/opt/scali/lib -lfmpi -lmpi
PGI Fortran with MPI calls: not tested
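For users of sh or bash rather than csh, the PGI version selection described above might look like the following sketch. The function name use_pgi is hypothetical (it is not a command installed on Beowulf); the paths are the ones given on this page.

```shell
# Select a PGI installation for the current sh/bash session.
# use_pgi is an illustrative helper, not a command installed on Beowulf.
use_pgi() {
    PGI=${1:-/opt/pgi-3.2}          # default: the version used with Totalview
    PATH=$PGI/linux86/bin:$PATH     # the selected version must come first in PATH
    export PGI PATH
}

use_pgi /opt/pgi-3.2
```

Running which pgcc afterwards is a quick way to check that the intended version is picked up.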


Recommended Compiler Optimization Flags:
Optimized code usually performs better than code that is not optimized. The following options may be added to the appropriate compile commands listed above. PGI provides a solid introduction to optimized code and gives guidelines for choosing optimization levels that are relevant to all compilers.
GNU (also mpicc/CC/77): -O , -O1 , -O2, -O3 , -O0, -Os
PGI: -fast, -O0 , -O1 , -O2
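As an illustration of where an optimization flag goes, the sketch below prints (without running) a GNU C compile line for either MPI library. The helper name mpi_cc_cmd is hypothetical, and placing "-o a.out source.c" in the ScaMPI line the same way as in the MPICH line is an assumption; the include paths, library paths and flags are the ones listed above.

```shell
# Print (not run) a GNU C compile line for the chosen MPI library.
# mpi_cc_cmd is a hypothetical helper; flags and paths come from this page.
mpi_cc_cmd() {
    mpi=$1; opt=$2; src=$3; out=${4:-a.out}
    case $mpi in
        scali) inc=/opt/scali/include; lib="-L/opt/scali/lib -lmpi" ;;
        mpich) inc=/opt/mpich/include; lib="-L/opt/mpich/lib -lmpich" ;;
        *) echo "unknown MPI library: $mpi" >&2; return 1 ;;
    esac
    # The optimization flag is placed right after the compiler name.
    echo "gcc $opt -D_REENTRANT -I$inc -o $out $src $lib"
}

mpi_cc_cmd scali -O2 source.c
```

The printed line can be copied to the prompt or placed in a makefile rule.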


PETSc:
Adapt a makefile from examples and see the example source code and makefile in /scratch/opt/petsc-2.1.1/src/sles/examples/tutorials .
Hypre:
(contributed by Andrew Knyazev)

There are several options to compile hypre on CUD Beowulf.
cd to hypre/src directory.
For gcc with scali, run
configure --with-CC=gcc --with-CXX=g++ --with-mpi-include=/opt/scali/include --with-mpi-lib-dirs=/opt/scali/lib --with-mpi-libs=mpi --with-CFLAGS="-O -O3 -D_REENTRANT -Wall -I/opt/scali/include"
Make sure that you create separate hypre directories for every install, e.g. hypre-1.8.2b_gcc_scali as above.

For pgcc (PGI) with scali, first check/set the correct environment:
setenv PGI /opt/pgi-3.2
set path=($PGI/linux86/bin $path)
and then run
configure --with-CC=pgcc --with-CXX=pgCC --with-mpi-include=/opt/scali/include --with-mpi-lib-dirs=/opt/scali/lib --with-mpi-libs=mpi --with-CFLAGS="-O -O3 -D_REENTRANT -I/opt/scali/include"

For gcc with mpich run
configure --with-CC=gcc --with-CXX=g++ --with-mpi-include=/opt/mpich/include --with-mpi-lib-dirs=/opt/mpich/lib --with-mpi-libs=mpich --with-CFLAGS="-O3 -D_REENTRANT -Wall -I/opt/mpich/include"

Finally, to compile run "make" in the same directory. Check the make output to make sure that there are no errors (warnings are OK).

Two users can run the PGI compilers at the same time.
Our license allows parallel programs built by PGI compilers to run on up to 64 processors.
 
 

Running Programs

Running a program on the cluster is more complicated than submitting a job on a serial computer. Ideally, each user wants sole access to a particular processor and wants to be confident that timing results have not been affected by someone else's jobs. PBS is a batch system that tracks which nodes are in use and starts a job when the resources the user requested become available, which is not necessarily when the job was submitted. In this way each job gets sole access to the list of nodes chosen by PBS. Manually started jobs are started from the command line and do not go through the PBS batch system, so they are highly likely to interfere with jobs in the PBS queue. Manual jobs are permitted only on node1 and node2; in other words, do not start them unless you have logged into node1 or node2.

To start jobs from the command line :
1. Serial process example: rsh node1 a.out
2. Scali Parallel process example: mpimon a.out -- node1 2 node2 2
3. MPICH Parallel process example : mpirun -np 4 -machinefile machines a.out
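The MPICH example above refers to a file named machines. A minimal sketch of creating one follows, assuming the host:count line format shown for the mini cluster further down this page also applies to node1 and node2. The helper name write_machines is hypothetical.

```shell
# Write an MPICH machine file with one "host:cpus" line per node.
# write_machines is a hypothetical helper name.
write_machines() {
    file=$1; cpus=$2; shift 2
    : > "$file"                      # start with an empty file
    for node in "$@"; do
        echo "$node:$cpus" >> "$file"
    done
}

# Two processes on each of the two nodes where manual jobs are allowed.
write_machines machines 2 node1 node2
# then: mpirun -np 4 -machinefile machines a.out
```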

To submit jobs to Open PBS:
1. You need to compose your own PBS job script, which is a regular shell script with extra comment lines that begin #PBS. These comments are meaningful to PBS. The #PBS comments specify the resources needed for a particular job. The job script should read the node list file created by PBS and then submit the job from the "command line" within the script. In other words the user must utilize the names specified by PBS via an appropriate command line. The following examples provide scripts that may be modified and run by a user.

Serial PBS job script submits an m-file script to Matlab through the batch system. Note: serial job scripts must request only one node; see the example script for details.

Scali PBS job script with one cpu per node requests 4 nodes from PBS, changes to the executable directory (a necessary step), and stores the nodes selected by PBS in the script variable nodes. PBS selects nodes, not processors; for example, PBS may return nodes = node11 node21 node3 node8. In this example the last line becomes mpimon program -- node11 node21 node3 node8, which executes the parallel executable on 1 processor per node. Give the full executable path to mpimon when it differs from the working directory; having the executable directory in your path is not enough.

Scali PBS job script with two cpus per node sets nodelist = node11 2 node21 2 node3 2 node8 2 and changes the last line to
mpimon program -- node11 2 node21 2 node3 2 node8 2
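The Scali job scripts described above are not reproduced on this page, so the following is only a sketch of what such a script might look like, not the site's actual example. It assumes the standard OpenPBS variables PBS_NODEFILE (the node list file) and PBS_O_WORKDIR (the directory qsub was run from, taken here to be the executable directory). The job name, the helper mpimon_nodes and the executable name program are hypothetical.

```shell
#!/bin/sh
#PBS -N scali_job
#PBS -l nodes=4

# Turn a PBS node file into mpimon arguments: "nodeA nodeB ..." for one
# process per node, or "nodeA 2 nodeB 2 ..." when cpus is greater than 1.
# mpimon_nodes is a hypothetical helper name.
mpimon_nodes() {
    nodefile=$1; cpus=${2:-1}
    awk -v c="$cpus" '
        NF && !seen[$0]++ { printf "%s%s%s", sep, $0, (c > 1 ? " " c : ""); sep = " " }
        END { print "" }' "$nodefile"
}

# Only act when PBS has supplied a node file, i.e. inside a batch job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1            # assumed to hold the executable
    nodes=$(mpimon_nodes "$PBS_NODEFILE" 2)  # use 1 for one cpu per node
    mpimon ./program -- $nodes               # unquoted on purpose: one word per item
fi
```

Because the mpimon call is guarded by the PBS_NODEFILE check, the script does nothing when run by hand outside the batch system.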

MPICH PBS job script with one cpu per node

MPICH PBS job script with two cpus per node
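The MPICH job scripts named above are likewise not shown here. A rough sketch under the same assumptions (OpenPBS sets PBS_NODEFILE and PBS_O_WORKDIR; each granted node appears once in the node file) could be written as follows. The job name and the helper mpich_np are hypothetical. For two cpus per node the sketch simply doubles the process count and relies on mpirun reusing the machine file entries in order, which should be verified before use.

```shell
#!/bin/sh
#PBS -N mpich_job
#PBS -l nodes=4

# Number of processes to start: nodes granted by PBS times cpus per node.
# mpich_np is a hypothetical helper name.
mpich_np() {
    nodefile=$1; cpus=${2:-1}
    lines=$(grep -c . "$nodefile")     # non-empty lines = nodes in the file
    echo $((lines * cpus))
}

# Only act when PBS has supplied a node file, i.e. inside a batch job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    mpirun -np "$(mpich_np "$PBS_NODEFILE" 1)" -machinefile "$PBS_NODEFILE" ./a.out
fi
```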

Submit a PBS job script with qsub.
When using qsub be sure to submit jobs to the appropriate queue.
See details on the "low" queue here.

a)The basic command is qsub script_name.pbs

b)To redirect error and output files to your home directory: qsub -e ~ -o ~ script_name.pbs

2. ScaPBS is a feature of the Scali interface which can create a #PBS job script for the user. This option is easier but less flexible than the methods outlined above.
a)scasub a.out (submit serial job)

b)scasub -mpimon -np <total # cpus> a.out (submit parallel job using ScaMPI mpimon)

c)scasub -mpimon -np <total # cpus> -npn <# cpus per node> a.out (submit parallel job using ScaMPI mpimon)

d)scasub -mpich -np <total # cpus> a.out (submit parallel job using MPICH mpirun)

e)scasub -mpich -np <total # cpus> -npn <# cpus per node> a.out (Submit parallel job using MPICH )
Note: -npn option does not work for code compiled with /opt/mpich libraries.

For more information type scasub. This will give a list of available command line options. scasub is the Scali wrapper for qsub.

3. If your job uses a large data file, you may be able to enhance performance by copying the file to the node's local disk. Each node has a local /scratch directory for this purpose. By copying to and running from /scratch, your job avoids the delays created by accessing the data file via NFS.
a)Since /scratch is a public directory, you need to create your own working subdirectory /scratch/$USER on each node in the cluster. You can do this by running scratch_dir just once.

b)The script fbcast.pbs is a PBS script that will copy your data and executable files to /scratch/$USER, run your parallel executable on the local data, and then clean the executable and data files from /scratch/$USER.
Note: Users need to customize this by changing a few lines of the script, specifically the file names and directory environment variables.

c)You may also run scratch_clean whenever you want to remove all files from the working subdirectory (/scratch/$USER) at each node.
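To make the staging idea concrete, the sketch below prints the copy commands that would place files in /scratch/$USER on every node of a job. It is not the fbcast.pbs script; the helper name stage_commands is hypothetical, and the use of rcp for the copy is an assumption based on rsh being used elsewhere on this page.

```shell
# Print the copy commands that would stage files into /scratch/$USER on
# every node listed in a PBS node file. stage_commands is a hypothetical
# helper, and rcp is an assumed copy tool; this is not fbcast.pbs itself.
stage_commands() {
    nodefile=$1; shift
    user=${USER:-$(id -un)}
    # Visit each node once, in the order PBS listed it.
    awk 'NF && !seen[$0]++' "$nodefile" | while read -r node; do
        for f in "$@"; do
            echo "rcp $f $node:/scratch/$user/"
        done
    done
}
```

Inside a job script one might review the output of stage_commands "$PBS_NODEFILE" a.out datafile first and, once it looks right, pipe it to sh.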

ScaPBS commands: (/scratch/opt/scali/contrib/pbs/bin must be in your PATH)
qstat Prints a summary of queue and job status. Give a job ID for only that job, or qstat -a  for all jobs in the queue, or qstat -q  to find out which queues are available (the LM column displays the maximum number of concurrent jobs).
            Useful command line arguments are -n, -f and -r.
qdel  Delete jobs.
tracejob Prints full information of jobs.
qsig  Sends a user-specified signal to jobs; default is "kill".
qselect Finds jobs matching criteria and writes their IDs. Probably the most useful command line argument is -u user.
qhold Hold a job (temporarily block execution; see qrls)
qrls Release a held job (see qhold)
qrerun Kills the jobs but re-queues them for execution.
qmsg Append a text message to a job's stderr.
qmgr Management interface. Give the server host name. For ordinary users the interface is read only.
pbsnodes Lists the status of all (-a) or selected nodes.
/usr/local/bin/pbsfree Returns the number of available nodes. This is not part of pbs. PBS assigns whole nodes to a job.

While a job is running, its output is on the first node reported by qstat -n in the directory /var/spool/PBS/spool. After the job completes, its output is moved to files jobname.oxxx and jobname.exxx in the directory the job was submitted from. Users should redirect output to a file in their home directory.

Please monitor the status of your job. If it takes a long time, it may have gotten stuck or crashed while its processes keep running, never exit, and block part of the cluster. In that case you must kill your job correctly.
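As a starting point for cleaning up, the PBS commands listed above can be combined to remove all of one user's jobs from the batch system. The helper name delete_my_jobs is hypothetical. This sketch covers only the PBS side; whether leftover processes on the nodes need further cleanup is not addressed here.

```shell
# Delete every PBS job owned by a user (default: the current user).
# delete_my_jobs is a hypothetical helper; qselect -u and qdel are the
# PBS commands described in the list above.
delete_my_jobs() {
    user=${1:-${USER:-$(id -un)}}
    for id in $(qselect -u "$user"); do
        echo "deleting job $id"
        qdel "$id"
    done
}
```

Running qstat -a afterwards shows whether the jobs have actually left the queue.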
 
 

Debugging Programs

To debug with gdb:
1. log into beowulf from a machine running X by ssh
2. create file ~/.ssh/environment containing the single line XAUTHORITY= <your home directory>/.Xauthority
3. compile your ScaMPI code with -g flag.
4. gdb does not have multiprocessor debugging. Users may run executable code on one processor per node as follows:
mpimon -debugger gdb -debug all -xterm /usr/X11R6/bin/xterm -display $DISPLAY a.out -- node1 node2
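Step 2 above can be scripted. The sketch below writes the single XAUTHORITY line with $HOME substituted for the home directory and no space after the equals sign, which is an assumption about the intended file content. The helper name write_ssh_environment is hypothetical. Note that it overwrites any existing ~/.ssh/environment.

```shell
# Create ~/.ssh/environment containing the single XAUTHORITY line.
# write_ssh_environment is a hypothetical helper name.
# Warning: this replaces an existing ~/.ssh/environment file.
write_ssh_environment() {
    mkdir -p "$HOME/.ssh"
    echo "XAUTHORITY=$HOME/.Xauthority" > "$HOME/.ssh/environment"
    chmod 600 "$HOME/.ssh/environment"   # keep the file private
}
```

Call write_ssh_environment once, then log in again so the new setting takes effect.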

To debug with pgdbg:
1. log into beowulf from a machine running X by ssh
2. create file ~/.ssh/environment containing the single line XAUTHORITY= <your home directory>/.Xauthority
3. compile your mpich code with -g flag.
4. PGI 4.0 has multiprocessor debugging. Users may run executable code on one or two processors per node as follows:
$PGI/linux86/bin/mpirun -np 4 -machinefile machines -dbg=pgdbg -nolocal a.out

Debugging with the Totalview debugger:
1. compile your MPICH code with the -g flag, for example:
gcc -g -D_REENTRANT -I/opt/mpich/include -o a.out my_code.c -L/opt/mpich/lib -lmpich
2. create file named machines with two lines
node1
node2
3. ssh node1 and make sure X forwarding works: xterm (a window should pop up)
4. cd to the directory where your executable is and start your process under totalview by
/opt/mpich/bin/mpirun -tv -np 2 -machinefile machines a.out
5. The Totalview window should show up. Press "Go" and refer to www.etnus.com for further documentation.
6. The Totalview licenses available on Beowulf allow 2 users 4 cpus each -- Please remember to exit the debugger when not in use.
 
 

Profiling Programs

The upshot utility can be used to profile ScaMPI applications. This utility requires a log file that is created while the ScaMPI application is running. The ScaMPE library will create this log file; see Section 2.4.4 of the Scali Library Users Guide, version 3.0, for more information and some other profiling options. Please note that the documentation is not complete.

1. Link in the correct mpe libraries.
The following libraries and include directories should be linked before the standard ScaMPI libraries:
-I/opt/scali/contrib/mpe/include -L/opt/scali/contrib/mpe/lib -llmpe -lmpe
The -llmpe library causes the program to write a logfile with profiling information that will be viewed in Upshot. For example:
%gcc-3.1 hello-world.c -o hello-world -O -D_REENTRANT \
-I/opt/scali/contrib/mpe/include \
-L/opt/scali/contrib/mpe/lib -llmpe -lmpe \
-I/opt/scali/include -L/opt/scali/lib -lmpi


2. Run the application as you would normally.
After this step you should find the usual logfiles hello-world.eXXXX and hello-world.oXXXX, as well as a file named logfile.clog, which contains the profiling information for the program hello-world.c, in the directory where the program was launched.

3. Convert the logfile.clog file to logfile.alog
This is the step missing from the official documentation. The format of logfile.clog will not work in upshot. If you run logviewer it will try to load jumpshot which it cannot find (try it and look at the error message). Running jumpshot directly also does not work. The way to get the logviewer to work is to convert the clog file to an alog file. This can be done using
/opt/scali/contrib/mpe/bin/clog2alog logfile
Notice that the file extension is left off. This will create a file named: logfile.alog

4. Run logviewer on logfile.alog.
/opt/scali/contrib/mpe/bin/logviewer logfile.alog
The logviewer script will call the upshot script which is a tcl script. When the GUI appears click the setup button. This will start processing logfile.alog and another GUI window will appear with the profiling information. The GUI is self explanatory.
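Steps 3 and 4 can be wrapped in a small helper so that the extension handling is not forgotten. This is a sketch only: the names clog_to_alog and MPE_BIN are hypothetical, while the clog2alog and logviewer paths are the ones given above.

```shell
# Convert an MPE clog file to alog and print the name of the result.
# clog_to_alog and MPE_BIN are hypothetical names.
clog_to_alog() {
    clog=${1:-logfile.clog}
    base=${clog%.clog}                            # clog2alog takes the name without extension
    bin=${MPE_BIN:-/opt/scali/contrib/mpe/bin}    # location given on this page
    [ -f "$clog" ] || { echo "no such file: $clog" >&2; return 1; }
    "$bin/clog2alog" "$base" && echo "$base.alog"
}
```

A possible use is alog=$(clog_to_alog logfile.clog) && /opt/scali/contrib/mpe/bin/logviewer "$alog".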
 
 

More on Petsc installation

By default, PETSC_DIR is /opt/petsc and PETSC_ARCH is linux. Thus
make BOPT=O <your_petsc_code>
defaults to
make BOPT=O PETSC_DIR=/opt/petsc PETSC_ARCH=linux <your_petsc_code>

For various purposes  you might change the defaults to what you need.
For instance:

1.  If you would like to try the newest Petsc you'll need the following lines:
make BOPT=O PETSC_DIR=/opt/petsc-2.1.3 <your_petsc_code>
mpirun -pbs -np 4 <your_petsc_code>

2. You may need to debug your petsc code with pgf90 compiler (PGI version 3.2) and MPICH mpirun. Assume your code is in fortran and it uses complex numbers. In this case you follow the steps:
First set (in csh shell)
setenv PGI /opt/pgi-3.2
set path=($PGI/linux86/bin $path)
and type
make BOPT=g_complex PETSC_ARCH=linux_pgi <your_petsc_code>
to compile your code, then type
mpichrun -tv -np 2 -machinefile machines <your_petsc_code>
to run it in the Totalview debugger (see the Debugging Programs section).

NOTE: mpichrun is aliased to /opt/mpich/bin/mpirun.
 

Items indicated in each row of the following table must be used together to ensure compatibility:
 

Petsc version # | PETSC_DIR            | PETSC_ARCH      | MPI implementation | Compilers                     | External link packages                    | Version of Petsc libraries
----------------+----------------------+-----------------+--------------------+-------------------------------+-------------------------------------------+-----------------------------------------
2.1.1 (default) | /opt/petsc (default) | linux (default) | ScaliMPI (default) | GNU (gcc, g77, g++) (default) | X11, MPE, Matlab, BlockSolve              | g, g_complex, g_c++, O, O_complex, O_c++
2.1.1           | /opt/petsc           | linux_mpich     | MPICH              | GNU (gcc, g77, g++)           | X11, MPE, Matlab                          | g, g_complex, g_c++, O, O_complex, O_c++
2.1.1           | /opt/petsc           | linux_pgi       | MPICH              | PGI (pgcc, pgf90, pgCC)       | X11, MPE                                  | g, g_complex, g_c++, O, O_complex, O_c++
2.1.3           | /opt/petsc-2.1.3     | linux           | ScaliMPI           | GNU (gcc, g77, g++)           | X11, MPE, Matlab, BlockSolve, Hypre-1.6.0 | g, g_complex, g_c++, O, O_complex, O_c++
2.1.3           | /opt/petsc-2.1.3     | linux_mpich     | MPICH              | GNU (gcc, g77, g++)           | X11, MPE, Matlab                          | g, g_complex, g_c++, O, O_complex, O_c++
2.1.3           | /opt/petsc-2.1.3     | linux_pgi       | MPICH              | PGI (pgcc, pgf90, pgCC)       | X11, MPE                                  | g, O
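Since the items in a table row must be used together, it can help to generate the make line from the version and architecture rather than typing it by hand. The sketch below does that for the PETSC_DIR and PETSC_ARCH values listed in the table; the helper name petsc_make_cmd is hypothetical, and it does not check that the chosen BOPT value exists for that row.

```shell
# Print the make command for a given Petsc version, PETSC_ARCH and BOPT.
# petsc_make_cmd is a hypothetical helper; it only prints the command.
petsc_make_cmd() {
    version=$1; arch=$2; bopt=$3; target=$4
    case $version in
        2.1.1) dir=/opt/petsc ;;
        2.1.3) dir=/opt/petsc-2.1.3 ;;
        *) echo "unknown Petsc version: $version" >&2; return 1 ;;
    esac
    case $arch in
        linux|linux_mpich|linux_pgi) ;;
        *) echo "unknown PETSC_ARCH: $arch" >&2; return 1 ;;
    esac
    echo "make BOPT=$bopt PETSC_DIR=$dir PETSC_ARCH=$arch $target"
}

petsc_make_cmd 2.1.3 linux_mpich O my_petsc_code
```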

 
 

More on ScaLAPACK Installation


 
 

The Mini Cluster

The mini cluster consists of 3 separate computers:
master node: oldmath
slave nodes:  oliver, xgrunt

Each of the nodes has 2 Pentium-III processors 498MHz, 1GB memory, about 8GB disk and Red Hat Linux 7.2 (kernel 2.4.7).
Home directories are shared between math,p4,oldmath,oliver,xgrunt and linus.

Available software:
Compilers & Debuggers: GNU and PGI  suites, TotalView(parallel debugger)
Cluster management and interconnect: MPICH(v1.2.3 and 1.2.4)
Job scheduling:  OpenPBS(v2.3.16), Mpiexec(v0.68 -- parallel job launcher via OpenPBS)
Development tools: Petsc(v2.1.1), BlockSolve95, Hypre (v1.6.0)
Applications:  Matlab (v6.1)
Coming soon: SuperLU_DIST (v1.0)
 

Compiling and Linking Parallel Programs

See the beowulf  Compiling and Linking Parallel Programs section above.

Running Parallel Programs

To start jobs manually:
1. To run a parallel process, use mpirun, e.g., mpirun -np 4 -machinefile nodes <your_mpi_executable> to run on 4 processors, 2 each on oliver and xgrunt. Your nodes file should contain the node names and the number of processors used on each node:
oliver:2
xgrunt:2
2. To run a parallel process in a debugger see the Debugging Parallel Programs section below.

To submit jobs to Open PBS using a PBS job script:
1. You need to compose your own PBS job script, which is a regular shell script with extra comment lines that begin with #PBS. These comments are meaningful to PBS. Submit the shell script using the qsub command. Within the script, start your program with the mpiexec command. mpiexec is a replacement for mpirun, which is part of MPICH. It is used to initialize a parallel job from within a PBS batch or interactive environment. Here is a simple PBS job script named myscript.pbs:
[gtsedend@oldmath ~]$ more myscript.pbs
#PBS -N myscript
#PBS -j oe
#PBS -l nodes=2
#PBS -l cput=1:00:00
#PBS -l mem=256MB
cd ~/a.out_directory
mpiexec ./a.out

To submit this script type
qsub myscript.pbs

For more information see the man page of qsub. To learn more about #PBS comments in PBS job scripts see http://math.cudenver.edu/~jmandel/mri/Schedulers-overview.pdf which is a good reference on how to write PBS scripts.

2. The PBS commands: see the beowulf  ScaPBS commands subsection above.


Debugging  Parallel Programs

1. ssh to one of the nodes (oldmath, oliver, xgrunt)
2. compile your MPICH code with -g flag.
3. run your executable code as follows:
mpirun -dbg=mydebug a.out
Here mydebug is one of the 5 debugger scripts in the /opt/mpich/bin directory: ddd, gdb, xxgdb, dbx or totalview.
 
 
 

Help and Documentation

The web sites linked to the names of the software tools on this page are the primary source of information. For more about Scali, see the Scali System Guide. For more about compiling and linking MPI programs, see the ScaMPI User Guide. For more about PBS, see the OpenPBS Administrator Guide. Some commands have man pages; explore $MANPATH to see what is available. Many commands provide some help when entered without arguments, or with --help. Various hardware and software manuals are in http://www-math.cudenver.edu/~jmandel/mri. Most important, come to my reading class and seminar!


Created October 3, 2001 by Jan Mandel
Last updated February 18, 2003 by Janine Kennedy