There are two ways to run PUQ on compute clusters.
The first method uses a batch script to run PUQ. The PUQ control script uses InteractiveHost as usual. A typical PBS script looks like this:
#!/bin/bash -l
#PBS -q standby
#PBS -l nodes=1:ppn=48
#PBS -l walltime=0:02:00
#PBS -o run_pbs.pbsout
#PBS -e run_pbs.pbserr
cd $PBS_O_WORKDIR
source /scratch/prism/memosa/env-hansen.sh
puq start my_control_script
Of course, you will need to adjust the walltime, nodes, and ppn values for your run.
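For reference, a minimal control script for this method might look like the sketch below. The parameter definitions, sampling method, and test program name are illustrative placeholders, not part of the PBS example above.

# my_control_script.py -- a minimal sketch; the inputs, UQ method, and
# test program below are placeholder assumptions for illustration.
from puq import *

def run():
    # Two hypothetical uncertain inputs, uniformly distributed
    x = UniformParameter('x', 'x', min=-2, max=2)
    y = UniformParameter('y', 'y', min=-2, max=2)

    # Runs jobs directly on the node(s) allocated by the PBS script above
    host = InteractiveHost()

    # Sparse-grid sampling of the two parameters
    uq = Smolyak([x, y], level=2)

    # The simulation to run at each sample point
    prog = TestProgram('./my_sim.py', desc='Example test program')

    return Sweep(uq, host, prog)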
Another way to use PUQ on clusters is with PBSHost. This method runs a PUQ monitor process on the front end, which submits PBS or Moab jobs as needed, so you can monitor the progress of PUQ. The disadvantage of this approach is that submission of new batch jobs stops if the PUQ monitor is killed.
To use PBSHost, in any script where you see:
host = InteractiveHost()
simply replace InteractiveHost with PBSHost. You will need to give PBSHost a few additional arguments, described below; a sketch of a typical call follows the parameter list.
env : An sh or bash script that is sourced before jobs run. It normally loads modules and sets paths.
cpus : Number of cpus each process uses.
cpus_per_node : Number of cpus to use on each node. There is no portable way to determine the number of cpus per node, so there is no default; you must supply this. It does not have to be the actual number of CPUs in the cluster's nodes, but it should not be more.
qname : The name of the queue to use. 'standby' is the default.
walltime : Time to allow for each job to complete. The format is HH:MM:SS. Default is 1 hour.
modules : List of modules to load. Used when the test program requires modules to be loaded to run. For example, to run a MATLAB script you will need to set this to ['matlab'].
pack : Number of sequential jobs to run in each PBS script. This is used to pack many small jobs into one PBS script, as in the examples below.
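Putting these together, a control script change for a PBS cluster might look like the sketch below. The environment script path is taken from the PBS example above; the queue name, counts, walltime, modules, and pack value are illustrative assumptions to adjust for your cluster and test program.

# Hypothetical PBSHost setup -- numeric values and options are assumptions.
host = PBSHost(env='/scratch/prism/memosa/env-hansen.sh',  # sourced before each job
               cpus=1,               # CPUs each run of the test program uses
               cpus_per_node=8,      # CPUs to use on each node
               qname='standby',      # queue to submit to
               walltime='0:30:00',   # HH:MM:SS allowed for each PBS job
               modules=['matlab'],   # only if the test program needs modules
               pack=100)             # sequential runs packed into each PBS script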
For example, suppose you have 10,000 jobs, each taking about 3 seconds and using 1 CPU, and your nodes have 8 CPUs. With pack set to 1000, PBSHost will create 10 PBS scripts, each with 1000 jobs, running 8 at a time (because each job takes only 1 CPU and the nodes have 8). The walltime per script is 1000 * 3 / 8 seconds, which is a bit over 6 minutes; I rounded up to 10 minutes.
With the same settings on nodes that have only 2 CPUs, PBSHost will create 10 PBS scripts, each with 1000 jobs, running 2 at a time. The walltime per script is 1000 * 3 / 2 seconds, which is 25 minutes.
With pack set to 8 on the 8-CPU nodes, PBSHost will create 1250 PBS scripts, each with 8 jobs, running 8 at a time.
Note
There is currently a hardcoded limit of 200 PBS jobs queued at once. PUQ monitors the queued PBS jobs and, as they complete, submits more until all 1250 have run.
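The packing arithmetic above can be summarized in a short calculation. The helper below is only an illustration of those formulas for single-CPU jobs, not part of PUQ.

import math

def pack_estimate(total_jobs, seconds_per_job, cpus_per_node, pack):
    # PBS scripts submitted and approximate walltime needed per script
    num_scripts = math.ceil(total_jobs / pack)
    walltime_sec = pack * seconds_per_job / cpus_per_node
    return num_scripts, walltime_sec

# 10,000 three-second jobs, 8-CPU nodes, pack=1000 -> (10, 375.0), i.e. ~6.25 minutes
print(pack_estimate(10000, 3, 8, 1000))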
For a test program that runs as an MPI job spanning multiple nodes, PBSHost will allocate 6 nodes for each job in this example, and 64 PBS jobs will be submitted. Each PBS job will run two MPI jobs sequentially, so walltime should be set to twice what a single job takes to run.
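A hedged sketch of such a setup follows. The assumption here is that multi-node jobs are requested by making cpus larger than cpus_per_node; the specific numbers (48 CPUs per MPI job, 8-CPU nodes, a two-hour walltime) are illustrative, not taken from the example above.

# Hypothetical multi-node MPI setup -- numbers are assumptions chosen so
# each job spans 6 nodes (48 cpus / 8 cpus per node).
host = PBSHost(env='/scratch/prism/memosa/env-hansen.sh',
               cpus=48,              # CPUs needed by each MPI job
               cpus_per_node=8,      # 48 / 8 -> 6 nodes allocated per job
               qname='standby',
               walltime='2:00:00',   # twice the time one MPI job takes to run
               pack=2)               # two MPI jobs run sequentially per PBS job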