So far, we have assumed that you’re logged into a standalone server and can run applications directly. Now let’s look at how to use the fortyfour cluster.
ssh username@fortyfour.ibest.uidaho.edu
Note: student account users do not need to log into fortyfour; you can submit jobs from the standalone servers. Notice that your home directory looks the same:
ls
We’re going to be creating a bunch of temporary files, so create a new directory on Lustre and cd into it (substitute your username):
mkdir /mnt/lfs2/benji/workshop && cd /mnt/lfs2/benji/workshop
Please do not run computationally intensive jobs on the head node. To avoid doing so, also log into a standalone server in another tab/window. If you’re using the classroom cluster, don’t worry about this step.
ssh username@zaphod.ibest.uidaho.edu
To run applications on the cluster, you need to use sbatch. sbatch is the command used for job submission to the cluster. It takes several command line arguments and can also use special directives placed in the submission script (command file). Several of the most widely used arguments are described below. The best way to use sbatch is to write bash scripts. An example sbatch script (save it as rand_nums.slurm):
#!/bin/bash
# Run from the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"
echo running
# Set up the module system inside the batch shell, then load R
source /usr/modules/init/bash
module load R
# Print ten random numbers, then pause so the job stays visible in squeue
Rscript -e "rnorm(10)"
sleep 30
echo finished
sbatch rand_nums.slurm
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
282 primary rand_num user PD 0:00 1 (None)
The job will produce an output file, slurm-<job number>.out, which contains the output that would otherwise go to the screen.
benji@fortyfour ~/workshop $ cat slurm-282.out
running
[1] -1.5552951 -1.8221806 0.5190432 -1.2447830 -0.5147968 -1.5253791
[7] 0.8816124 0.6505836 -1.2168808 -1.2094903
finished
benji@fortyfour ~/workshop $
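Once a job has finished it disappears from squeue. If job accounting is enabled on the cluster (an assumption; most SLURM installations have it), you can still review a completed job with sacct:
sacct -j 282 --format=JobID,JobName,Partition,State,Elapsed
The cluster is divided into partitions (queues) with different limits; you can list them with scontrol show partitions: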
fortyfour ~ # scontrol show partitions
PartitionName=tiny
AllowGroups=satellite,hpc AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=06:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n[067-103],n[107-109]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
State=UP TotalCPUs=776 TotalNodes=40 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=short
AllowGroups=satellite,hpc AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=short
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n[067-103],n[107-109]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
State=UP TotalCPUs=776 TotalNodes=40 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=volatile
AllowGroups=gratis AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=5-08:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n018,n[044-052],n062,n[094-101]
PriorityJobFactor=1 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
State=UP TotalCPUs=224 TotalNodes=19 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=testded
AllowGroups=sysad AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n018,n062
PriorityJobFactor=1 PriorityTier=20 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
State=UP TotalCPUs=24 TotalNodes=2 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=reg
AllowGroups=satellite,hpc AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=reg
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n[001-017],n[019-061],n063,n064,n[067-090],n[094-103],n107
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
State=UP TotalCPUs=1244 TotalNodes=97 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=long
AllowGroups=satellite,hpc AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=long
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n[001-017],n[019-061],n063,n064,n[067-090],n[094-103],n107
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
State=UP TotalCPUs=1244 TotalNodes=97 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=cmci-gpu
AllowGroups=cmci AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n[065-066]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
State=UP TotalCPUs=80 TotalNodes=2 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
PartitionName=gpu-short
AllowGroups=satellite,hpc AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=gpu-short
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n104
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
State=UP TotalCPUs=8 TotalNodes=1 SelectTypeParameters=NONE
DefMemPerCPU=8000 MaxMemPerNode=UNLIMITED
PartitionName=gpu-long
AllowGroups=satellite,hpc AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=gpu-long
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=n105,n106,n110,n111,n112,n113
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO PreemptMode=CANCEL
State=UP TotalCPUs=56 TotalNodes=2 SelectTypeParameters=NONE
DefMemPerCPU=8000 MaxMemPerNode=UNLIMITED
There are now several partitions to pick from, including two GPU partitions (if your job can make use of GPUs). Please choose the partition that most closely matches the requirements of your jobs. If you have a free account (gratis), the only available partition is ‘volatile’. Jobs in the volatile partition are subject to preemption.
Partition | Wall-time | Max-Jobs | Nodes |
---|---|---|---|
tiny | 6 hours | no-limit | 38 |
short | 24 hours | 1000 | 38 |
reg | 168 hours | 500 | 105 |
long | infinite | 50 | 105 |
gpu-short | 24 hours | 4 | 1 |
gpu-long | 168 hours | 6 | 6 |
volatile | 128 hours | 50 | 19 |
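To target a specific partition, pass -p either on the sbatch command line or as an #SBATCH directive. A minimal sketch (the job name and time limit are purely illustrative; the time must fit within the partition’s wall-time limit):
#!/bin/bash
#SBATCH -J my_job          # illustrative job name
#SBATCH -p short           # run in the 'short' partition (24 hour limit)
#SBATCH -t 02:00:00        # request 2 hours of wall time
echo "Running on $(hostname)"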
Now let’s use sbatch for aligning RNA-Seq data. First we’ll load the ncbi-sra module:
module load ncbi-sra
Next, get some data from NCBI. Search for “RNA seq”, then filter to Mus musculus. Use wget to download:
wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/something..something..not..actually..going..to..work../SRR2029055.sra
or better yet, use the SRA toolkit:
prefetch SRR2353195
(If you use the toolkit, the downloaded file is at ~/ncbi/public/sra/SRR2353195.sra.) Or you can just copy the version I downloaded:
cp /mnt/ceph/benji/ncbi/public/sra/SRR2353195.sra ~/workshop/
Use the ncbi-sra toolkit to split the .sra files into fastq (run this one on the standalone server and update file paths as appropriate):
cd ..
fastq-dump --outdir workshop --split-files ncbi/public/sra/SRR2353195.sra
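It’s worth a quick sanity check that the dump produced valid FASTQ (a suggested extra step, not part of the original workflow); each read occupies four lines:
head -4 workshop/SRR2353195_1.fastq
wc -l workshop/SRR2353195_1.fastq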
Tophat needs an indexed genome to map against - use the one I already downloaded (it’s pretty big). Make an sbatch script, saved here as benji_tophat.slurm (update as appropriate):
#!/bin/bash
#SBATCH -J benji_tophat
#SBATCH -p volatile
#SBATCH --mem=32G
cd "$SLURM_SUBMIT_DIR"
source /usr/modules/init/bash
module load tophat
# -p 1: one thread; arguments are the Bowtie2 index prefix and the reads file
tophat -p 1 /mnt/ceph/data/Mus/mm10/Sequence/Bowtie2Index/genome SRR2353195_1.fastq
and submit
benji@fortyfour ~/workshop $ sbatch benji_tophat.slurm
Submitted batch job 290
benji@fortyfour ~/workshop $ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
290 primary benji_to benji R 0:13 1 n090
What’s the advantage of running tophat with sbatch in this manner versus on a standalone server? Answer: none really - except that there are many more cluster nodes than there are standalone servers. The advantage of sbatch comes when you have a lot of data/reads or can split your data up.
This tophat run will actually go for quite a while (hours) - so let’s delete the job. First, use squeue to get the job number:
benji@fortyfour ~/workshop $ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
290 primary benji_to user PD 0:00 1 (None)
and now delete it:
scancel 290
Let’s split up our sequence file and spread it out across the nodes in the cluster.
split -d -l 2000000 SRR2353195_1.fastq SRR2353195_1.fastq.
This will split the millions of reads in this file into smaller files of 2,000,000 lines each (500,000 reads, since each FASTQ record is four lines). The -d flag tells split to use numeric suffixes, which we will need. We’re going to use a very helpful sbatch feature - arrays. The sbatch file (tophat_array.slurm) will look like:
#!/bin/bash
#SBATCH -J b_th_a
#SBATCH -p volatile
#SBATCH --mem=150G
cd "$SLURM_SUBMIT_DIR"
#Left-pad with zeros as necessary
FIXED_A_ID=$(printf '%02d' $SLURM_ARRAY_TASK_ID)
echo "Running $SLURM_ARRAY_TASK_ID on $(hostname)"
source /usr/modules/init/bash
module load tophat
tophat -p 1 -o ./tophat_out_$FIXED_A_ID /mnt/ceph/data/Mus/mm10/Sequence/Bowtie2Index/genome SRR2353195_1.fastq.$FIXED_A_ID
and submit (adjust the number below for how many files you have), and you should be able to see all the jobs that are spawned:
benji@fortyfour ~/workshop $ sbatch -a 0-19 tophat_array.slurm
Submitted batch job 291
benji@fortyfour ~/workshop $ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
291_[11-19] primary b_th_a benji PD 0:00 1 (Resources)
291_0 primary b_th_a benji R 0:05 1 n090
291_1 primary b_th_a benji R 0:05 1 n091
291_2 primary b_th_a benji R 0:05 1 n092
291_3 primary b_th_a benji R 0:05 1 n093
291_4 primary b_th_a benji R 0:05 1 n094
291_5 primary b_th_a benji R 0:05 1 n095
291_6 primary b_th_a benji R 0:05 1 n096
291_7 primary b_th_a benji R 0:05 1 n097
291_8 primary b_th_a benji R 0:05 1 n098
291_9 primary b_th_a benji R 0:05 1 n099
291_10 primary b_th_a benji R 0:05 1 n100
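If you need to stop an array, scancel on the parent job ID cancels every task, and a single task can be cancelled with the jobid_index form:
scancel 291      # cancel the whole array
scancel 291_5    # cancel only array task 5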
sbatch options

Create a simple submission file; call it sleep.slurm:
#!/bin/sh
for i in `seq 1 60` ; do
echo $i
sleep 1
done
Then submit your job with the output file renamed to sleep.log:
sbatch -o sleep.log sleep.slurm
Submit your job with the standard error file renamed:
sbatch -e sleep.log sleep.slurm
Send standard output and standard error to different files:
sbatch -o sleep.log -e sleep.err sleep.slurm
Place the output in a location other than the working directory:
sbatch -o $HOME/tutorials/logs/sleep.log sleep.slurm
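All of these options can also be embedded in the script itself as #SBATCH directives, which keeps the submission command simple. A minimal sketch equivalent to submitting with -o sleep.log -e sleep.err:
#!/bin/sh
#SBATCH -o sleep.log    # standard output file
#SBATCH -e sleep.err    # standard error file
for i in `seq 1 60` ; do
    echo $i
    sleep 1
done
With the directives in place, a plain sbatch sleep.slurm is all you need.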
The mailing options are set using the --mail-type argument. This argument sets the conditions under which the batch server will send a mail message about the job, and --mail-user defines the user that emails will be sent to. The conditions for the --mail-type argument include:
FAIL: mail is sent when the job is aborted.
BEGIN: mail is sent when the job begins.
END: mail is sent when the job ends.
For example, sleep.slurm:
#!/bin/bash
#SBATCH -J sleep
#SBATCH -o myoutput.log
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=rlyon@uidaho.edu
for i in `seq 1 30` ; do
echo "Hello from stdout on $HOSTNAME: $i"
echo "Hello from stderr on $HOSTNAME: $i" 1>&2
sleep 1
done
Using the sleep.slurm script created earlier, submit a job that emails you for all conditions:
sbatch sleep.slurm
For now, let’s look at resource requests. By default your job will get one thread on one node, but you can request more.
sbatch -N num_nodes -n num_cores sleep.slurm
The -N option is the number of nodes, and -n is the number of cores (strictly, tasks; SLURM allocates one core per task by default). Note that just requesting more than one node does not mean that your job will run on more than one node - for that you typically need to use MPI. Some software will attempt to use all cores available on the compute node, in which case it is best to request 16 cores (the most common core count on the compute nodes).
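For example, a multithreaded (non-MPI) program on a single node might be requested like this (a sketch: the program name is a placeholder, the 16 cores and 32G are illustrative, and --cpus-per-task is one reasonable way to reserve cores for a threaded program):
#!/bin/bash
#SBATCH -N 1                  # one node
#SBATCH -n 1                  # one task...
#SBATCH --cpus-per-task=16    # ...with 16 cores for its threads
#SBATCH --mem=32G             # illustrative memory request
some_threaded_program --threads 16 input.dat    # placeholder program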
Often you will find that you need to run the same task for a bunch of input files, and an array job is a good way to do this. The trick is translating the $SLURM_ARRAY_TASK_ID variable (which is just an integer) into a file name. There are a couple of ways to do this. The first is to just name your files with a sequence of integers (a sketch of this approach follows the list), e.g.:
inputfile.1.dat
inputfile.2.dat
inputfile.3.dat
etc...
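With that naming scheme, the task ID drops straight into the file name. A minimal sketch (the module and program names are hypothetical placeholders):
#!/bin/bash
#SBATCH -J int_array
cd "$SLURM_SUBMIT_DIR"
source /usr/modules/init/bash
module load some_module                 # hypothetical module
# Each array task picks the input file matching its task ID
some_program inputfile.$SLURM_ARRAY_TASK_ID.dat > result.$SLURM_ARRAY_TASK_ID.out
Submit it with, for example, sbatch -a 1-3 int_array.slurm.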
However, you can keep your own data file names using the following technique: create an index file that maps integers to file names with a bash array. Here’s the script to set up the index file and submit the job:
#!/bin/bash
# Build an index file (test.list) mapping array indexes to input file names,
# then submit an array job with one task per file.
if [ -z "$1" ] ; then echo "need to specify a directory"; exit 1; fi
if [ -f test.list ]; then rm test.list ; fi
fcount=0
echo "declare -A files" > test.list
for file in "$1"/* ; do                      # keep the directory prefix so the job can find the files
    echo "files[$fcount]=$file" >> test.list
    let fcount=fcount+1
done
# Array indexes run from 0 to fcount-1; options like -J must come before the script name
sbatch -J testA -a 0-$((fcount-1)) testA.slurm
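Save the setup script under any name you like (make_index.sh here is a hypothetical name) and point it at the directory holding your input files:
chmod +x make_index.sh
./make_index.sh ./input_data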
The slurm script (testA.slurm):
#!/bin/bash
cd "$SLURM_SUBMIT_DIR"
# Pull in the index file built above, which defines the 'files' array
source test.list
source /usr/modules/init/bash
module load some_module
mkdir -p outdir
# Each array task processes the file matching its task ID
some_module_binary ${files[$SLURM_ARRAY_TASK_ID]} > outdir/$(basename ${files[$SLURM_ARRAY_TASK_ID]} ".dat").out
You can pass user-defined environment variables to a job by using the --export argument.
To test this we will use a simple script that prints out an environment variable, variable.slurm:
#!/bin/sh
if [ "x" == "x$MYVAR" ] ; then
echo "Variable is not set"
else
echo "Variable says: $MYVAR"
fi
Next, use sbatch without --export and check your standard output file:
sbatch variable.slurm
Then use --export to set the variable:
sbatch --export=MYVAR=some_value variable.slurm
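Note that when you list specific variables with --export, only those variables (plus SLURM’s own) are propagated to the job by default; to keep your full login environment and add a variable on top, include ALL:
sbatch --export=ALL,MYVAR=some_value variable.slurm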
We have several applications compiled for MPI (Message Passing Interface).
Let’s look at an example with ABySS. First, get the test data.
wget http://www.bcgsc.ca/platform/bioinfo/software/abyss/releases/1.3.4/test-data.tar.gz
tar xzvf test-data.tar.gz
Then the sbatch script (abyss-mpi.slurm):
#!/bin/sh
#SBATCH -N 2
#SBATCH -p volatile
. /usr/modules/init/bash
module load openmpi-apps/1.10.2/abyss/1.9.0
# The np=? parameter must match the number of nodes allocated in the SBATCH line.
abyss-pe np=2 k=25 name="test" in='test-data/reads1.fastq test-data/reads2.fastq'
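Submit it like any other batch script:
sbatch abyss-mpi.slurm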
Here’s an example with RAxML. First, download some data:
wget http://www.hpc.uidaho.edu/example-data/dna.phy
the sbatch script:
#!/bin/bash
#SBATCH -J raxml-mpi-tester
#SBATCH -N 4
#SBATCH -p volatile
source /usr/modules/init/bash
module load openmpi-apps/1.10.2/raxml/8.2.8
cd $SLURM_SUBMIT_DIR
ulimit -l unlimited
mpirun raxmlHPC-MPI -f a -s dna.phy -p 12345 -x 12345 -# 10 -m GTRGAMMA -n T$RANDOM
and submit:
sbatch raxml-mpi.slurm