PRISM Workshop

Submitting Jobs to the Cluster


November, 2021

Laura Moses

Overview

This is a walkthrough of submitting a job to the Unity cluster, meant to give you some basic tools and resources for using cluster computing at OSU. Cluster computing, or high performance computing, most generally refers to aggregating computing power so that it delivers much higher performance than a typical desktop computer, in order to solve large problems in science. The tools here are for data management and analysis when time, memory, or processing demands make your laptop or desktop impractical. The code presented here is generic and meant to serve as an outline for your own projects. Portions of this tutorial are based on one that Adam Lauretig put together a while back: https://github.com/adamlauretig/Unity_Intro. Other parts are based on Daniel Kent's tutorial on h2o.

Arts and Sciences VPN

If you are off campus, connecting to the Unity cluster from a terminal requires the Arts and Sciences VPN. Instructions for the VPN are here: https://osuasc.teamdynamix.com/TDClient/1929/Portal/KB/ArticleDet?ID=14542

Logging Into Unity

If you use the OnDemand portal, you can log in and authenticate with Duo, without a VPN, here: https://ondemand.asc.ohio-state.edu/

To log in from a terminal instead, connect to the VPN and then open a terminal running bash. RStudio also has a built-in terminal you can use for this. The commands in this tutorial use placeholders for your login ID and for the files you create, copy, submit, or run; replace them with your own ID and file names. Enter into your terminal:

ssh username.#@unity.asc.ohio-state.edu

A password prompt will appear. You won’t see anything as you type, but enter your OSU password and hit enter to log in. You are now in your Unity home directory. The first time you log in, the directory should be essentially empty, aside from a welcome file.

The easiest way to create and manage files is through the OnDemand portal at https://ondemand.asc.ohio-state.edu/. To create the folder that will hold your files from the terminal, use:

mkdir dir_name

Here dir_name is a placeholder for this folder; you can choose any directory name you want.

Once you've created a directory for your scripts and data, you can either run a script by submitting a job or start an interactive session. Interactive sessions are useful for debugging, testing, and, importantly, setting up an environment by installing the packages that your batch job will need. If you plan to use packages that are not yet installed under your Unity account, start an interactive session to install them or to create an environment.

Interactive Sessions

By default in interactive mode you get one hour with one core on one compute node with 3 GB of memory. You can get more information on sinteractive by running it with the --help argument. To start an interactive session, enter the following into your terminal session:

sinteractive --time=01:00:00
sinteractive --help

Note that --time takes hours:minutes:seconds (HH:MM:SS). If you want a short session to debug something or to check that a package loads, you could request 20 minutes with --time=00:20:00. For help with the parameters of an interactive session, use --help.
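If you think about session length in minutes, a small shell function can build the --time value for you. This is only a sketch: minutes_to_walltime is a hypothetical helper name, not something provided by Unity or SLURM.

```shell
# Hypothetical helper (not part of Unity or SLURM):
# convert a number of minutes into an HH:MM:SS string.
minutes_to_walltime() {
  total_minutes=$1
  printf '%02d:%02d:00\n' $((total_minutes / 60)) $((total_minutes % 60))
}

minutes_to_walltime 20    # prints 00:20:00
minutes_to_walltime 90    # prints 01:30:00

# Example use on the cluster:
# sinteractive --time="$(minutes_to_walltime 20)"
```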

Now change to the directory you want to work in:

cd dir_name

R Session

And let’s start R up. Unity starts with a basic R setup, with a minimal amount of functionality ready so that computational needs are minimized. To load R:

module load gnu/9.1.0
ml R/4.0.2

You can use different R versions by loading different gnu or intel and R modules. I have found that these two work well, and all of my R packages are installed for them. (ml is shorthand for module load.) Now we can start R:

R

You're now running R on the Unity cluster and interacting with it in your terminal, so you can type R code. For example, you can install the packages your project needs; here I install the packages used in the tutorial on GitHub:

install.packages("foreach")
install.packages("doParallel")

The installs take longer than on your laptop, but they will work. You will be asked which CRAN mirror you want to install from; type the number of the one you prefer and hit enter. Once the installs have finished, close R by typing q() or pressing ctrl + d. (ctrl + z only suspends R rather than quitting it.) The packages are now in the R library associated with your account on the cluster.

Python Session

To do the same for Python, we load the Python module and then create or activate an environment for the project. The clusters at OSU recommend and work with the Anaconda distribution, and I recommend using it. To load Python:

module load python/3.7-conda4.50
source activate env_name

To set up your conda environment, I recommend building it locally, exporting the requirements.txt, and then building it on the cluster from that file.

Submitting a Computing Job

The value of the cluster is in running jobs that need more time and memory than your local computer can provide. To do that, we write an R or Python script and submit it to the cluster’s queue using SLURM, a job scheduler that manages the cluster's resources.

The necessary file for this is a .sh script. Here is an example SLURM script:

#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --job-name=prism_tutorial
#SBATCH --mail-type=ALL
#SBATCH --mail-user=name.#@osu.edu

# Load the compiler and R modules
ml gnu/9.1.0
ml R/4.0.2

# Move to the directory that holds your script and data
cd dir_name

# Commands to run
# For R:
Rscript script_name.R

Note that the #SBATCH lines are not comments; they set the parameters of your computing job. --time is the walltime, the amount of time you think the job will need. --nodes and --ntasks-per-node set the amount of computing resources. script_name.R is a placeholder; change it to your own file name. The example runs the script with Rscript; R CMD BATCH also works. Make sure that you change the email address to your own so that you get email updates about your job’s status.
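Before submitting, it can help to check the script for the two easiest mistakes: a missing --time line and an email address left as the placeholder. The function below is a minimal sketch, not a SLURM tool. The name check_job_script is hypothetical, and the demo writes a throwaway script to a temporary file so the example is self-contained.

```shell
# Hypothetical pre-submission check; not provided by SLURM or Unity.
# Returns 0 if the script passes both checks, 1 otherwise.
check_job_script() {
  script=$1
  problems=0
  # grep -q prints nothing and exits non-zero when the pattern is absent
  if ! grep -q '^#SBATCH --time=' "$script"; then
    echo "missing --time directive"
    problems=1
  fi
  if grep -q '^#SBATCH --mail-user=name\.#@osu\.edu' "$script"; then
    echo "mail-user is still the placeholder address"
    problems=1
  fi
  return $problems
}

# Demo on a temporary script that still has the placeholder email
demo=$(mktemp)
printf '%s\n' '#!/bin/bash' '#SBATCH --time=00:30:00' \
  '#SBATCH --mail-user=name.#@osu.edu' > "$demo"
check_job_script "$demo" || echo "fix the script before running sbatch"
rm -f "$demo"
```

On the cluster you would run the check on your own file, for example check_job_script script_file.sh, before submitting.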

Once this SLURM script is ready, you can go ahead and upload it to the Unity cluster. You’ll also want to upload the R or Python script (and any necessary data); otherwise you’ll be asking Unity to run a script it doesn’t have access to. The easiest way is to log in to the OnDemand portal and manage files at https://ondemand.asc.ohio-state.edu/. If you need to upload to Unity from a terminal, make sure you use a terminal session on your local machine, not one that is logged in to Unity.

To upload files from the terminal, use the secure copy command, scp:

scp file_name name.#@unity.asc.ohio-state.edu:dir_name/

Now the SLURM script is available to submit from your Unity account. Let’s log in and submit it. Switching back to the terminal that is logged in to Unity, we can check that the .sh script looks good with nano script_file.sh. Nano is a text editor that runs in the terminal. If all looks good, exit the editor with ctrl + x (ctrl on Macs too).

To submit script_file.sh to Unity’s queue, we enter:

sbatch script_file.sh

To monitor your job’s status, you can enter:

squeue -u name.# 
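The ST column in squeue's output shows a short state code for each job. As a rough sketch, the function below translates the most common codes into words; describe_state is a hypothetical name, and only three of SLURM's state codes are covered.

```shell
# Hypothetical helper: translate squeue's compact state codes into words.
describe_state() {
  case $1 in
    PD) echo "pending: waiting for resources" ;;
    R)  echo "running" ;;
    CG) echo "completing" ;;
    *)  echo "other state: $1" ;;
  esac
}

describe_state PD
describe_state R

# On the cluster, assuming squeue is on your PATH
# (-h drops the header, -o %t prints only the compact state):
# squeue -u name.# -h -o %t | while read -r code; do describe_state "$code"; done
```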

Copying Unity Output to Your Local Computer

Once your job finishes running, you’ll get an email letting you know whether everything worked. Assuming your Unity job worked, you’ll want to move the results to your laptop for easier access. I recommend doing this through the OnDemand portal. If you need to do it from the terminal, you can use scp again, but with the order of the two locations switched.

scp name.#@unity.asc.ohio-state.edu:dir_name/file .
scp name.#@unity.asc.ohio-state.edu:dir_name/file /local/directory/ 

The first argument after scp is the location of your file on Unity. The second argument, a single period (.), says to place the file in the current directory; you can instead give the directory where you want to save the file, as on the second line above.
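To double-check the argument order before copying anything, you can print the command first. This is a sketch only: unity_download_cmd is a hypothetical helper, and buckeye.1 and results.csv are made-up example values. It prints the scp command rather than running it.

```shell
# Hypothetical helper: print (do not run) the scp command for one download.
unity_download_cmd() {
  user=$1        # your name.# login
  remote=$2      # path on Unity, relative to your home directory
  dest=${3:-.}   # local destination; defaults to the current directory
  echo "scp ${user}@unity.asc.ohio-state.edu:${remote} ${dest}"
}

# Source (on Unity) comes first, destination (local) comes second
unity_download_cmd buckeye.1 dir_name/results.csv
unity_download_cmd buckeye.1 dir_name/results.csv /local/directory/
```

Run the printed command from a terminal on your local machine, not from one that is logged in to Unity.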

To end your Unity session, just type logout in your terminal.

Useful Bash Commands

To navigate in terminal, here are some useful bash commands:

ls              # list files
pwd             # print working directory
mkdir dir_name  # make a directory called dir_name
cd folder_path  # change directory to folder_path

You can request Unity access here: https://osuasc.teamdynamix.com/TDClient/1929/Portal/Requests/ServiceDet?ID=23762

Matt Denny's HPC Intro: https://www.mjdenny.com/workshops/HPC_ICPSR_14.zip

Ohio Supercomputer Center Getting Started: https://www.osc.edu/resources/getting_started/hpc_basics

Alnasir JJ. Fifteen quick tips for success with HPC, i.e., responsibly BASHing that Linux cluster. PLoS Comput Biol. 2021 Aug 5;17(8):e1009207. doi: 10.1371/journal.pcbi.1009207