SLURM Topics

How do I submit jobs to the new SLURM compute cluster?

Slurm is the new scheduler on the NCF and the larger Odyssey cluster. For some basic information about it, see the Research Computing (RC) SLURM page; they also have a helpful FAQ.

There are three ways to submit things to the cluster: (1) via the command line (similar to bsub), (2) via a batch script (how RC wants you to do it), and (3) interactively, which is great for testing things out before running all of your subjects, or for anything that needs graphics while crunching numbers. You can submit a job from the command line either from a workstation (being phased out) or from a VNC session.

Which method should you choose? That depends on your situation. If you have a script that you built previously using bsub, then the one-line call, which is most similar to bsub, will probably be what you want. However, less information about the flags you used to submit your script is saved, and the command can get tricky when you want to include a lot of options. The batch script is useful when you want to do a few things in addition to running your computationally intensive script, like change directories or make a directory, and then run your script. It is essentially a bash script, and can be very powerful. The interactive session is great for testing things out, or for computations that also require graphics. For instance, if you want to make sure your SPM script works, you can launch MATLAB and run it, but it runs on the compute cluster instead of a workstation, so you are sure the environment is the same. Hopefully our recent upgrade has alerted people to the issue of keeping things as consistent as possible. If you just want to do graphics, i.e., look at your data without computing, you should use VNC and vglrun.

Slurm Flags: First, I will go over some of the basic flags/variables you will have to set regardless of which way you decide to use the cluster.

-p the queue you want to use (ncf, ncf_interact, ncf_bigmem)

--mem is the max amount of memory reserved for your job, in MB. 4000 is probably more than most jobs will take, but it is a good starting point. If you have a TR less than 2 seconds, or 1.5mm data, you will most likely need more memory than usual. See below for how to figure out how much your job actually took, so you can be more reasonable in future calls of the same or similar scripts. Your script will get killed if it exceeds the memory requested, and if you consistently over-request, your priority for submitting will be lowered. If you find you need a lot of memory (~50 gigs or more), you might need the bigmem queue.

-t is the max amount of time your script will be allowed to run, in minutes. If it goes longer, it will be killed. If you are unsure, be generous here; overestimating time doesn't hurt your priority. See below for how to tell how long it actually took.

If your job will take a long time, you can use the format: D-HH:MM, so if you wanted to request 4 days, 2 hours and 15 min, it would be:

-t 4-02:15.

Alternatively, you can provide one number that represents the minutes, so if you wanted to allow 2 hours, it would be

-t 120.

-o specifies the output file, where things that would normally be written to the screen go. The %j will be replaced by the job number. If you don't include %j in the output file name, output can be lost or overwritten, which makes checking how much memory and time your job took difficult.

-o /ncf/mri/01/users/mcmains/myscript_%j_output.out

If you don't specify this, an output file will automatically get created in the directory it was run from, called:

slurm-jobnumber.out

Where jobnumber is the number your job received.

To see the progress of your script, you can 'more' the .out file:

more slurm-jobnumber.out

--mail-type Include this if you want SLURM to send you an email when your job is done. The email will have the jobid in the subject line.

--mail-type=END

This unfortunately won't contain the output of your script, but it will tell you if it completed successfully or failed, at least in the execution of your script. It won't, for instance, know if fcfast failed for some reason, but if you have an error in your homemade batch script, it will come back as FAILED. Here is an example email subject line: SLURM Job_id=42754682 Name=sbatch Ended, Run time 00:00:06, FAILED, ExitCode 1
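
Putting the flags above together, a full one-line submission might look like this (the partition, paths, and script name are placeholders to adapt to your own setup):

sbatch -p ncf --mem=4000 -t 120 -o /ncf/mylab/myspace/myjob_%j.out --mail-type=END --wrap="/ncf/mylab/myspace/myscript.sh"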

Submitting via the command line

From a workstation or login node you can submit a job via the sbatch command using the --wrap flag. An example is:

sbatch -p ncf --mem=1024 -t 240 -o /ncf/mylab/mysubjects/outfiles/reconall_mysubj1_%j --wrap="recon-all -all -subjid mysubj1"

--wrap takes the executable script you want to run, followed by any flags it takes, all in double quotes.

If the script you want to submit is not in your path, such as when you make your own script to run, you need to make sure you give the full pathname to the script and that it is executable (chmod u+x scriptname). If it is a standard script like procfast or recon-all, this is not necessary.
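
For example, to submit a homemade script by its full path (the path and script name here are made up):

chmod u+x /ncf/mylab/myscripts/preprocess.sh
sbatch -p ncf --mem=2000 -t 60 -o preprocess_%j.out --wrap="/ncf/mylab/myscripts/preprocess.sh"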

Submitting via a batch script

For this method, you will create a file, say via gedit (gedit my_first_script.sh), that contains the flags discussed above. This is a bash script (our cluster uses bash, as opposed to tcsh or csh). Therefore, the first line of your file will always be the line below, which specifies it is a bash script. The next several lines specify the flags talked about above, and the last lines are what you want to run. This example generates (via $RANDOM) a bunch of random numbers, puts them in a file, and then sorts them. You could also simply call recon-all, by putting recon-all -all -subjid mysubj1 after the last #SBATCH line.


 

#!/bin/bash
#
#SBATCH -p ncf # partition (queue)
#SBATCH --mem 100 # memory
#SBATCH -t 0-2:00 # time (D-HH:MM)
#SBATCH -o myscript_%j_output.out # STDOUT
#SBATCH --mail-type=END # notifications for job done

for i in {1..100000}; do
    echo $RANDOM >> SomeRandomNumbers.txt
done

sort SomeRandomNumbers.txt


To run your script, type the line below from the directory where the script is located, or specify the full path to it.

sbatch my_first_script.sh
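
sbatch confirms the submission by printing the job number, something like Submitted batch job 4283527 (the number here is made up; yours will differ). You can then follow the job's progress in the output file named by the -o line:

more myscript_4283527_output.out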

 

Running interactively

This is launched via the command srun. For example:

srun -p ncf_interact --pty --x11=first --mem 4000 -t 0-06:00 /bin/bash

This will launch a command line shell on the interactive queue with 4,000MB of RAM for 6 hours. When it starts, you will notice the prompt changes from your_user_name@ncf_something to @ncf_something_else. From here, you can test scripts, or launch Matlab.

Importantly, the interactive session assumes you want to be interacting with it. So if you go more than an hour without any kind of input, it will assume you have left the session and will terminate it.
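
For example, to check that the recon-all call from earlier behaves as expected before submitting it for every subject (the subject ID is a placeholder):

srun -p ncf_interact --pty --x11=first --mem 4000 -t 0-02:00 /bin/bash
recon-all -all -subjid mysubj1
exit

The exit at the end closes the shell and releases the interactive slot for others.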

Useful commands for interacting with the slurm cluster (e.g., how to cancel a job)

Research computing has an extensive page discussing useful slurm commands. I have provided some of the most commonly needed ones below.

sacct

This will show you all of your recent jobs: running, pending, and completed. This can return a lot of output if you are running or have run many jobs. If you want to get information about a particular job:

sacct -j jobnumber

To cancel a job:

scancel jobnumber

To cancel all jobs you have running:

scancel -u your_user_name

To cancel all jobs you have running on a particular queue:

scancel -u your_user_name -p queue_name

To cancel all pending jobs:

scancel -t PENDING -u your_user_name

How do I run my matlab script via sbatch?

This is only complicated because of all the quotes. Here is an example script that takes four inputs, two strings followed by two numbers.

sbatch -p ncf -n 1 -t 04-15:01 --mem=2000 --wrap="matlab -nodisplay -nosplash -nojvm -r $'myscript(\'${subject}\',\'test\',9,0);exit'"

Generally, everything following the -r gets put in quotes. Since the --wrap command is already in quotes, we enter quote unhappiness. The first thing that will come after the -r is a $', followed by your script name. If you need to use quotes in your function call, such as around strings, you need to put a \ before the single quote. The whole thing then ends with a single quote followed by a double quote. In this example, the first input passed is a bash variable that is set somewhere else to be a string, hence the quotes, followed by a string (test) and two numbers (9, 0). The exit makes sure that matlab closes after it finishes.
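
If it helps to see the pieces, here is the same call broken apart (myscript and the subject value are hypothetical; the backslash just continues the command on a second line):

subject=subj01 # bash variable, expanded at submit time inside the double quotes
# the string matlab eventually receives after -r is: myscript('subj01','test',9,0);exit
# the $'...' quoting is what turns each \' into a literal single quote
sbatch -p ncf -n 1 -t 04-15:01 --mem=2000 \
    --wrap="matlab -nodisplay -nosplash -nojvm -r $'myscript(\'${subject}\',\'test\',9,0);exit'"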

How do I figure out how much memory and time my script took?

This is useful to know so that you can request the appropriate amount of time and memory when you run your script, or something similar, again. You want to be as accurate as possible so that resources can be fairly spread across multiple jobs. In addition, if the cluster is being heavily used and you request a lot of memory, it may take a while for the requested memory to become available; your script will stay pending until it is.

sacct -j jobnumber --format=MaxRSS,elapsed,reqmem,timelimit

This will return something that looks like:

    MaxRSS    Elapsed     ReqMem  Timelimit
---------- ---------- ---------- ----------
             00:01:40
     8820K   00:01:45     1024Mn   02:00:00
This will usually return two lines; you want to use the second one. So if this was your script, it took 1 min and 45 seconds and 8.82 MB of memory. When you ran it, you had requested 2 hours and 1024MB: way too long and too much memory. If you were to run it again, you might use a command that requests slightly more memory (~20%) and time than it needed, which would look like:

sbatch -p ncf --mem=11 -t 4 --wrap="/ncf/mylab/myspace/myscript.sh"

 

My job needs A LOT of memory!!

The max amount of memory you can request on the regular cluster is about 250 gigs. However, it might take a long time for your job to start because this is a substantial portion of the available memory on the cluster. If you find yourself needing a lot of memory (> 30 gigs) you might consider using the big memory node, which has 3TB of available memory.

sbatch -p ncf_bigmem --mem=30000 -t 0-02:00 -o ./myoutput_%j --wrap="/ncf/mylab/myspace/myscript.sh"

The key is using the ncf_bigmem queue. To submit a job to this queue you must request at least 30 gigs. Also, please keep in mind that this is not the exact same hardware as the regular compute cluster, so the numbers you get back might be slightly different than the ones you would get if you ran it on the regular cluster. Make sure not to, say, run all your controls here and all your patients on the regular compute nodes. That being said, if you need it for one subject but don't need 50 gigs for all subjects, you could request 50 for the others anyway just so they end up on this node. Keep in mind that repeated over-requesting will hurt your priority and might land you on the monthly 'bad' list, which will result in very friendly people from RC contacting you to make sure you know what you are doing.

Troubleshooting and common slurm errors.

A variety of problems can arise when running jobs on the NCF. Many are related to resource mis-allocation, but there are other common problems as well. Even if your script seems to have finished successfully, you should look at the last line of the output file to make sure it wasn't killed by the job handler.

tail myscript_71827678.out

This will show you the last 10 lines of the output file (tail's default). You don't want the last line to say something like:

slurmstepd: error: Exceeded step memory limit at some point.

Error: JOB <jobid> CANCELLED AT <time> DUE TO TIME LIMIT
Likely cause: You did not specify enough time in your batch submission script. The -t option sets time in minutes, or can also take D-HH:MM form (0-12:30 for 12.5 hours).

Error: Job <jobid> exceeded <mem> memory limit, being killed
Likely cause: Your job is attempting to use more memory than you've requested for it. Either increase the amount of memory requested by --mem or --mem-per-cpu or, if possible, reduce the amount your application is trying to use. For example, many Java programs set heap space using the -Xmx JVM option. This could potentially be reduced.

Error: slurm_receive_msg: Socket timed out on send/recv operation
Likely cause: This message indicates a failure of the SLURM controller. Though there are many possible explanations, it is generally due to an overwhelming number of jobs being submitted, or, occasionally, finishing simultaneously. Try waiting a bit and resubmitting. If the problem persists, email RC (rchelp [at] fas [dot] harvard [dot] edu).

Error: JOB <jobid> CANCELLED AT <time> DUE TO NODE FAILURE
Likely cause: This message may arise for a variety of reasons, but it indicates that the host on which your job was running can no longer be contacted by SLURM.

 

What is my priority?

To see your priority score:

sshare -U

Your priority score is the last number that comes up (the FairShare column). Larger is better. Generally a priority score above .5 is considered good, below .5 is bad. This score is currently based on the combined usage of all the members of your group.
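
The output looks roughly like this (the account, user, and numbers are made up; the FairShare column at the far right is the score in question):

             Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
mylab                   mcmains          1    0.000800      123456      0.000500   0.612300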

Why is my job pending?

To see your pending jobs, you can type:

squeue

This should return something like:

JOBID    PARTITION  NAME      USER     ST  TIME  NODES  NODELIST(REASON)
73271197 ncf        myscript  mcmains  R   0:30      1  ncfc22

If your job is pending, ST will be PD and the reason will usually be Resources or Priority. If it is Resources, it means there aren't enough free nodes/cores or enough memory to run your job. If it is Priority, it means there are people above you in line. See the priority section above for more.
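
A pending job would look something like this (the job number is made up):

73271198 ncf        myscript  mcmains  PD  0:00      1  (Priority)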

To see how many people are ahead of you in the queue:

showq-slurm -p ncf -o

Jobs listed at the top are next in line.

How do I submit a job that uses more than 1 core (runs in parallel)?

If you have a script that can take advantage of multiple cores, you can request them via sbatch. There are several important flags. Keep in mind that requesting more than 1 core only helps you if your script utilizes some kind of parallelization.

-n the number of compute cores you want.   8 is the polite max.

-N 1 this requests that the cores are all on one node. Only change this to >1 if you know your code uses a message passing protocol like MPI. SLURM makes no assumptions on this parameter: if you request more than one core (-n > 1) and you forget this parameter, your job may be scheduled across multiple nodes, which you don't want.

--mem When requesting multiple cores, this is the amount of memory shared by all your cores. If your cores are spread out over multiple nodes (using something like MPI), you want to use the flag --mem-per-cpu, which requests memory for each core. An example multi-core request is sketched below.
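
Here is a sketch of a single-node, 8-core request (the script path is a placeholder, and your script must actually run in parallel to benefit):

sbatch -p ncf -n 8 -N 1 --mem=8000 -t 120 -o parallel_%j.out --wrap="/ncf/mylab/myspace/my_parallel_script.sh"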

How do I submit a script for a bunch of subjects - loop over subjects?

This partly depends on how you are submitting your script: via the --wrap flag, or via a batch script. I will cover both here. Regardless of how you do it, you want to practice good etiquette: don't submit a bunch of jobs that run for less than 5 min, and pause between submitting each script.

Via the --wrap flag

You want to create a script, i.e., a text file that will contain your loop, using a text editor like gedit. This can be in any language you want; here I will demonstrate it with bash.


#!/bin/bash
# set your subjects
subjects=(150101_subj1 150102_subj2 150102_subj3)
# loop over your subjects
for subj in ${subjects[*]}; do
    echo $subj
    sbatch -p ncf -t 2-0:00 --mem=1024 -o ${subj}_%j.out --wrap="recon-all -subjid ${subj} -all"
    sleep 1 # pause to be kind to the scheduler
done


 

You would then want to make this executable (chmod u+x my_script.sh) and run it from the command line:
./my_script.sh

Via a batch script

This requires creating two scripts: one similar to what we showed above, that loops over your subjects, and one that contains the sbatch flags (your batch script), which is like what is described above under the heading Submitting via a batch script. First, let's make our script to loop over subjects, my_script.sh:


#!/bin/bash
# set your subjects
subjects=(150101_subj1 150102_subj2 150102_subj3)
# loop over your subjects
for subj in ${subjects[*]}; do
    echo $subj
    # you can follow your batch script call with any number of inputs it needs;
    # in this case we are passing it one, the subject ID
    sbatch -o ${subj}_%j.out mybatch_script.sh ${subj}
    sleep 1 # pause to be kind to the scheduler
done


 

Now we can write our batch script:


#!/bin/bash
#SBATCH -p ncf # partition (queue)
#SBATCH --mem 1024 # memory
#SBATCH -t 2-0:00 # time (D-HH:MM)
recon-all -subjid ${1} -all

This will take an argument (the subject ID). As it is a bash script, it will automatically parse the inputs you give it when it is called and place them in variables ($1, $2, $3, ...), reflecting the order in which they followed the script name. In this case, $1 gets assigned the subject ID. After making sure your loop script is executable (chmod u+x my_script.sh), you can run it:
./my_script.sh
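
If your batch script needs more than one input, each additional argument simply becomes the next numbered variable. A minimal sketch (the script contents are illustrative only):

#!/bin/bash
#SBATCH -p ncf # partition (queue)
#SBATCH --mem 100 # memory
#SBATCH -t 0-00:10 # time (D-HH:MM)
# called as: sbatch mybatch_script.sh subj01 2
# so ${1} is subj01 and ${2} is 2
echo "subject: ${1}, run: ${2}"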