How to use srun to get an interactive node

This short post summarizes how I use srun to get an interactive node on a Slurm system.

  • Request machin

You can find the official man page for srun here:

https://computing.llnl.gov/linux/slurm/srun.html

The srun command lets you take control of a node and run jobs interactively instead of submitting batch jobs. It has a large number of options, such as requesting a node exclusively or running a command-line session on it. Here are a few examples of how you can use it.

The most basic command starts a bash shell on the remote node, which gives you access to the node through an interactive session:

srun -N 1 -n 1 --pty bash -i

(-N is the number of nodes, -n is the number of tasks, --pty runs the task in a pseudo-terminal, and bash -i starts bash in interactive mode.)
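Once the shell starts, it is worth confirming that you really are inside a Slurm job and not still on the login node. Here is a minimal sketch, assuming Slurm exports SLURM_JOB_ID into the session's environment as it normally does. The function name session_info is my own, not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: report whether this shell runs inside a Slurm job.
# Assumes Slurm sets SLURM_JOB_ID in the job's environment.
session_info() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        # uname -n prints the host name of the machine we are on
        echo "inside job ${SLURM_JOB_ID} on $(uname -n)"
    else
        echo "not inside a Slurm job"
    fi
}

session_info
```

On the login node this should report that you are not inside a job. Inside the srun session it should print the job ID and the compute node's host name.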

To get the node exclusively (in case you don't want others to use the same node), add --exclusive to the command, so it looks like this:

srun -N 1 -n 1 --exclusive --pty bash -i

Another important feature is that you can request two interactive sessions on the same node. This lets you use the second session to monitor the first. For example, you can run top on the same node to watch the CPU and memory utilization of the first process. This is otherwise difficult under SLURM because you cannot simply ssh into a remote node.

This only works if you did not request the node with the --exclusive flag. A second session is not allowed on a node that the first session acquired exclusively.

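To do that, first start a session as before:

srun -N 1 -n 1 --pty bash -i

You should be able to see the name of your node in the shell prompt. To get the same node, run the following from a second terminal, placing -w before the command so that srun (and not bash) receives it:

srun -N 1 -n 1 -w nodename --pty bash -i

You should then land on the same node. In general, -w takes a list of nodes that the allocation must include. Here we name a single node together with -N 1, so we are effectively forcing srun to give us the node we want. If that node has no free resources, the request waits until it does.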
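Instead of reading the node name off the prompt, you can have the first session print the exact command to paste into the second terminal. This is a rough sketch, assuming Slurm sets SLURMD_NODENAME inside the job (it falls back to the host name otherwise). The function name second_session_cmd is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the srun command for a second session on a node.
# Argument 1 (optional): node name. Defaults to SLURMD_NODENAME, which Slurm
# is assumed to set inside a job, and then to the current host name.
second_session_cmd() {
    local node="${1:-${SLURMD_NODENAME:-$(uname -n)}}"
    # -w goes before the command so srun, not bash, receives it
    echo "srun -N 1 -n 1 -w ${node} --pty bash -i"
}

# Run this inside the first session, then paste the output into a new terminal.
second_session_cmd
```

The helper only prints the command, so you can check the node name before running it.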

Another important note: for a single process to use multiple threads (CPUs) on a node, you must request them with the -c parameter (CPUs per task). Otherwise your process can only use a single thread. For example, on a Lanka cluster machine, which has 48 hardware threads (24 cores), I would write the srun command as follows:

srun -N 1 -n 1 -c 48 --pty bash -i
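To check what the session actually received, you can compare the requested count with what the shell is allowed to use. This is a small sketch, assuming Slurm sets SLURM_CPUS_PER_TASK only when -c is given. Whether the limit is enforced depends on how the cluster is configured, so treat the output as a hint. The function name threads_for_task is my own.

```shell
#!/bin/bash
# Hypothetical helper: number of CPUs requested with -c, or 1 if -c was omitted.
# Assumes Slurm sets SLURM_CPUS_PER_TASK only when -c is passed.
threads_for_task() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

echo "CPUs requested per task: $(threads_for_task)"

# nproc reports the processing units available to the current process
if command -v nproc >/dev/null 2>&1; then
    echo "CPUs this shell may use: $(nproc)"
fi
```

If the second number is 1 after you asked for more, the -c option probably did not reach srun.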

To request nodes from a specific partition (a queue or group of nodes), use the -p option:

#!/bin/bash
srun -N 1 -n 1 --exclusive -p lanka-v3 --pty bash -i

Running srun -h lists the available options, including this one:

-p, --partition=partition partition requested
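If you don't know which partition names exist on your cluster, sinfo can list them. This is a minimal sketch using sinfo's -h (no header) and -o "%P" (partition name) options. sinfo marks the default partition with a trailing asterisk, which the sketch strips. The function names are my own.

```shell
#!/bin/bash
# Strip the trailing "*" that sinfo appends to the default partition name.
strip_default_marker() {
    sed 's/\*$//'
}

# Hypothetical helper: list partition names, one per line.
list_partitions() {
    if command -v sinfo >/dev/null 2>&1; then
        sinfo -h -o "%P" | strip_default_marker
    else
        echo "sinfo not found; run this on a Slurm login node" >&2
    fi
}

list_partitions
```

Any name it prints can be passed to srun with -p.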

Hope this helps!
