Debian Clusters for Education and Research: The Missing Manual

Torque Queue Configuration

About Torque

If you haven't done the initial setup of Torque and Maui yet, the Using a Scheduler and Queue page may be of interest.

A Word about Queues

Queues are what allow optimized scheduling on a cluster. The scheduler, Maui, receives information about the queues from Torque and uses it to decide which jobs will run when (and on which nodes). Without queues, the cluster would just run batch jobs on a first come, first served basis.

For instance, suppose the cluster is currently full and three jobs are submitted. One needs to run for two hours on two nodes, one needs to run for four hours on two nodes, and one needs to run for six hours on one node. If the scheduler sees that one node is about to become available, it can schedule the six-hour job as soon as that node frees up. On the other hand, if two nodes are about to become available, the priority of the queues would determine whether the two-hour or the four-hour job should run.

In their qsub scripts, users specify what resources they need for a job. They can also specify to qsub (on the command line) what queue their job should be put into.
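As a rough sketch of what that looks like in practice, the snippet below writes a small job script and submits it to a named queue. The file name /tmp/example_job.sh, the job name, the resource values, and the queue name "long" are all placeholders for illustration; substitute whatever exists on your cluster.

```shell
# Write a small example job script (file name and resource values are illustrative).
cat > /tmp/example_job.sh <<'EOF'
#!/bin/sh
#PBS -N example_job
#PBS -l nodes=2,walltime=02:00:00
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname)"
EOF

# Submit to the queue named "long" (a placeholder), but only if qsub is installed.
if command -v qsub >/dev/null 2>&1; then
    qsub -q long /tmp/example_job.sh
else
    echo "qsub not found; job script written to /tmp/example_job.sh"
fi
```

The #PBS lines carry the resource requests, while -q on the qsub command line picks the queue.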

Copying from an Existing Setup

If you're lucky enough to have an existing setup with queues already configured, you can copy this information over to another server. Or, if you'd like, you can use the Torque Queues Example from a different cluster at my institution. There may be a better way to do this - and if you know of one, please e-mail me at kwanous <at> debianclusters <dot> org and let me know - but this is a route that worked for me.

All of the queue configuration files are stored at $PBSHOME/server_priv/queues ($PBSHOME is /var/spool/pbs if you followed my Torque tutorial). On the head node of the cluster with the queues configured, cd into this directory. For each of the queues on that cluster, you need the setup information. To grab all of it in one fell swoop, run

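for x in *; do qmgr -c "print queue $x"; done > /tmp/queues
  • /tmp/queues is where the output will be saved. You can change this to something else if you want. The > overwrites any earlier copy of the file, so rerunning the command does not duplicate the queue definitions.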
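Before copying the file anywhere, it can be worth checking that every queue actually made it into the dump. The helper below is only a sketch: the function name list_dumped_queues is made up here, and it assumes each queue's output contains a "create queue NAME" line like the one in the example further down this page.

```shell
# Print the names of the queues defined in a qmgr dump, one per line.
# Usage: list_dumped_queues /tmp/queues
list_dumped_queues() {
    # Keep only the "create queue NAME" lines and strip everything but NAME.
    sed -n 's/^create queue \([^ ]*\).*$/\1/p' "$1"
}
```

Comparing its output against the file names in $PBSHOME/server_priv/queues should show whether anything was missed.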

Then, you can either copy the output file by hand (cat it and then copy-paste to a new file) or rsync it over to your new cluster. Once the file is on the new cluster, it's pretty easy to plunk it in, because the file is already formatted as input for qmgr. Just run

cat /tmp/queues | qmgr

If the queues have been created, the corresponding files will be generated in $PBSHOME/server_priv/queues. You may need to restart pbs_server for the changes to take effect in the live queues. This is done with

killall -KILL pbs_server
pbs_server
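killall -KILL gives the server no chance to shut down cleanly. A gentler restart might look like the sketch below, which assumes Torque's qterm command is on the PATH and that pgrep is installed. The function name restart_pbs_server is made up for illustration.

```shell
# Restart pbs_server, asking it to shut down cleanly first.
restart_pbs_server() {
    if ! command -v qterm >/dev/null 2>&1; then
        echo "qterm not found; cannot do a clean shutdown" >&2
        return 1
    fi
    # "quick" shuts the server down while leaving running jobs alone.
    qterm -t quick
    # Wait for the old server process to exit before starting a new one.
    while pgrep -x pbs_server >/dev/null 2>&1; do
        sleep 1
    done
    pbs_server
}
```

Run restart_pbs_server as root on the head node. If it reports that qterm is missing, the killall approach above still works.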

Configuring Queues by Hand

The same format can be used to define queues by hand. Generally this is done by creating a file with all the input and then piping it to qmgr, as in the example above. There are quite a few options for queue configuration: access control lists, the maximum number of jobs the queue will take, whether the queue is active and should run jobs, and more, in addition to the resources and walltime available to jobs in the queue.

Let's examine one of the ones automatically generated in the Torque Queues Example.

create queue long
set queue long queue_type = Execution
set queue long Priority = 60
set queue long max_running = 128
set queue long resources_max.cput = 10:00:00
set queue long resources_min.cput = 02:00:01
set queue long resources_default.cput = 03:00:00
set queue long resources_default.walltime = 04:00:00
set queue long max_user_run = 8
set queue long enabled = True
set queue long started = True
  • The first part of defining a queue is the create queue directive. Here, the name of the queue is "long". After this line, we need to specifically tell qmgr which queue we're configuring, so "long" will be repeated for the rest of the setup lines.
  • queue_type specifies one of two types - route or execution. Route queues are responsible for putting jobs into other queues based on the jobs' attributes. Execution queues are ones that jobs will actually run in.
  • priority can be used to assign different preferences to queues. Zero is the default value. In PBS-style systems a larger number normally means a higher priority, but how much weight the scheduler gives it depends on the scheduler's own configuration, so check this on your setup.
  • max_running is the highest number of jobs from this queue that may be running at any given time.
  • resources_max, resources_min, and resources_default are used to designate the maximum, minimum, and default resources. There are quite a few different values that can be specified here:
    • cput - CPU time
    • nodes - the number of nodes
    • ncpus - the number of CPUs
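    • walltime - wall-clock time, the real time that elapses between the job starting and finishing. CPU time counts the processor time the job actually consumes, added up across its processes, so a job that keeps four CPUs busy for one hour uses about four hours of CPU time but only one hour of walltime.
  • max_user_run is the highest number of jobs that a single user is allowed to have running in the queue at any given time.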
  • enabled allows the queue to accept job submissions. This is false by default.
  • started allows the queue to run job submissions. This is false by default.
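To make the directives above concrete, here is a sketch that writes a hand-made definition file to /tmp/newqueues. The queue names "short" and "route" and all of the limits are invented for illustration. The route queue assumes the "long" queue from the example already exists, and uses the route_destinations attribute to list the execution queues it may hand jobs to.

```shell
# Write qmgr input for two hypothetical queues:
#   "short" - an execution queue for jobs needing at most two hours of CPU time
#   "route" - a routing queue that passes jobs on to "short" or "long"
cat > /tmp/newqueues <<'EOF'
create queue short
set queue short queue_type = Execution
set queue short max_running = 64
set queue short resources_max.cput = 02:00:00
set queue short resources_default.cput = 00:30:00
set queue short resources_default.walltime = 01:00:00
set queue short max_user_run = 8
set queue short enabled = True
set queue short started = True
create queue route
set queue route queue_type = Route
set queue route route_destinations = short
set queue route route_destinations += long
set queue route enabled = True
set queue route started = True
EOF
```

Because "short" tops out at 02:00:00 of CPU time and "long" starts at 02:00:01, the two limits do not overlap, so the intent is that a job submitted to "route" ends up in whichever queue its cput request fits.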

Once you have a file listing all the queue creation details, then you can pipe this file into qmgr to create the queue, like this:

cat /tmp/newqueues | qmgr

Alternatively, you can bring qmgr up in interactive mode with just qmgr and type in the lines one at a time.

Either way, you may need to restart pbs_server for the changes to take effect in the live queues. This is done with

killall -KILL pbs_server
pbs_server
