Is it possible to tell the scheduler (Torque) to run a job on a specific node (list of nodes)?

I would like to run a calculation on a specific node, but I couldn't find an option to tell the Torque scheduler to do so. Is this option not implemented, or did I just miss it? If it is not implemented, is this because it has not been needed so far, or because integrating it with how schedulers work adds too many complications?

In principle I just need to modify this line in the submit script
#PBS -l nodes=2:ppn=48,walltime=01:00:00,mem=16000000kb
to something like this
#PBS -l nodes=node01:ppn=48+node02:ppn=48,walltime=01:00:00,mem=16000000kb

What would be the best/cheapest approach to achieve this?
Thank you.

Hi @un1queu5er :wave:

Indeed, it seems there is not yet a generic option for explicitly specifying the nodes to run on. I’m assuming this is just because it’s a feature that hasn’t been requested yet.

I’m not very familiar with PBS, but I think it should be possible to achieve this using the custom_scheduler_commands option. Something along the lines of (QE example):

from aiida import load_profile, orm
from ase import build
from aiida_quantumespresso.workflows.pw.base import PwBaseWorkChain

load_profile()  # load the default AiiDA profile

structure_data = orm.StructureData(ase=build.bulk('Co', crystalstructure='fcc', a=5.43, cubic=True))
pw_code = orm.load_code('qe-7.2-pw@localhost-pbs')

builder = PwBaseWorkChain.get_builder_from_protocol(
    code=pw_code,
    structure=structure_data,
)
# Append a raw Torque directive at the end of the scheduler header
builder.pw.metadata.options['custom_scheduler_commands'] = '#PBS -l nodes=node01:ppn=48+node02:ppn=48'

That will insert the line

#PBS -l nodes=node01:ppn=48+node02:ppn=48

at the end of the scheduler header.

Thank you for your answer.
I have tried your example. The scheduler complains that I have to set num_machines and num_mpiprocs_per_machine. When I do this, I end up with two #PBS -l lines in the submit script. Unfortunately, Torque seems to use only the first line and ignores the second one with the specified nodes. When I edit the submit script manually and remove the first line, it works.

Hi, indeed you are right: custom_scheduler_commands only adds extra lines; it does not replace the -l line generated from the resources.

Unless there is a way to specify the nodes in the input as an additional line (probably not, but worth checking), the only solution I see is to create your own scheduler plugin (e.g. a new torque_explicitnodes) by copying and modifying the Torque plugin in aiida-core/src/aiida/schedulers/plugins/torque.py (main branch of aiidateam/aiida-core on GitHub).
As you can see, luckily we had already written it in a modular way so that it also covers PBSPro, which is very similar, and the -l line is exactly the part that differs between the two. So only the function _get_resource_lines needs to be adapted to write the line you want.
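To illustrate what an adapted _get_resource_lines would need to produce, here is a minimal standalone sketch that builds the line from the question out of a list of host names. The function name explicit_nodes_resource_lines is hypothetical and this is not the aiida-core implementation; it only assumes the single-line nodes=...,walltime=HH:MM:SS,mem=...kb layout shown in the original submit script. As far as I can tell, the stock _get_resource_lines receives only numeric resources, so the node names still have to reach it somehow (see the next paragraphs).

```python
"""Sketch: build a Torque ``#PBS -l`` line that requests explicit hosts.

Assumption: mimics the layout from the question
(``nodes=...,walltime=HH:MM:SS,mem=...kb``); not the aiida-core code.
"""
from typing import List, Optional, Sequence


def explicit_nodes_resource_lines(
    node_names: Sequence[str],
    ppn: Optional[int] = None,
    max_wallclock_seconds: Optional[int] = None,
    max_memory_kb: Optional[int] = None,
) -> List[str]:
    """Return the ``#PBS -l`` line(s) for a job pinned to ``node_names``."""
    if not node_names:
        raise ValueError('at least one node name is required')

    # Torque joins per-host requests with '+', e.g. node01:ppn=48+node02:ppn=48
    suffix = f':ppn={ppn}' if ppn else ''
    select_string = 'nodes=' + '+'.join(f'{name}{suffix}' for name in node_names)

    if max_wallclock_seconds is not None:
        # Format the wall time as HH:MM:SS (hours may exceed 24)
        hours, rest = divmod(int(max_wallclock_seconds), 3600)
        minutes, seconds = divmod(rest, 60)
        select_string += f',walltime={hours:02d}:{minutes:02d}:{seconds:02d}'

    if max_memory_kb:
        select_string += f',mem={int(max_memory_kb)}kb'

    return [f'#PBS -l {select_string}']
```

For example, explicit_nodes_resource_lines(['node01', 'node02'], ppn=48, max_wallclock_seconds=3600, max_memory_kb=16000000) returns the single line #PBS -l nodes=node01:ppn=48+node02:ppn=48,walltime=01:00:00,mem=16000000kb, i.e. the line from the question, so no second -l line is needed.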

Documentation on how to package it as a new plugin can be found here.

However, in your case you will somehow have to pass the node names to the plugin. At the moment there is no way to pass specific information to the scheduler plugin beyond the resources and the options that are already standardised, and node names are not among these yet.

Probably the ‘cleanest’ way is to define a new JobResource class, copying the PbsJobResource implementation here and adding to it the list of node names (or anything else you need). Then, in your scheduler class, specify that this class should be used instead of the default one (via the _job_resource_class class attribute), as is done here.
Finally, you can pass information on the nodes in the metadata.options.resources of the CalcJob, when you submit it.
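As a hedged sketch of the extra check such a resource class could perform, the function below validates a hypothetical node_names entry in the resources dictionary and keeps num_machines consistent with it. Neither the function nor the node_names key exists in aiida-core; in a plugin this logic would presumably live in the validate_resources classmethod of your PbsJobResource copy (which would also have to accept node_names as a field), but the exact hooks should be checked against the aiida-core source.

```python
"""Sketch: validate a hypothetical ``node_names`` entry in a resources dict.

Assumption: ``node_names`` is NOT an aiida-core resource key; it is the extra
field a custom PbsJobResource copy would have to accept.
"""
from typing import Any, Dict


def validate_node_names(resources: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``resources`` with ``node_names`` checked and normalised."""
    validated = dict(resources)
    node_names = validated.get('node_names')
    if node_names is None:
        # No explicit hosts requested: leave the usual "any N nodes" request alone
        return validated

    if not isinstance(node_names, (list, tuple)) or not node_names:
        raise ValueError('node_names must be a non-empty list of host names')
    if not all(isinstance(name, str) and name for name in node_names):
        raise ValueError('every entry of node_names must be a non-empty string')

    # Derive num_machines from the host list, or check that the two agree
    num_machines = validated.setdefault('num_machines', len(node_names))
    if num_machines != len(node_names):
        raise ValueError(
            f'num_machines={num_machines} but {len(node_names)} node names were given'
        )

    validated['node_names'] = list(node_names)
    return validated
```

For example, resources={'num_mpiprocs_per_machine': 48, 'node_names': ['node01', 'node02']} would be accepted with num_machines set to 2, while num_machines=3 with only two names raises a ValueError before anything is submitted.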

I hope that these instructions can point you in the right direction.
If you work on this and get stuck, it would be great if you could share a link to a GitHub repository (or similar) where you have started working, so that we can help more easily.