bilke
May 3, 2024, 12:33pm
I have a remote computer set up with the `core.slurm` scheduler. But some jobs I want to run directly without Slurm (`core.direct`) on the login node of the HPC system. How can I do that?
I see two possible workarounds:

- creating another computer with the direct scheduler, or
- before running / submitting the job, setting the scheduler on the computer with `computer.scheduler_type = "core.direct"` and setting it back to `core.slurm` afterwards.
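The second workaround can be wrapped in a small context manager so the original scheduler type is always restored, even if submission fails. A minimal sketch; the `DummyComputer` stand-in below is purely illustrative — in practice you would pass the `Computer` loaded from your profile instead:

```python
from contextlib import contextmanager

@contextmanager
def temporary_scheduler(computer, scheduler_type):
    """Temporarily switch a computer's scheduler type, restoring the original on exit."""
    original = computer.scheduler_type
    computer.scheduler_type = scheduler_type
    try:
        yield computer
    finally:
        # Restore even if job submission raised an exception
        computer.scheduler_type = original

# Illustrative stand-in for an AiiDA Computer (hypothetical, for this sketch only)
class DummyComputer:
    def __init__(self, scheduler_type):
        self.scheduler_type = scheduler_type

computer = DummyComputer("core.slurm")
with temporary_scheduler(computer, "core.direct"):
    print(computer.scheduler_type)  # core.direct while the job is submitted
print(computer.scheduler_type)      # back to core.slurm
```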
Is there a more elegant solution to this? It would be nice if this could be configured as a metadata option.
Thanks,
Lars
Hi Lars, I think you have pretty much covered all the possible solutions that I see. I would personally simply create a second computer and suffix its label with `-direct`. You can use `verdi computer duplicate` to clone it easily. It works like `verdi computer setup`, but all prompt values are pre-filled with the existing values, so you can just press enter for all of them except the scheduler type, which you change.
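In shell terms, the duplicate route looks roughly like this (the labels `my-hpc` / `my-hpc-direct` and the `core.ssh` transport are example values; substitute your own):

```shell
# Clone the existing Slurm computer; all prompts are pre-filled with its values.
# Accept them with Enter, changing only the label (e.g. add a -direct suffix)
# and the scheduler (core.slurm -> core.direct).
verdi computer duplicate my-hpc

# The new computer still needs its transport configured, and can then be tested
verdi computer configure core.ssh my-hpc-direct
verdi computer test my-hpc-direct
```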
This is something I’ve also thought about in the past, I’ve added a note to the road map item that discusses making computer entities more flexible:
Issue opened 01 Mar 23 UTC · roadmap/proposed
## Motivation
When computers and codes were originally designed, a certain level of rigidity on their properties was established in order to impose a stricter level of control over the preservation of the provenance. However, the frequent changes and updates that computer clusters perform on their systems force users to keep updating the ways in which they access those resources. Part of this required versatility was taken into account by allowing some parameters of computer access to be set in the `configuration` of the computer, rather than in its attributes during setup. However, many users find that this is not enough and express frustration at the limitations that these entities still have.
Some examples include:
- After updating software in clusters, some software change location and thus the respective code needs to be updated or changed to a new one. Creating a new code is the natural solution, but it tends to then over-complicate querying later.
- One may want to set up a computer so that it can be used with different schedulers (especially considering the inclusion of meta-schedulers such as [aiida-hyperqueue](https://github.com/aiidateam/aiida-hyperqueue)), but by doing so the remote folders created by one technically belong to a different computer, and thus copying files between them is not allowed.
## Desired Outcome
Rethink how immutability works in the computer and code nodes: which restrictions can be relaxed, which ones we want to keep to ensure a cleaner provenance, and what other options we can offer users to facilitate dealing with these immutability problems in the cases that need to remain.
## Impact
This will potentially improve usability for all users of AiiDA.
## Complexity
Since this is not a bug or a missing feature, but a direct consequence of how we enforce respect for the provenance, and of how that clashes with practical usage, the biggest issue here is to rethink a more general principle of AiiDA. Since this comes from unforeseen practical inconvenience, it is worth first doing a thorough compilation of user frustrations, as well as some thinking about other potential problems.
## Progress
The first task should be to do a comprehensive analysis of the situation and gather any remaining use case that may be relevant to consider.
Note that currently setting up a second `Computer` with a different scheduler means that you can’t copy remote files between `CalcJob`s, see
Issue opened 24 Sep 23 UTC · type/feature request · topic/engine · topic/computers
### Is your feature request related to a problem? Please describe
In some cases, I want to use two different schedulers on the same remote, e.g. `hyperqueue` for small jobs that I want to run on partial nodes, but `Slurm` for bigger jobs where I need multiple nodes and a solid chunk of walltime. Currently, this means I have to set up two computers with different schedulers. However, if one calculation needs to copy/symlink files from a previous one run on a different scheduler (i.e. computer), this currently fails with a `NotImplementedError`, since the `execmanager` compares the computer UUIDs:
https://github.com/aiidateam/aiida-core/blob/8a2fece02411c982eb16e8fed8991ffaf75fa76f/aiida/engine/daemon/execmanager.py#L250-L272
### Describe the solution you'd like
One solution that I've been running with locally is to compare the `hostname` of the computers instead, which seemed sensible at first glance. There may be certain cases where this breaks, however?
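That hostname-based check can be sketched as follows. The `RemoteInfo` type here is a stand-in for illustration only; in aiida-core the comparison happens on `Computer` instances inside `execmanager`:

```python
from dataclasses import dataclass

@dataclass
class RemoteInfo:
    """Illustrative stand-in carrying only the fields compared here."""
    uuid: str
    hostname: str

def same_remote(a: RemoteInfo, b: RemoteInfo) -> bool:
    # Current behaviour compares UUIDs, so two Computer entries pointing at
    # the same machine (one Slurm, one core.direct) count as different remotes.
    # Proposed relaxation: compare hostnames instead.
    return a.hostname == b.hostname

slurm = RemoteInfo(uuid="aaa", hostname="hpc.example.org")
direct = RemoteInfo(uuid="bbb", hostname="hpc.example.org")
print(same_remote(slurm, direct))  # True: remote copy/symlink would be allowed
```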
### Describe alternatives you've considered
It's clear that a computer can be used with multiple schedulers. Besides the `hyperqueue` case, you might want to run e.g. an `aiida-shell` job directly on the login node. Instead of setting up multiple computers, maybe a computer could be configured with multiple schedulers, with one as the default and the others selectable by setting an option?
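That last idea could look roughly like the following. This is an entirely hypothetical API, not anything in current aiida-core; the class and method names are invented for the sketch:

```python
class MultiSchedulerComputer:
    """Hypothetical computer holding several scheduler entry points with one default."""

    def __init__(self, hostname, schedulers, default):
        if default not in schedulers:
            raise ValueError(f"default {default!r} not among {schedulers}")
        self.hostname = hostname
        self.schedulers = set(schedulers)
        self.default = default

    def resolve_scheduler(self, requested=None):
        # Fall back to the default unless the job explicitly requests another
        if requested is None:
            return self.default
        if requested not in self.schedulers:
            raise ValueError(f"{requested!r} not configured for {self.hostname}")
        return requested

hpc = MultiSchedulerComputer(
    "hpc.example.org",
    ["core.slurm", "core.direct", "hyperqueue"],
    default="core.slurm",
)
print(hpc.resolve_scheduler())               # core.slurm
print(hpc.resolve_scheduler("core.direct"))  # core.direct
```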
### Additional context
Related to https://github.com/aiidateam/aiida-core/issues/5084
system
Closed May 15, 2024, 4:03am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.