Remote Computer Not Running sbatch

Hi all,

I’m running AiiDA against a remote HPC cluster, but the jobs aren’t being submitted to the Slurm queue by sbatch. I imagine the issue is on the remote computer’s side, but I want to check whether there is anything I can do on my end.

verdi computer test reports no issues, but submitting a Gaussian job on the HPC results in the following (truncated verdi process report output):

+-> ERROR at 2024-06-11 11:39:44.169983-04:00
 | Traceback (most recent call last):
 |   File "/Users/chemlab/anaconda3/envs/aiida/lib/python3.12/site-packages/aiida/engine/utils.py", line 202, in exponential_backoff_retry
 |     result = await coro()
 |              ^^^^^^^^^^^^
 |   File "/Users/chemlab/anaconda3/envs/aiida/lib/python3.12/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 145, in do_submit
 |     return execmanager.submit_calculation(node, transport)
 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 |   File "/Users/chemlab/anaconda3/envs/aiida/lib/python3.12/site-packages/aiida/engine/daemon/execmanager.py", line 375, in submit_calculation
 |     result = scheduler.submit_from_script(workdir, submit_script_filename)
 |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 |   File "/Users/chemlab/anaconda3/envs/aiida/lib/python3.12/site-packages/aiida/schedulers/scheduler.py", line 409, in submit_from_script
 |     return self._parse_submit_output(*result)
 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 |   File "/Users/chemlab/anaconda3/envs/aiida/lib/python3.12/site-packages/aiida/schedulers/plugins/slurm.py", line 488, in _parse_submit_output
 |     raise SchedulerError(
 | aiida.schedulers.scheduler.SchedulerError: Error during submission, could not retrieve the jobID from sbatch output; see log for more info.
+-> WARNING at 2024-06-11 11:39:44.188223-04:00
 | maximum attempts 5 of calling do_submit, exceeded

I’ve copied the inputs and the generated .sh file to another location and submitted them manually with sbatch, and that works with no issue.

Anyway, I think it’s their problem, but I would like to double-check and see if there is anything I can try.

Thanks

Hi, when you submit a job, SLURM should print the job ID, and here AiiDA could not find it.
The ID is parsed from a string, so the error probably means the string is slightly different from what we expect (e.g. a different locale/language, additional strings printed in the output, a different customised version, …).
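As a rough illustration of why unexpected output breaks this step, here is a minimal sketch of that kind of parsing. The regular expression is an assumption modelled on the usual "Submitted batch job <id>" line that sbatch prints, not a copy of the AiiDA source, and the function name parse_job_id is hypothetical.

```python
import re
from typing import Optional

# Assumed pattern: sbatch normally prints "Submitted batch job <id>".
SUBMITTED_RE = re.compile(r"Submitted batch job\s+(?P<jobid>\d+)")


def parse_job_id(stdout: str) -> Optional[str]:
    """Return the job ID from the first matching line, or None if none matches."""
    for line in stdout.splitlines():
        match = SUBMITTED_RE.search(line.strip())
        if match:
            return match.group("jobid")
    return None


print(parse_job_id("Submitted batch job 123456"))   # 123456
print(parse_job_id("This command is not allowed"))  # None
```

If the remote side prints something other than the expected line (a translated message, a rejection notice, and so on), no line matches and there is no job ID to return, which is the situation the SchedulerError reports.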
To understand what is going on, we need to see the log, as the exception says: “; see log for more info”. If this is run via the daemon, you need to check the daemon log, typically in your .aiida folder (in your home directory or wherever you set it). Inside it you should have a folder daemon/log containing various files; the one to check is aiida-<PROFILE_NAME>.log, where <PROFILE_NAME> is your AiiDA profile name (if you don’t know it, see the output of verdi profile list; it is the one with the star symbol in front).

You don’t need to send us the whole file, which might be big; just look for related errors. They should be just above the traceback that you reported, which should also be in the log file. I think they will be prefixed with the string in _parse_submit_output{}: unable to find the job id:

Hopefully looking at those lines will make it obvious what the problem is.
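
If the log file is large, a short script can pull out just the relevant entries. This is a sketch under assumptions: the log is assumed to live under ~/.aiida/daemon/log as described above, "myprofile" is a placeholder profile name, and find_submit_errors is a hypothetical helper, not part of AiiDA.

```python
from pathlib import Path

# Substring of the log message mentioned above.
MARKER = "unable to find the job id"


def find_submit_errors(log_path, marker=MARKER, after=5):
    """Return each line containing `marker` plus up to `after` following lines."""
    lines = Path(log_path).read_text(errors="replace").splitlines()
    hits = []
    for i, line in enumerate(lines):
        if marker in line:
            hits.append(lines[i : i + 1 + after])
    return hits


if __name__ == "__main__":
    # Placeholder profile name; replace "myprofile" with your own.
    log_file = Path.home() / ".aiida" / "daemon" / "log" / "aiida-myprofile.log"
    if log_file.exists():
        for block in find_submit_errors(log_file):
            print("\n".join(block))
            print("-" * 40)
```

Each printed block starts at the line with the marker and includes a few following lines, which should cover whatever the remote side printed instead of the job ID.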

Thanks! I didn’t know where to find that log, so this is very helpful. Looking at the log, I can confirm it is an issue with the remote computer. In the setup they use to let AiiDA bypass MFA, they restrict which commands can be used, and something in the command being run is being rejected. I’m reaching out to them for a solution.