Hi,
I connected to the remote HPC and set up the code, but it seems AiiDA cannot submit jobs through the remote SLURM scheduler. Below are my AiiDA settings on the local computer; jobs submitted directly through SLURM on the remote HPC work. Can anyone help take a look at what the problem is?
The computer:
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ verdi computer show a0s000508
--------------------------- ---------------------------------------------------
Label a0s000508
PK 9
UUID f65e482a-dd76-4d4e-a85d-bd400caf6e0a
Description a0s000508 at paracloud, shared with CAO_Jie. 256 GB
Hostname ssh.cn-zhongwei-1.paracloud.com
Transport type core.ssh
Scheduler type core.slurm
Work directory /public4/home/a0s000508/AiiDA_local/AiiDA_local_01
Shebang #!/bin/bash
Mpirun command srun -n {tot_num_mpiprocs}
Default #procs/machine 64
Default memory (kB)/machine 268435456
Prepend text #SBATCH -p amd_256
#SBATCH -N 1
#SBATCH -n 64
Append text
--------------------------- ---------------------------------------------------
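For context, I have not copied the generated _aiidasubmit.sh here, but from these settings I assume its header looks roughly like this (reconstructed by me, so the exact lines may differ; as far as I understand, the "Default memory (kB)/machine" value is turned into an --mem request in MB, i.e. 268435456 kB -> --mem=262144):

#!/bin/bash
#SBATCH --no-requeue
#SBATCH --job-name=aiida-NNN
#SBATCH --output=_scheduler-stdout.txt
#SBATCH --error=_scheduler-stderr.txt
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --mem=262144
#SBATCH -p amd_256
#SBATCH -N 1
#SBATCH -n 64

If the amd_256 nodes have exactly 256 GB in total, asking for all of it with --mem could be what sbatch rejects, since the scheduler usually keeps some memory back for the operating system.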
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ verdi computer configure show a0s000508
* username a0s000508@BSCC-A
* port 22
* look_for_keys True
* key_filename /home/yang/.ssh/id_ed25519.pub
* timeout 60
* allow_agent True
* proxy_jump
* proxy_command
* compress True
* gss_auth False
* gss_kex False
* gss_deleg_creds False
* gss_host ssh.cn-zhongwei-1.paracloud.com
* load_system_host_keys True
* key_policy WarningPolicy
* use_login_shell True
* safe_interval 30.0
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ verdi computer test a0s000508
Report: Testing computer<a0s000508> for user<ygy11123@sina.com>...
* Opening connection... [OK]
* Checking for spurious output... [OK]
* Getting number of jobs from scheduler... [OK]: 0 jobs found in the queue
* Determining remote user name... [OK]: a0s000508
* Creating and deleting temporary file... [OK]
* Checking for possible delay from using login shell... [OK]
Success: all 6 tests succeeded
The code was set up with "verdi code create core.code.installed --config code_pw.x_a0s000508.yml"; the contents of code_pw.x_a0s000508.yml are:
---
label: 'qe-7.3-pw'
description: 'quantum_espresso v7.3 at a0s000508@BSCC-A, shared with CAO Jie'
default_calc_job_plugin: 'quantumespresso.pw'
filepath_executable: '/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin/pw.x'
computer: 'a0s000508'
prepend_text: |
    source /public4/soft/modules/module.sh
    module load lapack/3.9.0-wxl-public4 fftw/3.3.8-mpi-public4 netcdf/4.4.1-parallel-icc17-ls-public4 libxc/4.3.4-icc17-ls-public4 szip/2.1.1-wzm-public4 blas/3.8.0-public4
    export PATH=/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin:$PATH
append_text: ' '
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ verdi code show qe-7.3-pw@a0s000508
----------------------- -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PK 833
UUID c787e0a6-ded1-4c21-844a-e2aabc67a56b
Type core.code.installed
Computer a0s000508 (ssh.cn-zhongwei-1.paracloud.com), pk: 9
Filepath executable /public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin/pw.x
Label qe-7.3-pw
Description quantum_espresso v7.3 at a0s000508@BSCC-A, shared with CAO Jie
Default calc job plugin quantumespresso.pw
Use double quotes False
With mpi
Prepend text source /public4/soft/modules/module.sh
module load lapack/3.9.0-wxl-public4 fftw/3.3.8-mpi-public4 netcdf/4.4.1-parallel-icc17-ls-public4 libxc/4.3.4-icc17-ls-public4 szip/2.1.1-wzm-public4 blas/3.8.0-public4
export PATH=/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin:$PATH
Append text
----------------------- -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$
But when I launch a calculation from the CLI, SLURM rejects the submission because of the memory request:
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ aiida-quantumespresso calculation launch pw -X qe-7.3-pw@a0s000508 -F SSSP/1.1/PBE/efficiency
Running a PwCalculation...
Error: Error in _parse_submit_output: retval=1; stdout=; stderr=sbatch: error: Batch job submission failed: Memory required by task is not available
Error: Exception whilst using transport:
Traceback (most recent call last):
File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/transports.py", line 110, in request_transport
yield transport_request.future
File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
return execmanager.submit_calculation(node, transport)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
result = scheduler.submit_from_script(workdir, submit_script_filename)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
return self._parse_submit_output(*result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available
Error: iteration 1 of do_submit excepted, retrying after 20 seconds
Traceback (most recent call last):
File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/utils.py", line 209, in exponential_backoff_retry
result = await coro()
^^^^^^^^^^^^
File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
return execmanager.submit_calculation(node, transport)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
result = scheduler.submit_from_script(workdir, submit_script_filename)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
return self._parse_submit_output(*result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available
Error: Error in _parse_submit_output: retval=1; stdout=; stderr=sbatch: error: Batch job submission failed: Memory required by task is not available
Error: Exception whilst using transport:
[same traceback as above]
Error: iteration 2 of do_submit excepted, retrying after 40 seconds
[same traceback as above]
^CCritical: runner received interrupt, killing process 836
^X^CWarning: runner received interrupt, process 836 already being killed
^Z
[1]+  Stopped                 aiida-quantumespresso calculation launch pw -X qe-7.3-pw@a0s000508 -F SSSP/1.1/PBE/efficiency
Here 256 means 256 GB; I checked that this corresponds to 268435456 kB (256 × 1024 × 1024), which matches the "Default memory (kB)/machine" value above.
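In case it helps the discussion, the workaround I was considering is to request less than the full node memory from the Python API. This is only a sketch based on my understanding that the max_memory_kb option controls the --mem line; the value 240000000 kB is an arbitrary guess of mine, not a tested number:

from aiida import load_profile, orm
from aiida.plugins import CalculationFactory

load_profile()

# Build a PwCalculation without submitting it, just to show the options.
builder = CalculationFactory('quantumespresso.pw').get_builder()
builder.code = orm.load_code('qe-7.3-pw@a0s000508')

# Same resources as the computer defaults: one node, 64 MPI ranks.
builder.metadata.options.resources = {
    'num_machines': 1,
    'num_mpiprocs_per_machine': 64,
}

# Request less than the full 256 GB so SLURM can keep some memory
# for the system (240000000 kB is an assumption on my side).
builder.metadata.options.max_memory_kb = 240000000

I have not tried this yet, and of course the structure, parameters, k-points and pseudopotentials would still have to be added before submitting.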
I am not sure what the problem is.
Could anyone take a look? Many thanks!
Sincerely,
Dr. Guoyu Yang
Lecturer
Jimei Univ, School of Science, Digital Fujian Big Data Modeling and Intelligent Computing Institute
185 Yinjiang Rd.,
Jimei District, Xiamen, 361021
Fujian, China
E-mail: 201961000100@jmu.edu.cn