Stderr=sbatch: error: Batch job submission failed: Memory required by task is not available

Hi,

I connected to the remote HPC and set up the code, but it seems AiiDA cannot use the remote Slurm system. Below are my AiiDA settings on the local computer and a job on the remote HPC (submitted directly through Slurm). Can anyone help me figure out what the problem is?

The computer:

(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ verdi computer show a0s000508
---------------------------  ---------------------------------------------------
Label                        a0s000508
PK                           9
UUID                         f65e482a-dd76-4d4e-a85d-bd400caf6e0a
Description                  a0s000508 at paracloud, shared with CAO_Jie. 256 GB
Hostname                     ssh.cn-zhongwei-1.paracloud.com
Transport type               core.ssh
Scheduler type               core.slurm
Work directory               /public4/home/a0s000508/AiiDA_local/AiiDA_local_01
Shebang                      #!/bin/bash
Mpirun command               srun -n {tot_num_mpiprocs}
Default #procs/machine       64
Default memory (kB)/machine  268435456
Prepend text                 #SBATCH -p amd_256
                             #SBATCH -N 1
                             #SBATCH -n 64
Append text
---------------------------  ---------------------------------------------------
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ verdi computer show configure a0s000508
Usage: verdi computer show [OPTIONS] COMPUTER
Try 'verdi computer show --help' for help.

Error: Invalid value for 'COMPUTER': no Computer found with LABEL<configure>: No result was found
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ verdi computer  configure show a0s000508
* username               a0s000508@BSCC-A
* port                   22
* look_for_keys          True
* key_filename           /home/yang/.ssh/id_ed25519.pub
* timeout                60
* allow_agent            True
* proxy_jump
* proxy_command
* compress               True
* gss_auth               False
* gss_kex                False
* gss_deleg_creds        False
* gss_host               ssh.cn-zhongwei-1.paracloud.com
* load_system_host_keys  True
* key_policy             WarningPolicy
* use_login_shell        True
* safe_interval          30.0
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ verdi computer test a0s000508
Report: Testing computer<a0s000508> for user<ygy11123@sina.com>...
* Opening connection... [OK]
* Checking for spurious output... [OK]
* Getting number of jobs from scheduler... [OK]: 0 jobs found in the queue
* Determining remote user name... [OK]: a0s000508
* Creating and deleting temporary file... [OK]
* Checking for possible delay from using login shell... [OK]
Success: all 6 tests succeeded

The code was set up with verdi code create core.code.installed --config code_pw.x_a0s000508.yml, using the following YAML:

---
label: 'qe-7.3-pw'
description: 'quantum_espresso v7.3 at a0s000508@BSCC-A, shared with CAO Jie'
default_calc_job_plugin: 'quantumespresso.pw'
filepath_executable: '/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin/pw.x'
computer: 'a0s000508'
prepend_text: |
  source /public4/soft/modules/module.sh
  module load lapack/3.9.0-wxl-public4 fftw/3.3.8-mpi-public4 netcdf/4.4.1-parallel-icc17-ls-public4 libxc/4.3.4-icc17-ls-public4   szip/2.1.1-wzm-public4  blas/3.8.0-public4
  export PATH=/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin:$PATH

append_text: ' '


(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ verdi code show qe-7.3-pw@a0s000508  
-----------------------  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PK                       833
UUID                     c787e0a6-ded1-4c21-844a-e2aabc67a56b
Type                     core.code.installed
Computer                 a0s000508 (ssh.cn-zhongwei-1.paracloud.com), pk: 9
Filepath executable      /public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin/pw.x
Label                    qe-7.3-pw
Description              quantum_espresso v7.3 at a0s000508@BSCC-A, shared with CAO Jie
Default calc job plugin  quantumespresso.pw
Use double quotes        False
With mpi
Prepend text             source /public4/soft/modules/module.sh
                         module load lapack/3.9.0-wxl-public4 fftw/3.3.8-mpi-public4 netcdf/4.4.1-parallel-icc17-ls-public4 libxc/4.3.4-icc17-ls-public4   szip/2.1.1-wzm-public4  blas/3.8.0-public4
                         export PATH=/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin:$PATH
Append text
-----------------------  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ 

But when I launch a calculation from the CLI, the submission fails with a memory error:

(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ aiida-quantumespresso calculation launch pw -X qe-7.3-pw@a0s000508 -F SSSP/1.1/PBE/efficiency
Running a PwCalculation...

Error: Error in _parse_submit_output: retval=1; stdout=; stderr=sbatch: error: Batch job submission failed: Memory required by task is not available

Error: Exception whilst using transport:
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/transports.py", line 110, in request_transport
    yield transport_request.future
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available


Error: iteration 1 of do_submit excepted, retrying after 20 seconds
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/utils.py", line 209, in exponential_backoff_retry
    result = await coro()
             ^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available




Error: Error in _parse_submit_output: retval=1; stdout=; stderr=sbatch: error: Batch job submission failed: Memory required by task is not available

Error: Exception whilst using transport:
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/transports.py", line 110, in request_transport
    yield transport_request.future
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available


Error: iteration 2 of do_submit excepted, retrying after 40 seconds
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/utils.py", line 209, in exponential_backoff_retry
    result = await coro()
             ^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available

^CCritical: runner received interrupt, killing process 836

^X^CWarning: runner received interrupt, process 836 already being killed


^Z
[1]+  Stopped               aiida-quantumespresso calculation launch pw -X qe-7.3-pw@a0s000508 -F SSSP/1.1/PBE/efficiency
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ ls

Here 256 means 256 GB; I checked and entered it as 268435456 kB.
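(That is, the conversion behind that number is 256 × 1024 × 1024 kB = 268435456 kB.)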

I am not sure what the problem is.
Could anyone take a look? Many thanks!

Sincerely,
Dr. Guoyu Yang
Lecturer
Jimei Univ, School of Science, Digital Fujian Big Data Modeling and Intelligent Computing Institute
185 Yinjiang Rd.,
Jimei District, Xiamen,361021
Fujian, China
E-mail: 201961000100@jmu.edu.cn

If I run it from JupyterLab, the submission fails with the same memory error:

+*In[1]:*+
[source, ipython3]
----
from aiida import load_profile

load_profile()
----


+*Out[1]:*+
----Profile<uuid='c633e7bf80b9436ba2a73f137476acfe' name='quicksetup'>----


+*In[2]:*+
[source, ipython3]
----
from aiida.engine import run
from aiida.orm import Dict, KpointsData, StructureData, load_code, load_group
----


+*In[3]:*+
[source, ipython3]
----
# Load the code configured for ``pw.x``. Make sure to replace this string
# with the label that you used in the setup of the code.
#code = load_code('pw@localhost')
code = load_code('qe-7.3-pw@a0s000508')
builder = code.get_builder()
----


+*In[4]:*+
[source, ipython3]
----
# Create a silicon fcc crystal
from ase.build import bulk
structure = StructureData(ase=bulk('Si', 'fcc', 5.43))
builder.structure = structure
----


+*In[5]:*+
[source, ipython3]
----
# Load the pseudopotential family.
pseudo_family = load_group('SSSP/1.3/PBE/efficiency')
builder.pseudos = pseudo_family.get_pseudos(structure=structure)
----


+*In[6]:*+
[source, ipython3]
----
# Request the recommended wavefunction and charge density cutoffs
# for the given structure and energy units.
cutoff_wfc, cutoff_rho = pseudo_family.get_recommended_cutoffs(
    structure=structure,
    unit='Ry'
)

parameters = Dict({
    'CONTROL': {
        'calculation': 'scf'
    },
    'SYSTEM': {
        'ecutwfc': cutoff_wfc,
        'ecutrho': cutoff_rho,
    }
})
builder.parameters = parameters
----


+*In[7]:*+
[source, ipython3]
----
# Generate a 2x2x2 Monkhorst-Pack mesh
kpoints = KpointsData()
kpoints.set_kpoints_mesh([2, 2, 2])
builder.kpoints = kpoints
----


+*In[8]:*+
[source, ipython3]
----
# Run the calculation on 1 CPU and kill it if it runs longer than 1800 seconds.
# Set ``withmpi`` to ``False`` if ``pw.x`` was compiled without MPI support.
builder.metadata.options = {
    'resources': {
        'num_machines': 1,
    },
    'max_wallclock_seconds': 1800,
    'withmpi': False,
}
----


+*In[ ]:*+
[source, ipython3]
----
results, node = run.get_node(builder)
----


+*Out[ ]:*+
----
02/15/2024 05:35:07 PM <335286> aiida.scheduler.slurm: [ERROR] Error in _parse_submit_output: retval=1; stdout=; stderr=sbatch: error: Batch job submission failed: Memory required by task is not available

02/15/2024 05:35:07 PM <335286> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/transports.py", line 110, in request_transport
    yield transport_request.future
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available


02/15/2024 05:35:07 PM <335286> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 1 of do_submit excepted, retrying after 20 seconds
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/utils.py", line 209, in exponential_backoff_retry
    result = await coro()
             ^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available

02/15/2024 05:36:00 PM <335286> aiida.scheduler.slurm: [ERROR] Error in _parse_submit_output: retval=1; stdout=; stderr=sbatch: error: Batch job submission failed: Memory required by task is not available

02/15/2024 05:36:00 PM <335286> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/transports.py", line 110, in request_transport
    yield transport_request.future
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available


02/15/2024 05:36:00 PM <335286> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 2 of do_submit excepted, retrying after 40 seconds
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/utils.py", line 209, in exponential_backoff_retry
    result = await coro()
             ^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available

02/15/2024 05:37:41 PM <335286> aiida.scheduler.slurm: [ERROR] Error in _parse_submit_output: retval=1; stdout=; stderr=sbatch: error: Batch job submission failed: Socket timed out on send/recv operation

02/15/2024 05:37:41 PM <335286> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/transports.py", line 110, in request_transport
    yield transport_request.future
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Socket timed out on send/recv operation


02/15/2024 05:37:41 PM <335286> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 3 of do_submit excepted, retrying after 80 seconds
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/utils.py", line 209, in exponential_backoff_retry
    result = await coro()
             ^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Socket timed out on send/recv operation

02/15/2024 05:39:34 PM <335286> aiida.scheduler.slurm: [ERROR] Error in _parse_submit_output: retval=1; stdout=; stderr=sbatch: error: Batch job submission failed: Memory required by task is not available

02/15/2024 05:39:34 PM <335286> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/transports.py", line 110, in request_transport
    yield transport_request.future
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available


02/15/2024 05:39:34 PM <335286> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 4 of do_submit excepted, retrying after 160 seconds
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/utils.py", line 209, in exponential_backoff_retry
    result = await coro()
             ^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available

02/15/2024 05:42:45 PM <335286> aiida.scheduler.slurm: [ERROR] Error in _parse_submit_output: retval=1; stdout=; stderr=sbatch: error: Batch job submission failed: Memory required by task is not available

02/15/2024 05:42:45 PM <335286> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/transports.py", line 110, in request_transport
    yield transport_request.future
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available


02/15/2024 05:42:45 PM <335286> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 5 of do_submit excepted
Traceback (most recent call last):
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/utils.py", line 209, in exponential_backoff_retry
    result = await coro()
             ^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
    return execmanager.submit_calculation(node, transport)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
    result = scheduler.submit_from_script(workdir, submit_script_filename)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/scheduler.py", line 412, in submit_from_script
    return self._parse_submit_output(*result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yang/miniforge3/envs/aiida_QE/lib/python3.11/site-packages/aiida/schedulers/plugins/slurm.py", line 430, in _parse_submit_output
    raise SchedulerError(f'Error during submission, retval={retval}\nstdout={stdout}\nstderr={stderr}')
aiida.schedulers.scheduler.SchedulerError: Error during submission, retval=1
stdout=
stderr=sbatch: error: Batch job submission failed: Memory required by task is not available

02/15/2024 05:42:45 PM <335286> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [WARNING] maximum attempts 5 of calling do_submit, exceeded
02/15/2024 05:42:45 PM <335286> aiida.engine.processes.calcjobs.tasks: [WARNING] submitting CalcJob<841> failed
----


+*In[ ]:*+
[source, ipython3]
----
node.exit_status
----


+*In[ ]:*+
[source, ipython3]
----
print(results)
{
    'output_band': <BandsData: uuid: a82526b7-fb7f-4638-a1e1-f72ad04a5f13 (pk: 59537)>,
    'output_trajectory': <TrajectoryData: uuid: 4122ba08-6318-4029-9892-55b29e89a39c (pk: 59538)>,
    'output_parameters': <Dict: uuid: 82fe2b57-0dc0-4031-81a0-e80ed34db680 (pk: 59539)>,
    'remote_folder': <RemoteData: uuid: 672113fd-7177-4688-ad77-674703b8f611 (pk: 59535)>,
    'retrieved': <FolderData: uuid: 5f44b880-4c74-4194-9484-0c51ec4f1a34 (pk: 59536)>
}
----



I cannot find the job with squeue on the HPC.

But the HPC does work when I submit a job directly with sbatch:

#!/bin/bash
#SBATCH -p amd_256
#SBATCH -N 1
#SBATCH -n 64
source /public4/soft/modules/module.sh
module load lapack/3.9.0-wxl-public4 fftw/3.3.8-mpi-public4 netcdf/4.4.1-parallel-icc17-ls-public4 libxc/4.3.4-icc17-ls-public4  szip/2.1.1-wzm-public4  blas/3.8.0-public4
export PATH=/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin:$PATH
srun -n 64 pw.x  < scf.in > scf.out
~                                          


To follow the tutorial, I have already installed several pseudopotential families on my local computer:

$ aiida-pseudo list
Label                                Type string                Count
-----------------------------------  -------------------------  -------
PseudoDojo/0.4/PBE/SR/standard/psp8  pseudo.family.pseudo_dojo  72
SSSP/1.3/PBEsol/efficiency           pseudo.family.sssp         103
SSSP/1.3/PBE/efficiency              pseudo.family.sssp         103
SSSP/1.2/PBE/efficiency              pseudo.family.sssp         85
SSSP/1.1/PBE/efficiency              pseudo.family.sssp         85
SSSP/1.2/PBEsol/efficiency           pseudo.family.sssp         85


Hi @AmberLEE123456

Could you try to check two things?

  1. What is the version of Slurm running on the cluster?
  2. Can you go to the remote folder of the calculation (by running verdi calcjob gotocomputer <pwcalculation_calcjob_pk>) and show the content of the batch script generated by AiiDA (the filename of the batch script is probably _aiidasubmit.sh)?

Thanks.

  1. slurm 22.05.11
[a0s000508@ln27%bscc-a slurm]$  sbatch -V 
slurm 22.05.11
  2. Yes, and the files seem quite OK…
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/02_by_web$ verdi calcjob gotocomputer 841
Report: going to the remote work directory...
[Login banner from the HPC, translated from Chinese: login nodes are only for editing and compiling, and jobs found running on them will be killed; nodes are allocated exclusively, and jobs using fewer than the full core count are still billed for the full node, so full-node submissions are recommended. The amd_16core queue only supports single-node 16-core jobs. amd_test is a debugging queue with 64 cores and 256 GB of memory per node, limited to 30 minutes and billed at the same rate as amd_256. Module environments are no longer loaded automatically at login; public4 users should run source /public4/soft/modules/module.sh. Single sacct queries of job history on BSCC-A are limited to a 30-day window, e.g. sacct -D -T -X -u sc**** -S 2022-01-01T00:00:00 -E 2022-01-31T00:00:00. — HPC Cloud Service Team]
[a0s000508@ln25%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ pwd
/public4/home/a0s000508/AiiDA_local/AiiDA_local_01/e2/6b/53ea-16e2-4d19-afae-55cabfa51645
[a0s000508@ln25%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ ls
aiida.in  _aiidasubmit.sh  out  pseudo
[a0s000508@ln25%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ more _aiidasubmit.sh 
#!/bin/bash
#SBATCH --no-requeue
#SBATCH --job-name="aiida-841"
#SBATCH --get-user-env
#SBATCH --output=_scheduler-stdout.txt
#SBATCH --error=_scheduler-stderr.txt
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --time=00:30:00
#SBATCH --mem=262144

#SBATCH -p amd_256
#SBATCH -N 1
#SBATCH -n 64

source /public4/soft/modules/module.sh
module load lapack/3.9.0-wxl-public4 fftw/3.3.8-mpi-public4 netcdf/4.4.1-parallel-icc17-ls-public4 libxc/4.3.4-icc17-ls-public4   szip/2.1.1-wzm-public4  blas/3.8.0-public4
export PATH=/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin:$PATH


'/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin/pw.x' '-in' 'aiida.in'  > "aiida.out"

 
[a0s000508@ln25%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ /public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin/pw.x
/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin/pw.x: error while loading shared libraries: libmkl_scalapack_lp64.so: cannot open shared object file: No such file or directory
[a0s000508@ln25%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ pwd
/public4/home/a0s000508/AiiDA_local/AiiDA_local_01/e2/6b/53ea-16e2-4d19-afae-55cabfa51645
[a0s000508@ln25%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ ls /public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/
bin/   test/  test2/ 
[a0s000508@ln25%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ ls /public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin/
alpha2f.x             fermi_velocity.x      molecularnexafs.x     pprism.x              simple_ip.x
average.x             fqha.x                molecularpdos.x       pp.x                  simple.x
band_interpolation.x  fs.x                  neb.x                 projwfc.x             spectra_correction.x
bands.x               gww_fit.x             open_grid.x           pw2bgw.x              sumpdos.x
bse_main.x            gww.x                 oscdft_et.x           pw2critic.x           turbo_davidson.x
casino2upf.x          head.x                oscdft_pp.x           pw2gt.x               turbo_eels.x
cell2ibrav.x          hp.x                  path_interpolation.x  pw2gw.x               turbo_lanczos.x
cppp.x                ibrav2cell.x          pawplot.x             pw2wannier90.x        turbo_magnon.x
cp.x                  initial_state.x       phcg.x                pw4gww.x              turbo_spectrum.x
d3hess.x              kcwpp_interp.x        ph.x                  pwcond.x              upfconv.x
dos.x                 kcwpp_sh.x            plan_avg.x            pwi2xsf.x             virtual_v2.x
dvscf_q2r.x           kcw.x                 plotband.x            pw.x                  wannier_ham.x
dynmat.x              kpoints.x             plotproj.x            q2qstar.x             wannier_plot.x
epa.x                 lambda.x              plotrho.x             q2r.x                 wfck2r.x
epsilon.x             ld1.x                 pmw.x                 rism1d.x              wfdd.x
ev.x                  manycp.x              postahc.x             scan_ibrav.x          xspectra.x
fermi_proj.x          matdyn.x              ppacf.x               simple_bse.x          
[a0s000508@ln25%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ ls /public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin/pw.x 
/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin/pw.x
[a0s000508@ln25%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ 

[a0s000508@ln25%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ more aiida.in 
&CONTROL
  calculation = 'scf'
  outdir = './out/'
  prefix = 'aiida'
  pseudo_dir = './pseudo/'
  verbosity = 'high'
/
&SYSTEM
  ecutrho =   2.4000000000d+02
  ecutwfc =   3.0000000000d+01
  ibrav = 0
  nat = 1
  ntyp = 1
/
&ELECTRONS
/
ATOMIC_SPECIES
Si     28.085 Si.pbe-n-rrkjus_psl.1.0.0.UPF
ATOMIC_POSITIONS angstrom
Si           0.0000000000       0.0000000000       0.0000000000
K_POINTS automatic
2 2 2 0 0 0
CELL_PARAMETERS angstrom
      0.0000000000       2.7150000000       2.7150000000
      2.7150000000       0.0000000000       2.7150000000
      2.7150000000       2.7150000000       0.0000000000
[a0s000508@ln25%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ 


Thanks for providing more details.
If you submit the job directly with the Slurm command from this folder, does the job run successfully, or do you encounter the same error?

If I submit the job with sbatch _aiidasubmit.sh, it reports the same error:

[a0s000508@ln23%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ sbatch _aiidasubmit.sh 
sbatch: error: Batch job submission failed: Memory required by task is not available
[a0s000508@ln23%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ cp _aiidasubmit.sh test.sh

But if I remove the line #SBATCH --mem=262144 and resubmit, the job finishes normally.

New submit script:

[a0s000508@ln23%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ more _aiidasubmit.sh 
#!/bin/bash
#SBATCH --no-requeue
#SBATCH --job-name="aiida-841"
#SBATCH --get-user-env
#SBATCH --output=_scheduler-stdout.txt
#SBATCH --error=_scheduler-stderr.txt
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --time=00:30:00

#SBATCH -p amd_256
#SBATCH -N 1
#SBATCH -n 64

source /public4/soft/modules/module.sh
module load lapack/3.9.0-wxl-public4 fftw/3.3.8-mpi-public4 netcdf/4.4.1-parallel-icc17-ls-public4 libxc/4.3.4-icc17-ls-public4   szip/2.1.1-wzm-public4  blas/3.8.0-public4
export PATH=/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin:$PATH


'/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin/pw.x' '-in' 'aiida.in'  > "aiida.out"

 
[a0s000508@ln23%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ 

Submitting with sbatch:

[a0s000508@ln23%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ vi _aiidasubmit.sh 
[a0s000508@ln23%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ sbatch _aiidasubmit.sh 
Submitted batch job 8005366
[a0s000508@ln23%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           8005366   amd_256 aiida-84 a0s00050  R    INVALID      1 j2005

Results:

[a0s000508@ln23%bscc-a 53ea-16e2-4d19-afae-55cabfa51645]$ more aiida.out 

     Program PWSCF v.7.3 starts on 17Feb2024 at  8:51:10 

     This program is part of the open-source Quantum ESPRESSO suite
     for quantum simulation of materials; please cite
         "P. Giannozzi et al., J. Phys.:Condens. Matter 21 395502 (2009);
         "P. Giannozzi et al., J. Phys.:Condens. Matter 29 465901 (2017);
         "P. Giannozzi et al., J. Chem. Phys. 152 154105 (2020);
          URL http://www.quantum-espresso.org", 
     in publications or presentations arising from this work. More details at
     http://www.quantum-espresso.org/quote

     Parallel version (MPI), running on     1 processors

     MPI processes distributed on     1 nodes
     240030 MiB available memory on the printing compute node when the environment starts
 
     Reading input from aiida.in

     Current dimensions of program PWSCF are:
     Max number of different atomic species (ntypx) = 10
     Max number of k-points (npk) =  40000
     Max angular momentum in pseudopotentials (lmaxx) =  4
     Message from routine setup:

     Called by h_psi:
     h_psi:calbec :      0.00s CPU      0.00s WALL (      54 calls)
     vloc_psi     :      0.02s CPU      0.03s WALL (      54 calls)
     add_vuspsi   :      0.00s CPU      0.00s WALL (      54 calls)

     General routines
     calbec       :      0.00s CPU      0.00s WALL (      72 calls)
     fft          :      0.02s CPU      0.23s WALL (      77 calls)
     ffts         :      0.00s CPU      0.01s WALL (      12 calls)
     fftw         :      0.02s CPU      0.03s WALL (     252 calls)
     interpolate  :      0.00s CPU      0.01s WALL (       6 calls)

     Parallel routines

     PWSCF        :      0.75s CPU      1.86s WALL


   This run was terminated on:   8:51:12  17Feb2024

=------------------------------------------------------------------------------=
   JOB DONE.
=------------------------------------------------------------------------------=
"aiida.out" 881L, 33015C            

Maybe we should just get rid of the memory line? (But how? It was set during the verdi computer setup.)
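
One possibility (a sketch based on my reading of the aiida-core Computer API, not tested, so please double-check the method names) would be to adjust the computer's default memory from Python instead of recreating the computer:

from aiida import load_profile
from aiida.orm import load_computer

load_profile()

computer = load_computer('a0s000508')

# The value is interpreted in kB; 240000000 kB stays below the 256 GB of the
# amd_256 nodes. (Leaving the default unset at computer setup should drop
# the --mem line from the submit script entirely.)
computer.set_default_memory_per_machine(240000000)
print(computer.get_default_memory_per_machine())

(I have not tried this; creating a second computer, as I did below with a0s000508_02, also works.)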


I decreased the default memory to 240000000 kB, and then things started to work.

Jupyter notebook:

+*In[1]:*+
[source, ipython3]
----
from aiida import load_profile

load_profile()
----


+*Out[1]:*+
----Profile<uuid='c633e7bf80b9436ba2a73f137476acfe' name='quicksetup'>----


+*In[2]:*+
[source, ipython3]
----
from aiida.engine import run
from aiida.orm import Dict, KpointsData, StructureData, load_code, load_group
----


+*In[3]:*+
[source, ipython3]
----
# Load the code configured for ``pw.x``. Make sure to replace this string
# with the label that you used in the setup of the code.
#code = load_code('pw@localhost')
code = load_code('qe-7.3-pw@a0s000508_02')
builder = code.get_builder()
----


+*In[4]:*+
[source, ipython3]
----
# Create a silicon fcc crystal
from ase.build import bulk
structure = StructureData(ase=bulk('Si', 'fcc', 5.43))
builder.structure = structure
----


+*In[5]:*+
[source, ipython3]
----
# Load the pseudopotential family.
pseudo_family = load_group('SSSP/1.3/PBE/efficiency')
builder.pseudos = pseudo_family.get_pseudos(structure=structure)
----


+*In[6]:*+
[source, ipython3]
----
# Request the recommended wavefunction and charge density cutoffs
# for the given structure and energy units.
cutoff_wfc, cutoff_rho = pseudo_family.get_recommended_cutoffs(
    structure=structure,
    unit='Ry'
)

parameters = Dict({
    'CONTROL': {
        'calculation': 'scf'
    },
    'SYSTEM': {
        'ecutwfc': cutoff_wfc,
        'ecutrho': cutoff_rho,
    }
})
builder.parameters = parameters
----


+*In[7]:*+
[source, ipython3]
----
# Generate a 2x2x2 Monkhorst-Pack mesh
kpoints = KpointsData()
kpoints.set_kpoints_mesh([2, 2, 2])
builder.kpoints = kpoints
----


+*In[8]:*+
[source, ipython3]
----
# Run the calculation on 1 CPU and kill it if it runs longer than 1800 seconds.
# Set ``withmpi`` to ``False`` if ``pw.x`` was compiled without MPI support.
builder.metadata.options = {
    'resources': {
        'num_machines': 1,
    },
    'max_wallclock_seconds': 1800,
    'withmpi': False,
}
----
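
(Side note: if I understand the generic CalcJob options correctly, the memory request could also be overridden for a single calculation via the max_memory_kb metadata option, whose kB value ends up as the MB number after --mem in the submit script; a sketch, not tested:)

# Sketch (not tested): request memory per calculation, in kB,
# instead of relying on the computer-level default.
builder.metadata.options = {
    'resources': {
        'num_machines': 1,
    },
    'max_wallclock_seconds': 1800,
    'withmpi': False,
    'max_memory_kb': 240000000,
}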


+*In[9]:*+
[source, ipython3]
----
results, node = run.get_node(builder)
----


+*In[10]:*+
[source, ipython3]
----
node.exit_status
----


+*Out[10]:*+
----0----


+*In[12]:*+
[source, ipython3]
----
print(results)
'''
{
    'output_band': <BandsData: uuid: a82526b7-fb7f-4638-a1e1-f72ad04a5f13 (pk: 59537)>,
    'output_trajectory': <TrajectoryData: uuid: 4122ba08-6318-4029-9892-55b29e89a39c (pk: 59538)>,
    'output_parameters': <Dict: uuid: 82fe2b57-0dc0-4031-81a0-e80ed34db680 (pk: 59539)>,
    'remote_folder': <RemoteData: uuid: 672113fd-7177-4688-ad77-674703b8f611 (pk: 59535)>,
    'retrieved': <FolderData: uuid: 5f44b880-4c74-4194-9484-0c51ec4f1a34 (pk: 59536)>
}
'''
----


+*Out[12]:*+
----
{'output_band': <BandsData: uuid: 5e97fc86-525c-4e15-a9c1-b519e7bcc30d (pk: 856)>, 'output_trajectory': <TrajectoryData: uuid: 2aa85f88-0266-4f5b-a71e-20993c6199fb (pk: 857)>, 'output_parameters': <Dict: uuid: 8c718e1e-e8ba-4a3c-8457-48b4d2816422 (pk: 858)>, 'remote_folder': <RemoteData: uuid: 8f992d18-21da-4e4a-81a2-5d6898dc3943 (pk: 854)>, 'retrieved': <FolderData: uuid: 00dbf505-c454-4e3f-a5ea-dde8fcc03be6 (pk: 855)>}
"\n{\n    'output_band': <BandsData: uuid: a82526b7-fb7f-4638-a1e1-f72ad04a5f13 (pk: 59537)>,\n    'output_trajectory': <TrajectoryData: uuid: 4122ba08-6318-4029-9892-55b29e89a39c (pk: 59538)>,\n    'output_parameters': <Dict: uuid: 82fe2b57-0dc0-4031-81a0-e80ed34db680 (pk: 59539)>,\n    'remote_folder': <RemoteData: uuid: 672113fd-7177-4688-ad77-674703b8f611 (pk: 59535)>,\n    'retrieved': <FolderData: uuid: 5f44b880-4c74-4194-9484-0c51ec4f1a34 (pk: 59536)>\n}\n"----


The new _aiidasubmit.sh:

#SBATCH --error=_scheduler-stderr.txt
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --time=00:30:00
#SBATCH --mem=234375

#SBATCH -p amd_256
#SBATCH -N 1
#SBATCH -n 64


source /public4/soft/modules/module.sh
module load lapack/3.9.0-wxl-public4 fftw/3.3.8-mpi-public4 netcdf/4.4.1-parallel-icc17-ls-public4 libxc/4.3.4-icc17-ls-public4   szip/2.1.1-wzm-public4  blas/3.8.0-public4
export PATH=/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin:$PATH


'/public4/home/a0s000508/software-a0s000508/qe-7.3/qe-install/bin/pw.x' '-in' 'aiida.in'  > "aiida.out"



(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/03_para$ verdi computer show a0s000508_02
---------------------------  --------------------------------------------------
Label                        a0s000508_02
PK                           12
UUID                         1d781e70-7ade-4134-b8c0-350a4f6b8e9e
Description                  amd_256, set 240 GB
Hostname                     ssh.cn-zhongwei-1.paracloud.com
Transport type               core.ssh
Scheduler type               core.slurm
Work directory               /public4/home/a0s000508/AiiDA_local/AiiDA_local_01
Shebang                      #!/bin/bash
Mpirun command               srun -n {tot_num_mpiprocs}
Default #procs/machine       64
Default memory (kB)/machine  240000000
Prepend text                 #SBATCH -p amd_256
                             #SBATCH -N 1
                             #SBATCH -n 64
Append text
---------------------------  --------------------------------------------------
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/03_para$ verdi computer configure show a0s000508_02
* username               a0s000508@BSCC-A
* port                   22
* look_for_keys          True
* key_filename           /home/yang/.ssh/id_ed25519.pub
* timeout                60
* allow_agent            True
* proxy_jump
* proxy_command
* compress               True
* gss_auth               False
* gss_kex                False
* gss_deleg_creds        False
* gss_host               ssh.cn-zhongwei-1.paracloud.com
* load_system_host_keys  True
* key_policy             WarningPolicy
* use_login_shell        True
* safe_interval          30.0
(aiida_QE) yang@yang-Inspiron-3670:~/AiiDA/tutorial_ssh/03_para$ 

Reducing the memory works for now, but I am not sure whether this setting will also work for other jobs. (Paracloud customer service suggests removing the #SBATCH --mem line entirely.)


The problem is that the max memory setting is in kB, not KiB (kibibytes, which are base 1024 instead of base 1000). 268435456 kB is about 268 GB, so the jobs were failing because you were requesting more memory than the nodes of the requested queue have. That is also why reducing the memory works. Note that this is only a default memory, so you don't have to set it; in most cases it is best to leave it unset.
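
To put numbers on it (a quick check, consistent with the two generated submit scripts shown above):

# 256 GiB expressed in KiB -- the value that was entered as "Default memory (kB)/machine"
print(256 * 1024 * 1024)        # 268435456
# Read as kB (base 1000), that is more than the 256 GB per node
print(268435456 * 1000 / 1e9)   # 268.435456 GB
# The number AiiDA writes after "#SBATCH --mem=" (MB, i.e. kB // 1024)
print(268435456 // 1024)        # 262144 -> rejected by sbatch
print(240000000 // 1024)        # 234375 -> accepted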
