Hi,
I am new to AiiDA as well as to HPC for high-throughput workflows, so apologies in advance if I miss something obvious.
The main issue is that verdi computer test passes every check except the creation and deletion of a temporary file when I use the scratch directory (Lustre parallel file system on the HPC) as the work_dir for the HPC computer:
$ verdi computer show noctua2
Label noctua2
PK 10
UUID 8a16267b-5570-47f3-a10e-c84764f9d953
Description Noctua 2 HPC Cluster at PC2
Hostname fe.noctua2.pc2.uni-paderborn.de
Transport type core.ssh
Scheduler type core.slurm
Work directory /scratch/hpc-prf-spectr/Abdullah/temp
Shebang #!/bin/bash
Mpirun command srun -n {tot_num_mpiprocs}
Default #procs/machine 64
Default memory (kB)/machine 268435456
Prepend text export PATH=/opt/software/pc2/EB-SW/software/OpenMPI/4.1.6-GCC-13.2.0/bin:$PATH
export LD_LIBRARY_PATH=/opt/software/pc2/EB-SW/software/OpenMPI/4.1.6-GCC-13.2.0/lib:$LD_LIBRARY_PATH
Append text echo “Job execution complete.”
$ verdi computer test noctua2 --print-traceback
Report: Testing computer for useraiida@localhost…
- Opening connection… [OK]
- Checking for spurious output… [OK]
- Getting number of jobs from scheduler… [OK]: 0 jobs found in the queue
- Determining remote user name… [OK]: abshahid
- Creating and deleting temporary file… [Failed]: OSError: Error during mkdir of ‘/scratch’, maybe you don’t have the permissions to do it, or the directory already exists? ([Errno 13] Permission denied)
Full traceback:
Traceback (most recent call last):
File “/opt/conda/lib/python3.10/site-packages/aiida/cmdline/commands/cmd_computer.py”, line 124, in _computer_create_temp_file
transport.chdir(workdir)
File “/opt/conda/lib/python3.10/site-packages/aiida/transports/plugins/ssh.py”, line 593, in chdir
self.sftp.chdir(path)
File “/opt/conda/lib/python3.10/site-packages/paramiko/sftp_client.py”, line 659, in chdir
if not stat.S_ISDIR(self.stat(path).st_mode):
File “/opt/conda/lib/python3.10/site-packages/paramiko/sftp_client.py”, line 493, in stat
t, msg = self._request(CMD_STAT, path)
File “/opt/conda/lib/python3.10/site-packages/paramiko/sftp_client.py”, line 822, in _request
return self._read_response(num)
File “/opt/conda/lib/python3.10/site-packages/paramiko/sftp_client.py”, line 874, in _read_response
self._convert_status(msg)
File “/opt/conda/lib/python3.10/site-packages/paramiko/sftp_client.py”, line 903, in _convert_status
raise IOError(errno.ENOENT, text)
FileNotFoundError: [Errno 2] No such file
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File “/opt/conda/lib/python3.10/site-packages/aiida/transports/plugins/ssh.py”, line 704, in mkdir
self.sftp.mkdir(path)
File “/opt/conda/lib/python3.10/site-packages/paramiko/sftp_client.py”, line 460, in mkdir
self._request(CMD_MKDIR, path, attr)
File “/opt/conda/lib/python3.10/site-packages/paramiko/sftp_client.py”, line 822, in _request
return self._read_response(num)
File “/opt/conda/lib/python3.10/site-packages/paramiko/sftp_client.py”, line 874, in _read_response
self._convert_status(msg)
File “/opt/conda/lib/python3.10/site-packages/paramiko/sftp_client.py”, line 905, in _convert_status
raise IOError(errno.EACCES, text)
PermissionError: [Errno 13] Permission denied
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File “/opt/conda/lib/python3.10/site-packages/aiida/cmdline/commands/cmd_computer.py”, line 552, in computer_test
success, message = test(
File “/opt/conda/lib/python3.10/site-packages/aiida/cmdline/commands/cmd_computer.py”, line 126, in _computer_create_temp_file
transport.makedirs(workdir)
File “/opt/conda/lib/python3.10/site-packages/aiida/transports/plugins/ssh.py”, line 689, in makedirs
self.mkdir(this_dir)
File “/opt/conda/lib/python3.10/site-packages/aiida/transports/plugins/ssh.py”, line 707, in mkdir
raise OSError(
OSError: Error during mkdir of ‘/scratch’, maybe you don’t have the permissions to do it, or the directory already exists? ([Errno 13] Permission denied)
- Checking for possible delay from using login shell… [OK]
Warning: 1 out of 6 tests failed
However, all 6 tests succeed when I use the home directory (small storage) as the work directory instead:
$ verdi computer show noctua2
Label noctua2
PK 11
UUID 5e018103-0bd1-457f-8f21-698dda8ca813
Description Noctua 2 HPC Cluster at PC2
Hostname fe.noctua2.pc2.uni-paderborn.de
Transport type core.ssh
Scheduler type core.slurm
Work directory /pc2/users/a/abshahid/aiida_wor_dira
Shebang #!/bin/bash
Mpirun command srun -n {tot_num_mpiprocs}
Default #procs/machine 64
Default memory (kB)/machine 268435456
Prepend text export PATH=/opt/software/pc2/EB-SW/software/OpenMPI/4.1.6-GCC-13.2.0/bin:$PATH
export LD_LIBRARY_PATH=/opt/software/pc2/EB-SW/software/OpenMPI/4.1.6-GCC-13.2.0/lib:$LD_LIBRARY_PATH
Append text echo “Job execution complete.”
$ verdi computer test noctua2
Report: Testing computer for useraiida@localhost…
- Opening connection… [OK]
- Checking for spurious output… [OK]
- Getting number of jobs from scheduler… [OK]: 0 jobs found in the queue
- Determining remote user name… [OK]: abshahid
- Creating and deleting temporary file… [OK]
- Checking for possible delay from using login shell… [OK]
Success: all 6 tests succeeded
I have tried to investigate, and this is what I was able to understand: AiiDA relies on SSH-based SFTP operations (using Paramiko) to manage remote work directories during its computer tests. It first attempts to change to the designated work directory using sftp.chdir(), and if that fails (typically because the directory does not exist or is not accessible), it then tries to create it using transport.makedirs(). In my traceback, the chdir fails with "No such file" and the subsequent mkdir already fails at ‘/scratch’ with "Permission denied". HPC systems like Noctua 2 could have SFTP chrooted to my home directory. If such a chroot restriction is in place, operations on directories outside the home (e.g., the scratch space) would fail, which would match what I see. I have also tried using a symlink, but that did not work, maybe due to permission issues with symbolic links.
So, my question is: what do you think about the issue? I have looked through the discussions here but could not find any relevant posts. Does that mean it is not a typical issue? Is this specific to my HPC, or am I thinking about the whole thing wrongly and making a very obvious mistake? I would highly appreciate any solution or advice. Thank you.
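To narrow down where the SFTP view of the file system stops matching what I see in a normal shell, a check along these lines could be used. This is only a minimal sketch, not AiiDA's own code: the helper name probe_prefixes is hypothetical, and it only assumes an object with a stat(path) method that raises OSError on failure, which is how Paramiko's SFTPClient.stat behaves according to the traceback above.

```python
import posixpath
import stat


def probe_prefixes(sftp, path):
    """Stat every prefix of an absolute remote path and stop at the first failure.

    `sftp` is assumed to be any object with a `stat(path)` method that raises
    OSError (e.g. FileNotFoundError, PermissionError) when the path is not
    reachable. `probe_prefixes` itself is a hypothetical helper, not AiiDA API.
    """
    results = []
    current = "/"
    for part in [p for p in path.split("/") if p]:
        current = posixpath.join(current, part)
        try:
            mode = sftp.stat(current).st_mode
        except OSError as exc:
            # First component that SFTP cannot see: report it and stop.
            results.append((current, f"error: {exc.__class__.__name__}"))
            break
        if mode is not None and stat.S_ISDIR(mode):
            results.append((current, "dir"))
        else:
            results.append((current, "not a dir"))
    return results
```

With a real connection, the object returned by paramiko.SSHClient.open_sftp() could be passed as sftp, together with the work directory from the failing configuration. If the very first component (‘/scratch’) already reports an error while the same path is visible in an interactive SSH session, that would support the idea that the SFTP subsystem sees a different or restricted file system.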
noctua2_computer.txt (649 Bytes)
noctua2_ssh_config.txt (400 Bytes)