Configure AiiDA to use a remote HPC

Hello,
I am trying to run AiiDA in a container (Singularity) and configure it to use our remote HPC setup. The container setup completed successfully, but we are failing to establish an SSH connection from the AiiDA container to our remote HPC. We are observing the error message below:
ValueError: The SSH proxy jump and SSH proxy command options cannot be used together

Hi @Shraddha_Kiran!

It seems you specified both the proxy_command and proxy_jump settings, so AiiDA isn’t sure which one to use. In general, I would recommend using proxy_jump, see:

https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/ssh.html?highlight=proxy#connecting-to-a-remote-computer-via-a-proxy-server

Could you show me the output of

verdi computer configure show <COMPUTER_LABEL>

where you have to replace <COMPUTER_LABEL> with the label of the computer you are configuring SSH for.
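For reference, a proxy setup in `~/.ssh/config` typically uses one of the following two mechanisms, which do the same thing and must not be combined. The hostnames below are placeholders, not your actual machines:

```
# Hypothetical example: reaching "cluster" through a gateway "bastion".
Host bastion
    HostName bastion.example.com
    User myuser

Host cluster
    HostName cluster.internal
    User myuser
    # Modern option, recommended:
    ProxyJump bastion
    # Older equivalent -- do NOT use together with ProxyJump:
    # ProxyCommand ssh -W %h:%p bastion
```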

Hi @mbercx
Thanks for your suggestion. I am pasting the output of

 verdi computer configure show <COMPUTER_LABEL> 

below:

(aiida) x0144578@dcalph078:~$ verdi computer configure show  uhpc
* username               x0144578
* port                   22
* look_for_keys          True
* key_filename           /user/x0144578/rsm_ppk.ppk
* timeout                60
* allow_agent            False
* proxy_jump             n
* proxy_command
* compress               True
* gss_auth               False
* gss_kex                False
* gss_deleg_creds        False
* gss_host               dcalph000
* load_system_host_keys  True
* key_policy             RejectPolicy
* use_login_shell        True
* safe_interval          30.0
2023-11-14 05:10:30.170 PST [44129] LOG:  unexpected EOF on client connection with an open transaction
(aiida) x0144578@dcalph078:~$ verdi computer test uhpc --print-traceback
Report: Testing computer<uhpc> for user<arunprasad_pandurangan@contractor.amat.com>...
* Opening connection... [FAILED]: Error while trying to connect to the computer
  Full traceback:
  Traceback (most recent call last):
    File "/user/x0144578/.conda/envs/aiida/lib/python3.11/site-packages/aiida/cmdline/commands/cmd_computer.py", line 547, in computer_test
      with transport:
    File "/user/x0144578/.conda/envs/aiida/lib/python3.11/site-packages/aiida/transports/transport.py", line 128, in __enter__
      self.open()
    File "/user/x0144578/.conda/envs/aiida/lib/python3.11/site-packages/aiida/transports/plugins/ssh.py", line 459, in open
      raise ValueError('The SSH proxy jump and SSH proxy command options can not be used together')
  ValueError: The SSH proxy jump and SSH proxy command options can not be used together
Warning: 1 out of 0 tests failed
2023-11-14 05:11:00.892 PST [44180] LOG:  unexpected EOF on client connection with an open transaction

Thanks @Shraddha_Kiran!

* proxy_jump             n
* proxy_command

It seems you don’t need to use a proxy to connect to your remote HPC? I think you may have accidentally set these when configuring the SSH transport. Can you try running

verdi computer configure core.ssh uhpc

And set no value for SSH proxy jump and SSH proxy command when prompted, using ! (an exclamation mark)? That is, keep the same values for the other settings by pressing Enter, but set no value for the proxy settings with !, as explained in the report shown when executing the command:

Report: enter ! to ignore the default and set no value.

The following line:

2023-11-14 05:10:30.170 PST [44129] LOG:  unexpected EOF on client connection with an open transaction

is also somewhat worrying, but I think it is unrelated. It seems to be a warning from the database, but I’m not familiar with it. Maybe someone else has an idea?
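For context, the check that raised the exception is essentially the following. This is a simplified sketch of the logic, not AiiDA’s actual code:

```python
# Simplified sketch: the transport refuses to open if BOTH proxy
# options are present, regardless of their values.
def check_proxy_settings(config: dict) -> None:
    """Raise if both proxy options are set, even to values like 'n' or ''."""
    if config.get("proxy_jump") is not None and config.get("proxy_command") is not None:
        raise ValueError(
            "The SSH proxy jump and SSH proxy command options "
            "can not be used together"
        )
```

In your configuration both keys hold a value (`n` and an empty string), which is why the error triggers even though you are not actually using a proxy.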

Hi @mbercx

Is there any simpler way to check whether the AiiDA setup can indeed ‘talk’ to the remote HPC?

Are you able to connect to the remote HPC from the container using just the ssh command, i.e. not through AiiDA? In principle you should be able to configure the SSH transport for the AiiDA computer to connect to the remote as well then, unless some special settings are required that I’m not familiar with. Can you show the configuration in the ~/.ssh/config file that you have set up?

Hello @mbercx

We tried ssh-ing to our remote HPC (dcalph000) from aiida image (aiida-core-with-services_edge.sif). Sharing the details below:

Apptainer> ssh dcalph000

Last login: Fri Nov 17 00:11:18 2023 from 10.141.1.78

Welcome to Bright release         9.0
 
                                     Based on Red Hat Enterprise Linux Server 7

                                                                    ID: #000002
 
Use the following commands to adjust your environment:
 
'module avail'            - show available modules

'module add <module>'     - adds a module to your environment for this session

'module initadd <module>' - configure module to be loaded at every login
 
-------------------------------------------------------------------------------

-bash-4.2$ logout

Connection to dcalph000 closed.
Apptainer> cat ~/.ssh/config

SendEnv ESI_HOME

Thanks @Shraddha_Kiran. It’s good to see you can ssh into the remote at least, but it seems the configuration is not stored in ~/.ssh/config. Where are the hostname, user, etc. of dcalph000 configured?

Can you also show the output of

verdi computer show uhpc

I guess the SSH information is in ESI_HOME; SendEnv passes environment variables over the SSH connection. If there is no sensitive information in it, maybe @Shraddha_Kiran can show ESI_HOME with echo $ESI_HOME?
We can then give you a recommendation on the required parameters for the verdi computer configuration.
One thing you can already try is running verdi computer configure core.ssh uhpc again and, when asked for proxy_jump, giving an empty string "". This is the source of the exception you mentioned.

Thanks for joining in @jusong.yu!

Hmm, I didn’t know SendEnv could be used to provide connection configuration. I thought it would simply set the environment variables after connecting.

Are you sure about this? I thought you’d have to use ! to properly unset the configuration, since the proxy_jump and proxy_command variables are still considered to be set even if they are empty strings.
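To illustrate the distinction with a toy example (not AiiDA internals): an empty string is still a stored value, whereas ! removes the setting entirely:

```python
# Toy illustration: '' is a present-but-empty value; only a missing
# key counts as "not set".
configured_with_empty = {"proxy_jump": ""}  # answered the prompt with ""
configured_with_bang = {}                   # answered the prompt with !

print("proxy_jump" in configured_with_empty)  # True: still considered set
print("proxy_jump" in configured_with_bang)   # False: truly unset
```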

I guess you are correct, I am not sure :face_with_open_eyes_and_hand_over_mouth: .


Thank you @jusong.yu and @mbercx for your suggestions.

Here’s the output of verdi computer show uhpc

Apptainer> verdi computer show uhpc
Warning: You are currently using a post release development version of AiiDA: 2.4.0.post0
Warning: Be aware that this is not recommended for production and is not officially supported.
Warning: Databases used with this version may not be compatible with future releases of AiiDA
Warning: as you might not be able to automatically migrate your data.

---------------------------  ------------------------------------
Label                        uhpc
PK                           1
UUID                         06e9e96d-bf21-486d-a2d9-0146582b90b3
Description                  uhpc master node
Hostname                     dcalph000
Transport type               core.ssh
Scheduler type               core.slurm
Work directory               /dat/usr/x0144578/
Shebang                      #!/bin/bash
Mpirun command               mpirun -np {tot_num_mpiprocs}
Default #procs/machine       16
Default memory (kB)/machine  32
Prepend text
Append text
---------------------------  ------------------------------------
Apptainer> cat ~/.ssh/config
SendEnv ESI_HOME
Apptainer>
Apptainer> echo $ESI_HOME

I am also attaching the environment set in the container, just FYI:

Apptainer> env
BASH_FUNC_switchml()=() {  typeset swfound=1;
 if [ "${MODULES_USE_COMPAT_VERSION:-0}" = '1' ]; then
 typeset swname='main';
 if [ -e /cm/local/apps/environment-modules/4.4.0//libexec/modulecmd.tcl ]; then
 typeset swfound=0;
 unset MODULES_USE_COMPAT_VERSION;
 fi;
 else
 typeset swname='compatibility';
 if [ -e /cm/local/apps/environment-modules/4.4.0//libexec/modulecmd-compat ]; then
 typeset swfound=0;
 MODULES_USE_COMPAT_VERSION=1;
 export MODULES_USE_COMPAT_VERSION;
 fi;
 fi;
 if [ $swfound -eq 0 ]; then
 echo "Switching to Modules $swname version";
 source /cm/local/apps/environment-modules/4.4.0//init/bash;
 else
 echo "Cannot switch to Modules $swname version, command not found";
 return 1;
 fi
}
BASH_FUNC_module()=() {  _module_raw "$@" 2>&1
}
BASH_FUNC__module_raw()=() {  unset _mlshdbg;
 if [ "${MODULES_SILENT_SHELL_DEBUG:-0}" = '1' ]; then
 case "$-" in
 *v*x*)
 set +vx;
 _mlshdbg='vx'
 ;;
 *v*)
 set +v;
 _mlshdbg='v'
 ;;
 *x*)
 set +x;
 _mlshdbg='x'
 ;;
 *)
 _mlshdbg=''
 ;;
 esac;
 fi;
 unset _mlre _mlIFS;
 if [ -n "${IFS+x}" ]; then
 _mlIFS=$IFS;
 fi;
 IFS=' ';
 for _mlv in ${MODULES_RUN_QUARANTINE:-};
 do
 if [ "${_mlv}" = "${_mlv##*[!A-Za-z0-9_]}" -a "${_mlv}" = "${_mlv#[0-9]}" ]; then
 if [ -n "`eval 'echo ${'$_mlv'+x}'`" ]; then
 _mlre="${_mlre:-}${_mlv}_modquar='`eval 'echo ${'$_mlv'}'`' ";
 fi;
 _mlrv="MODULES_RUNENV_${_mlv}";
 _mlre="${_mlre:-}${_mlv}='`eval 'echo ${'$_mlrv':-}'`' ";
 fi;
 done;
 if [ -n "${_mlre:-}" ]; then
 eval `eval ${_mlre}/usr/bin/tclsh /cm/local/apps/environment-modules/4.4.0/libexec/modulecmd.tcl bash '"$@"'`;
 else
 eval `/usr/bin/tclsh /cm/local/apps/environment-modules/4.4.0/libexec/modulecmd.tcl bash "$@"`;
 fi;
 _mlstatus=$?;
 if [ -n "${_mlIFS+x}" ]; then
 IFS=$_mlIFS;
 else
 unset IFS;
 fi;
 unset _mlre _mlv _mlrv _mlIFS;
 if [ -n "${_mlshdbg:-}" ]; then
 set -$_mlshdbg;
 fi;
 unset _mlshdbg;
 return $_mlstatus
}
SHELL=/bin/bash
HISTCONTROL=ignoredups
SLURM_TIME_FORMAT=%b %e %k:%M
HOSTNAME=dcalph078
HISTSIZE=1000
S6_CMD_WAIT_FOR_SERVICES_MAXTIME=0
LANGUAGE=en_US.UTF-8
SINGULARITY_NAME=aiida-core-with-services_edge.sif
QT_GRAPHICSSYSTEM_CHECKED=1
PGSQL_VERSION=15
_LMFILES__modshare=/cm/shared/modulefiles/slurm/19.05.7:1
LIBRARY_PATH_modshare=/cm/shared/apps/slurm/19.05.7/lib64/slurm:1:/cm/shared/apps/slurm/19.05.7/lib64:1
CPATH_modshare=/cm/shared/apps/slurm/19.05.7/include:1
SINGULARITY_ENVIRONMENT=/.singularity.d/env/91-environment.sh
MANPATH_modshare=/usr/local/share/man:1:/usr/share/man/overrides:1:/cm/local/apps/environment-modules/4.4.0//share/man:1:/cm/local/apps/environment-modules/current/share/man:1:/cm/shared/apps/slurm/19.05.7/man:1:/usr/share/man:1
ENV=/user/x0144578/.kshrc
PWD=/user/x0144578
LOGNAME=x0144578
MODULESHOME=/cm/local/apps/environment-modules/4.4.0/
MANPATH=/cm/shared/apps/slurm/19.05.7/man:/cm/local/apps/environment-modules/4.4.0//share/man:/usr/local/share/man:/usr/share/man/overrides:/usr/share/man:/cm/local/apps/environment-modules/current/share/man
SYSTEM_USER=aiida
USER_PATH=/cm/shared/apps/slurm/19.05.7/sbin:/cm/shared/apps/slurm/19.05.7/bin:/cm/local/apps/environment-modules/4.4.0//bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/sbin:/cm/local/apps/environment-modules/4.4.0/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
APPTAINER_ENVIRONMENT=/.singularity.d/env/91-environment.sh
APPTAINER_APPNAME=
HOME=/user/x0144578
LANG=en_US.UTF-8
SINFO_FORMAT=%n %.10T %.5a %.8e %.7m %.4c %.10G %.8O %C %f
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
LD_LIBRARY_PATH_modshare=/cm/shared/apps/slurm/19.05.7/lib64/slurm:1:/cm/shared/apps/slurm/19.05.7/lib64:1
APPTAINER_COMMAND=shell
SINGULARITY_CONTAINER=/dat/usr/x0144578/singularity/aiida-core-with-services_edge.sif
SSH_CONNECTION=10.141.1.200 39192 10.141.1.78 22
SQUEUE_PARTITION=test,interact,license,lic_low,normal,low,open,gpu,gpu_open,short,high
PATH_modshare=/cm/local/apps/environment-modules/4.4.0//bin:1:/usr/sbin:1:/usr/bin:1:/cm/local/apps/environment-modules/4.4.0/bin:1:/cm/shared/apps/slurm/19.05.7/sbin:1:/usr/local/sbin:1:/cm/shared/apps/slurm/19.05.7/bin:1:/usr/local/bin:1:/sbin:1
RMQ_VERSION=3.10.18
APPTAINER_CONTAINER=/dat/usr/x0144578/singularity/aiida-core-with-services_edge.sif
LOADEDMODULES_modshare=slurm/19.05.7:1
TERM=xterm
LESSOPEN=||/usr/bin/lesspipe.sh %s
USER=x0144578
LIBRARY_PATH=/cm/shared/apps/slurm/19.05.7/lib64/slurm:/cm/shared/apps/slurm/19.05.7/lib64
LOADEDMODULES=slurm/19.05.7
SHLVL=2
BASH_ENV=/cm/local/apps/environment-modules/4.4.0//init/bash
CONDA_DIR=/opt/conda
CVS_RSH=ssh
APPTAINER_NAME=aiida-core-with-services_edge.sif
SINGULARITY_BIND=
XDG_SESSION_ID=134554
APPTAINER_BIND=
LD_LIBRARY_PATH=/.singularity.d/libs
XDG_RUNTIME_DIR=/run/user/32752
PS1=Apptainer>
SSH_CLIENT=10.141.1.200 39192 22
SYSTEM_UID=1000
ENABLE_LMOD=0
SQUEUE_SORT=U,P,N
LC_ALL=en_US.UTF-8
XDG_DATA_DIRS=/user/x0144578/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share
PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home//.local/bin
MODULEPATH=/cm/local/modulefiles:/cm/shared/modulefiles
_LMFILES_=/cm/shared/modulefiles/slurm/19.05.7
MAIL=/var/spool/mail/x0144578
SSH_TTY=/dev/pts/3
SQUEUE_FORMAT2=jobid:8,username:9,statecompact:3,partition:13,name:20,command:20,submittime:13,numcpus:5,gres:12,feature:25,numnodes:6,reasonlist:90
SYSTEM_GID=100
CPATH=/cm/shared/apps/slurm/19.05.7/include
DEBIAN_FRONTEND=noninteractive
MODULES_CMD=/cm/local/apps/environment-modules/4.4.0/libexec/modulecmd.tcl
_=/usr/bin/env

Thanks @Shraddha_Kiran,

I still don’t really know how your ssh to dcalph000 is configured, to be honest. :sweat_smile: I’m no sysadmin, however; it’s probably some approach I’m not familiar with.

In any case, let’s just try setting up a new computer, but this time not specifying the proxy settings. Create two files with the following names and contents:

  • uhpc-test.yaml:
label: uhpc-test
description: uhpc master node
hostname: dcalph000
transport: core.ssh
scheduler: core.slurm
shebang: '#!/bin/bash'
work_dir: /dat/usr/x0144578/
mpirun_command: mpirun -np {tot_num_mpiprocs}
mpiprocs_per_machine: 16
prepend_text: ' '
append_text: ' '
  • uhpc-test-configure.yaml:
username: x0144578
key_filename: /user/x0144578/rsm_ppk.ppk
safe_interval: 10.0

(Also, maybe check that I haven’t made any typos here.)

Then first set up the new computer:

verdi computer setup -n --config uhpc-test.yaml

And subsequently configure the core.ssh transport for it:

verdi computer configure core.ssh uhpc-test -n --config uhpc-test-configure.yaml

then test the computer, and pray :pray:

verdi computer test uhpc-test

Or at least report back in case there are any issues. ^^

Hi Marnik,
After following the given steps, we are getting an error that the configuration file is invalid.

(aiida) x0144578@dcalph078:~$ vi uhpc-test.yaml
(aiida) x0144578@dcalph078:~$ cat uhpc-test.yaml

username: x0144578

key_filename: /user/x0144578/rsm_ppk.ppk

safe_interval: 10.0

(aiida) x0144578@dcalph078:~$

(aiida) x0144578@dcalph078:~$ verdi computer setup -n --config uhpc-test.yaml

Usage: verdi computer setup [OPTIONS]

Try 'verdi computer setup --help' for help.

Error: Invalid value for '--config': Invalid configuration file, the following keys are not supported: {'safe_interval', 'username', 'key_filename'}

(aiida) x0144578@dcalph078:~$

Thanks
Arun Prasad P

Dear @Arun,

Apologies, I was a bit too quick. I mixed up the “setup” file contents (uhpc-test.yaml) with the “SSH configure” one (uhpc-test-configure.yaml). I’ve corrected my post above, can you give it another try?

Best,
Marnik

Hello @mbercx

Are we sure that after we configure AiiDA to use the remote HPC, it uses an SSH connection in the backend to connect to the remote HPC?

The computer has the core.ssh transport configured, which uses paramiko to connect to the remote. paramiko is a Python implementation of the SSH protocol; see:

https://paramiko.org/
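As a rough illustration (my own sketch, not AiiDA’s actual transport code), the settings shown by verdi computer configure show map onto the keyword arguments of paramiko’s SSHClient.connect() roughly like this:

```python
# Sketch of how verdi-style SSH settings could be translated into
# keyword arguments for paramiko.SSHClient.connect().
# Illustrative only; AiiDA's core.ssh transport does this internally.
def connect_kwargs(settings: dict) -> dict:
    """Map a few verdi SSH settings onto paramiko connect() arguments."""
    return {
        "hostname": settings["hostname"],
        "port": settings.get("port", 22),
        "username": settings["username"],
        "key_filename": settings.get("key_filename"),
        "timeout": settings.get("timeout", 60),
        "allow_agent": settings.get("allow_agent", False),
        "look_for_keys": settings.get("look_for_keys", True),
        "compress": settings.get("compress", True),
    }

# With paramiko installed, connecting would then look like (not run here):
#   client = paramiko.SSHClient()
#   client.load_system_host_keys()
#   client.connect(**connect_kwargs(settings))
```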

Hello @mbercx

Your suggestion worked and we were able to ssh to our remote HPC successfully. :slight_smile: Thank you again!

Now that we can access our HPC environment, what is the simplest way to test aiida-quantumespresso?
Note: We already have Quantum ESPRESSO (qe-6.8) available on our remote HPC.

Regards
Shraddha


The best approach would be to follow the get-started instructions in the documentation: Get started — aiida-quantumespresso documentation
Since you have already configured the computer, you can skip that step and go to configuring the code. Once you have finished that and installed the pseudopotentials, you can test running a pw.x calculation with

aiida-quantumespresso calculation launch pw -X <CODE> -F SSSP/1.2/PBE/efficiency

replacing <CODE> with the label of the code you set up.