Aiida-vibroscopy: failed with exit status 305: Both the stdout and XML output files could not be read or parsed

Hi all,
I am trying to run aiida-vibroscopy but ran into the following error.

(aiidaENV) rkarkee@ch-fe1:/lustre/scratch5/rkarkee> verdi process report 189
2024-03-04 16:18:58 [91  | REPORT]: [189|IRamanSpectraWorkChain|run_spectra]: submitting `HarmonicWorkChain` <PK=191>
2024-03-04 16:19:00 [92  | REPORT]:   [191|HarmonicWorkChain|run_phonon]: submitting `PhononWorkChain` <PK=197>
2024-03-04 16:19:00 [93  | REPORT]:   [191|HarmonicWorkChain|run_dielectric]: submitting `DielectricWorkChain` <PK=201>
2024-03-04 16:19:01 [94  | REPORT]:     [201|DielectricWorkChain|run_base_scf]: launching base scf PwBaseWorkChain<208>
2024-03-04 16:19:01 [95  | REPORT]:     [197|PhononWorkChain|run_base_supercell]: launching base supercell scf PwBaseWorkChain<210>
2024-03-04 16:19:02 [96  | REPORT]:       [210|PwBaseWorkChain|run_process]: launching PwCalculation<213> iteration #1
2024-03-04 16:19:02 [97  | REPORT]:       [208|PwBaseWorkChain|run_process]: launching PwCalculation<216> iteration #1
2024-03-04 16:19:47 [102 | REPORT]:       [208|PwBaseWorkChain|report_error_handled]: PwCalculation<216> failed with exit status 305: Both the stdout and XML output files could not be read or parsed.
2024-03-04 16:19:47 [103 | REPORT]:       [208|PwBaseWorkChain|report_error_handled]: Action taken: unrecoverable error, aborting...
2024-03-04 16:19:48 [104 | REPORT]:       [208|PwBaseWorkChain|inspect_process]: PwCalculation<216> failed but a handler detected an unrecoverable problem, aborting
2024-03-04 16:19:48 [105 | REPORT]:       [208|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-04 16:19:48 [106 | REPORT]:     [201|DielectricWorkChain|inspect_base_scf]: base scf failed with exit status 300
2024-03-04 16:19:48 [107 | REPORT]:     [201|DielectricWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-04 16:19:58 [112 | REPORT]:       [210|PwBaseWorkChain|report_error_handled]: PwCalculation<213> failed with exit status 305: Both the stdout and XML output files could not be read or parsed.
2024-03-04 16:19:58 [113 | REPORT]:       [210|PwBaseWorkChain|report_error_handled]: Action taken: unrecoverable error, aborting...
2024-03-04 16:19:58 [114 | REPORT]:       [210|PwBaseWorkChain|inspect_process]: PwCalculation<213> failed but a handler detected an unrecoverable problem, aborting
2024-03-04 16:19:59 [115 | REPORT]:       [210|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-04 16:19:59 [116 | REPORT]:     [197|PhononWorkChain|inspect_base_supercell]: base supercell scf failed with exit status 300
2024-03-04 16:19:59 [117 | REPORT]:     [197|PhononWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-04 16:19:59 [118 | REPORT]:   [191|HarmonicWorkChain|inspect_processes]: the child `PhononWorkChain` with <PK=197> failed
2024-03-04 16:19:59 [119 | REPORT]:   [191|HarmonicWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-04 16:19:59 [120 | REPORT]: [189|IRamanSpectraWorkChain|inspect_process]: `HarmonicWorkChain` failed with exit status 400

I am using the following overrides.yaml:

(aiidaENV) rkarkee@ch-fe1:~> less overrides.yaml
clean_workdir: false # whether to clean the working directories
dielectric:
  clean_workdir: false
  kpoints_parallel_distance: 0.2 # kpoints distance in Angstrom^-1 to sample the BZ parallel to the electric field. If used, it should help the final results converge faster
  property: raman
  # central_difference: # if you know what you are doing, custom numerical derivatives with respect to electric field
  #   accuracy: 2
  #   electric_field_step: 0.0005
  scf:
    pseudo_family: PBE_PSP_new
    kpoints_distance: 0.4 # kpoints distance in Angstrom^-1 to sample the BZ
    kpoints_force_parity: false
    max_iterations: 5
    pw:
      metadata:
        options:
          max_wallclock_seconds: 43200
          resources:
            num_machines: 1
            num_mpiprocs_per_machine: 256
          # queue_name: standard # for SLURM
          # account: account_name # for SLURM, also for project etc
          withmpi: true
      settings:
        cmdline: ['-pd', '.true.']
      parameters:
        ELECTRONS:
          conv_thr: 2.0e-10
          electron_maxstep: 100
          mixing_beta: 0.2
        SYSTEM:
          ecutrho: 280.0
          ecutwfc: 70.0
          vdw_corr: Grimme-D2


  settings:
    sleep_submission_time: 1.0
phonon:
  clean_workdir: false
  displacement_generator:
    distance: 0.01 # atomic displacements for phonon calculation, in Angstrom
  scf:
    pseudo_family: PBE_PSP_new
    kpoints_distance: 0.15 # kpoints distance in Angstrom^-1 to sample the BZ
    kpoints_force_parity: false
    max_iterations: 5
    pw:
      metadata:
        options:
          max_wallclock_seconds: 43200
          resources:
            num_machines: 1
            num_mpiprocs_per_machine: 1
          # queue_name: partition_name # for SLURM
          # account: account_name # for SLURM, also for project etc
          withmpi: true
      settings:
        cmdline: ['-pd', '.true.']
        # gamma_only: True # to use only if KpointsData has only a mesh 1 1 1 0 0 0 (i.e. Gamma not shifted)
      parameters:
        ELECTRONS:
          conv_thr: 2.0e-12
          electron_maxstep: 80
          mixing_beta: 0.4
        SYSTEM:
          ecutwfc: 70.0
          ecutrho: 280
          vdw_corr: Grimme-D2

  settings:
    sleep_submission_time: 1.0 # waiting time in seconds between different submission of SCF calculation. Recommended to be at least 1 second, to not overload.
settings:
  run_parallel: true
  use_primitive_cell: false
symmetry:
  distinguish_kinds: false
  is_symmetry: true
  symprec: 1.0e-05

and I am submitting the job via the following script:

# -*- coding: utf-8 -*-
# pylint: disable=line-too-long,wildcard-import,pointless-string-statement,unused-wildcard-import
"""Submit an IRamanSpectraWorkChain via the get_builder_from_protocol using the overrides."""
from pathlib import Path

from aiida import load_profile
from aiida.engine import submit
from aiida.orm import *
from aiida_quantumespresso.common.types import ElectronicType

from aiida_vibroscopy.workflows.spectra.iraman import IRamanSpectraWorkChain

load_profile()

# =============================== INPUTS =============================== #
# Please, change the following inputs.
mesh = [[4, 4, 2], [0.5, 0.5, 0.5]]
pseudo_family_name='PBE_PSP_new'
pw_code_label = 'qe-7.3@hpc'
structure_id = 4  # PK or UUID of your AiiDA StructureData
protocol = 'fast'  # also 'moderate' and 'precise'; 'moderate' should be good enough in general
overrides_filepath = './overrides.yaml'  # should be a path, e.g. /path/to/overrides.yaml. Format is YAML
# Consult the documentation HOW-TOs for how to use the overrides properly.
# !!!!! FOR FULL INPUT NESTED STRUCTURE: https://aiida-vibroscopy.readthedocs.io/en/latest/topics/workflows/spectra/iraman.html
# You can follow the input structure provided on the website to fill in the overrides further.
# ====================================================================== #
# If you don't have a StructureData but you have a CIF, XYZ, or similar file,
# you can import your structure by uncommenting the following:
# from ase.io import read
# atoms = read('/path/to/file.cif')
# structure = StructureData(ase=atoms)
# structure.store()
# structure_id =  structure.pk
# print(f"Your structure has been stored in the database with PK={structure_id}")


def main():
    """Submit an IRamanSpectraWorkChain calculation."""
    code = load_code(pw_code_label)
    structure = load_node(structure_id)
    kwargs = {'electronic_type': ElectronicType.INSULATOR}

    kpoints = KpointsData()
    kpoints.set_kpoints_mesh(mesh[0], mesh[1])

    #pseudo_family = load_group(pseudo_family_name)
    #pseudos = pseudo_family.get_pseudos(structure=structure)

    builder = IRamanSpectraWorkChain.get_builder_from_protocol(
        code=code,
        structure=structure,
        protocol=protocol,
        overrides=Path(overrides_filepath),
        **kwargs,
    )

    builder.dielectric.scf.kpoints = kpoints
    builder.dielectric.pop('kpoints_parallel_distance', None)
    builder.dielectric.scf.pop('kpoints_distance', None)
    builder.phonon.scf.kpoints = kpoints

    #builder.dielectric.scf.pw.pseudos = pseudos
    #builder.phonon.scf.pw.pseudos = pseudos

    calc=submit(builder)
    print(f'Submitted IRamanSpectraWorkChain with PK={calc.pk} and UUID={calc.uuid}')
    print('Register *at least* the PK number, e.g. in your submit script.')
    print('You can monitor the status of your calculation with the following commands:')
    print('  * verdi process status PK')
    print('  * verdi process list -L IRamanSpectraWorkChain # list all running IRamanSpectraWorkChain')
    print(
        '  * verdi process list -ap1 -L IRamanSpectraWorkChain # list all IRamanSpectraWorkChain submitted in the previous 1 day'
    )
    print('If the WorkChain finishes with exit code 0, then you can inspect the outputs and post-process the data.')
    print('Use the command')
    print('  * verdi process show PK')
    print('To show further information about your WorkChain. When finished, you should see some outputs.')
    print('The main output can be accessed via `load_node(PK).outputs.vibrational_data.numerical_accuracy_*`.')
    print('You have to complete the remaining `*`, which depends upon the accuracy of the calculation.')
    print('See also the documentation and the reference paper for further details.')


if __name__ == '__main__':
    """Run script."""
    main()

I am running on an HPC system. I have created a code.yaml and configured the code with it:

label: 'qe-7.3'
description: 'quantum_espresso v7.3'
default_calc_job_plugin: 'quantumespresso.pw'
filepath_executable: '/users/rkarkee/q-e-qe-7.3/bin/pw.x'
computer: 'hpc'
prepend_text: |
    module swap PrgEnv-cray PrgEnv-intel
    module load cmake
append_text: ' '
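
For reference, such a file can be registered with verdi code create, or programmatically; below is a minimal sketch of the programmatic route (a hypothetical helper script, assuming the aiida-core 2.x InstalledCode class and that the computer hpc is already configured):

from aiida import load_profile, orm

load_profile()

# hypothetical registration script, equivalent to the code.yaml above
code = orm.InstalledCode(
    label='qe-7.3',
    computer=orm.load_computer('hpc'),
    filepath_executable='/users/rkarkee/q-e-qe-7.3/bin/pw.x',
    default_calc_job_plugin='quantumespresso.pw',
    prepend_text='module swap PrgEnv-cray PrgEnv-intel\nmodule load cmake',
)
code.description = 'quantum_espresso v7.3'
code.store()
print(f'Registered {code.full_label}')  # e.g. qe-7.3@hpc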

I've installed AiiDA in a conda environment. The status looks like the following:

(aiidaENV) rkarkee@ch-fe1:~> verdi status
 ✔ version:     AiiDA v2.5.1
 ✔ config:      /users/rkarkee/.aiida
 ✔ profile:     quicksetup
 ✔ storage:     Storage for 'quicksetup' [open] @ postgresql://aiida_qs_rkarkee_20554bcc4bead70a3479c4ef8d5f1f4e:***@localhost:5434/quicksetup_rkarkee_20554bcc4bead70a3479c4ef8d5f1f4e / DiskObjectStoreRepository: 983ccf2a46b74550938fdb7753c1117f | /users/rkarkee/.aiida/repository/quicksetup/container
 ✔ rabbitmq:    Connected to RabbitMQ v3.8.14 as amqp://guest:guest@127.0.0.1:5672?heartbeat=600
 ✔ daemon:      Daemon is running with PID 242406

Can you please help me resolve this issue?

Thanks

Best
Rijan

There is a problem with at least one of the PwCalculations. You should investigate its outputs to check what the actual cause is. Usually when the output files are corrupt, something is seriously wrong with the calculation, often specific to the computer setup. You should run verdi calcjob gotocomputer 216, which will bring you to the working directory, and then look at aiida.out (which may not even be there) and at the _scheduler-stdout.txt and _scheduler-stderr.txt files. They will probably contain hints as to what caused the problem.
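
As a side note, much of the same information can be inspected through the AiiDA API without logging into the cluster, provided the files were retrieved. A minimal sketch to run in verdi shell, using PK 216 from the report above (it assumes the standard CalcJobNode accessors of aiida-core):

from aiida import orm

calc = orm.load_node(216)  # the failed PwCalculation
print(calc.exit_status, calc.exit_message)   # 305 and its description
print(calc.get_scheduler_stdout())           # contents of _scheduler-stdout.txt, if retrieved
print(calc.get_scheduler_stderr())           # contents of _scheduler-stderr.txt, if retrieved
# list whichever output files were actually retrieved (aiida.out may be missing or empty)
print(calc.outputs.retrieved.base.repository.list_object_names())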

Hi @sphuber
I went to the working directory and found the following.

(base) rkarkee@ch-fe1:/lustre/scratch5/.mdt0/rkarkee/Runaiida/02/4f/9f80-cdd0-4986-bbe3-d555e0e439bb> ls
aiida.in  aiida.out  _aiidasubmit.sh  out  pseudo  _scheduler-stderr.txt  _scheduler-stdout.txt

The _aiidasubmit.sh looks like:

#!/bin/bash
#SBATCH --no-requeue
#SBATCH --job-name="aiida-216"
#SBATCH --get-user-env
#SBATCH --output=_scheduler-stdout.txt
#SBATCH --error=_scheduler-stderr.txt
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=256
#SBATCH --time=12:00:00
#SBATCH --mem=50

module swap PrgEnv-cray PrgEnv-intel
module load cmake


'srun' '-n' '256' '/users/rkarkee/q-e-qe-7.3/bin/pw.x' '-pd' '.true.' '-in' 'aiida.in'  > 'aiida.out'

Then _scheduler-stderr.txt contains

slurmstepd: error: common_cgroup_instantiate: unable to create cgroup '/sys/fs/cgroup/memory/slurm/uid_39336/job_10499961/step_0/task_22' : Cannot allocate memory
slurmstepd: error: unable to instantiate task 22 cgroup
slurmstepd: error: task_g_set_affinity: Cannot allocate memory
slurmstepd: error: _exec_wait_child_wait_for_parent: failed: No error
srun: error: task 0 launch failed: Slurmd could not execve job
srun: error: task 1 launch failed: Slurmd could not execve job
srun: error: task 2 launch failed: Slurmd could not execve job
srun: error: task 3 launch failed: Slurmd could not execve job
srun: error: task 4 launch failed: Slurmd could not execve job
srun: error: task 5 launch failed: Slurmd could not execve job
srun: error: task 6 launch failed: Slurmd could not execve job
srun: error: task 7 launch failed: Slurmd could not execve job
srun: error: task 8 launch failed: Slurmd could not execve job
...

The input file looks good to me and there are pseudopotential files inside the pseudo directory.
My standard job script for running QE manually looks like the following.

#!/bin/bash

#Submit this script with: sbatch filename

#SBATCH --time=16:00:00   # walltime
#SBATCH --nodes=1   # number of nodes
#SBATCH --ntasks-per-node=256   # number of tasks per node
#SBATCH --job-name=HfTe5   # job name
#SBATCH --partition=standard   # partition name
#SBATCH --no-requeue   # do not requeue when preempted and on node failure
#SBATCH --signal=30@20  # send signal to job at [seconds] before end

module swap PrgEnv-cray PrgEnv-intel
module load cmake


srun -n 256   /users/rkarkee/q-e-qe-7.3/bin/pw.x -pd .true.    < scf.in &> scf.out

I also notice that the partition name is not set in _aiidasubmit.sh.

The error message clearly comes from SLURM, saying it cannot allocate the requested memory. When we look at the _aiidasubmit.sh script, we see that it contains

#SBATCH --mem=50

This line doesn't make any sense. I suspect you set the default_memory_per_node setting on your computer. With SLURM, --mem=50 requests just 50 MB of memory, which is way too small. You should not set this default; it is a default setting after all, and not required.
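
As a side note, if you are unsure whether such a default is set, you can check it through the API; a minimal sketch, assuming the Computer.get_default_memory_per_machine accessor of aiida-core 2.x and the computer label hpc used above. (If you ever do need to control memory explicitly, it is generally better to set the max_memory_kb option under pw.metadata.options for the individual calculations than to rely on a computer-wide default.)

from aiida import load_profile, orm

load_profile()

# check whether a default memory request is stored on the computer; None means no default is set
computer = orm.load_computer('hpc')
print(computer.get_default_memory_per_machine())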

Hi @sphuber
I created a new computer and skipped the default memory per machine prompt by entering ! when it asked.

Then I ran into another issue.
I wanted to include spin-orbit coupling. The generated aiida.in has noncolin = '.true.' instead of noncolin = .true.

I tested this with QE by putting quotation marks around .true., i.e. '.true.', and QE then gave me the same error as in aiida.out.

The &SYSTEM namelist in aiida.in has

&SYSTEM
  ecutrho =   2.8000000000d+02
  ecutwfc =   7.0000000000d+01
  ibrav = 0
  lspinorb = '.true.'
  nat = 12
  noncolin = '.true.'
  nosym = .false.
  ntyp = 2
  occupations = 'fixed'
  vdw_corr = 'Grimme-D2'
/

and the output I got is

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine  read_namelists (1):
      bad line in namelist &system: "  noncolin = '.true.'" (error could be in the previous line)

In plain QE this is easily solved by removing the quotation marks around .true.

But I am generating the input through overrides.yaml (which I find convenient), shown below. I am not using any quotation marks there, yet they end up in the actual aiida.in, which causes this error. How can I fix this?

clean_workdir: false # whether to clean the working directories
dielectric:
  clean_workdir: false
  kpoints_parallel_distance: 0.2 # kpoints distance in Angstrom^-1 to sample the BZ parallel to the electric field. If used, it should help the final results converge faster
  property: raman
  # central_difference: # if you know what you are doing, custom numerical derivatives with respect to electric field
  #   accuracy: 2
  #   electric_field_step: 0.0005
  scf:
    pseudo_family: PBE_PSP_new
    kpoints_distance: 0.4 # kpoints distance in Angstrom^-1 to sample the BZ
    kpoints_force_parity: false
    max_iterations: 5
    pw:
      metadata:
        options:
          max_wallclock_seconds: 43200
          resources:
            num_machines: 1
            num_mpiprocs_per_machine: 256
          # queue_name: standard # for SLURM
          # account: account_name # for SLURM, also for project etc
          withmpi: true
      settings:
        cmdline: ['-pd', '.true.']
      parameters:
        ELECTRONS:
          conv_thr: 2.0e-10
          electron_maxstep: 100
          mixing_beta: 0.2
        SYSTEM:
          ecutrho: 280.0
          ecutwfc: 70.0
          vdw_corr: Grimme-D2
          noncolin: .true.
          lspinorb: .true.


  settings:
    sleep_submission_time: 1.0
phonon:
  clean_workdir: false
  displacement_generator:
    distance: 0.01 # atomic displacements for phonon calculation, in Angstrom
  scf:
    pseudo_family: PBE_PSP_new
    kpoints_distance: 0.15 # kpoints distance in Angstrom^-1 to sample the BZ
    kpoints_force_parity: false
    max_iterations: 5
    pw:
      metadata:
        options:
          max_wallclock_seconds: 43200
          resources:
            num_machines: 1
            num_mpiprocs_per_machine: 1
          # queue_name: partition_name # for SLURM
          # account: account_name # for SLURM, also for project etc
          withmpi: true
      settings:
        cmdline: ['-pd', '.true.']
        # gamma_only: True # to use only if KpointsData has only a mesh 1 1 1 0 0 0 (i.e. Gamma not shifted)
      parameters:
        ELECTRONS:
          conv_thr: 2.0e-12
          electron_maxstep: 80
          mixing_beta: 0.4
        SYSTEM:
          ecutwfc: 70.0
          ecutrho: 280
          vdw_corr: Grimme-D2
          noncolin: .true.
          lspinorb: .true.

  settings:
    sleep_submission_time: 1.0 # waiting time in seconds between different submission of SCF calculation. Recommended to be at least 1 second, to not overload.
settings:
  run_parallel: true
  use_primitive_cell: false
symmetry:
  distinguish_kinds: false
  is_symmetry: true
  symprec: 1.0e-05

The overrides.yaml file is in YAML format. You can check how YAML handles strings e.g. here, but essentially it treats as a string anything that is not e.g. a number or a special keyword. So

noncolin: .true.

actually means

noncolin: ".true."

(as you can also see from the syntax highlighting here on Discourse).

You instead want to pass an actual boolean True, as is done for some of the other flags where you pass the boolean value False. There are various spellings that YAML accepts as the boolean True; one is true:

noncolin: true

(see, again, the different syntax highlighting here on Discourse).

This will fix your issue (do the same for lspinorb, of course).

(The other piece of information is that AiiDA will then convert a boolean True into the .true. that Fortran namelists expect, while it converts strings by adding quotes, again as Fortran expects for character values. This is why you end up with quotes.)
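
To make the distinction concrete, here is a small illustrative sketch (not the actual aiida-quantumespresso converter, just the idea) of how the two YAML spellings end up in the Fortran namelist:

import yaml

def to_fortran(value):
    """Illustrative conversion of a Python value to Fortran namelist syntax."""
    if isinstance(value, bool):  # booleans become bare .true./.false.
        return '.true.' if value else '.false.'
    if isinstance(value, str):   # strings get quoted, as Fortran expects for character values
        return f"'{value}'"
    return str(value)

# YAML reads '.true.' as a *string*, but 'true' as a boolean
print(to_fortran(yaml.safe_load('noncolin: .true.')['noncolin']))  # prints '.true.' -> rejected by pw.x
print(to_fortran(yaml.safe_load('noncolin: true')['noncolin']))    # prints .true.   -> what pw.x expects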

Thank you @giovannipizzi and @sphuber.
That indeed fixed the problem, but I am confused about the difference between the verdi process status PK and verdi process report PK commands.

In my case, for the above calculation, I am seeing

(aiidaENV) rkarkee@ch-fe1:~> verdi process status 493
IRamanSpectraWorkChain<493> Waiting [None]
    └── HarmonicWorkChain<495> Waiting [1:if_(should_run_parallel)]
        ├── generate_preprocess_data<496> Finished [0]
        ├── PhononWorkChain<501> Waiting [4:run_forces]
        │   ├── generate_preprocess_data<506> Finished [0]
        │   ├── get_supercell<508> Finished [0]
        │   ├── PwBaseWorkChain<514> Finished [0] [3:results]
        │   │   └── PwCalculation<517> Finished [0]
        │   ├── get_supercells_with_displacements<605> Finished [0]
        │   ├── PwBaseWorkChain<619> Waiting [2:while_(should_run_process)(1:run_process)]
        │   │   └── PwCalculation<677> Waiting
        │   ├── PwBaseWorkChain<621> Waiting [2:while_(should_run_process)(1:run_process)]
        │   │   └── PwCalculation<624> Waiting
        │   ├── PwBaseWorkChain<626> Waiting [2:while_(should_run_process)(1:run_process)]
        │   │   └── PwCalculation<629> Waiting
        │   ├── PwBaseWorkChain<631> Waiting [2:while_(should_run_process)(1:run_process)]
        │   │   └── PwCalculation<634> Waiting
        │   ├── PwBaseWorkChain<636> Waiting [2:while_(should_run_process)(1:run_process)]
        │   │   └── PwCalculation<639> Waiting
        │   ├── PwBaseWorkChain<641> Waiting [2:while_(should_run_process)(1:run_process)]
        │   │   └── PwCalculation<644> Waiting
        │   ├── PwBaseWorkChain<646> Waiting [2:while_(should_run_process)(1:run_process)]
        │   │   └── PwCalculation<649> Finished [0]
        │   ├── PwBaseWorkChain<651> Waiting [2:while_(should_run_process)(1:run_process)]
        │   │   └── PwCalculation<654> Waiting
        │   ├── PwBaseWorkChain<656> Waiting [2:while_(should_run_process)(1:run_process)]
        │   │   └── PwCalculation<659> Waiting
        │   ├── PwBaseWorkChain<661> Waiting [2:while_(should_run_process)(1:run_process)]
        │   │   └── PwCalculation<664> Waiting
        │   ├── PwBaseWorkChain<666> Waiting [2:while_(should_run_process)(1:run_process)]
        │   │   └── PwCalculation<669> Waiting
        │   └── PwBaseWorkChain<671> Waiting [2:while_(should_run_process)(1:run_process)]
        │       └── PwCalculation<674> Waiting
        └── DielectricWorkChain<505> Finished [0] [11:results]
            ├── PwBaseWorkChain<512> Finished [0] [3:results]
            │   └── PwCalculation<520> Finished [0]
            ├── PwBaseWorkChain<529> Finished [0] [3:results]
            │   └── PwCalculation<532> Finished [0]
            ├── compute_critical_electric_field<538> Finished [0]
            ├── get_accuracy_from_critical_field<540> Finished [0]
            ├── get_electric_field_step<542> Finished [0]
            ├── PwBaseWorkChain<546> Finished [0] [3:results]
            │   └── PwCalculation<549> Finished [0]
            ├── PwBaseWorkChain<557> Finished [0] [3:results]
            │   └── PwCalculation<560> Finished [0]
            ├── PwBaseWorkChain<563> Finished [0] [3:results]
            │   └── PwCalculation<566> Finished [0]
            ├── PwBaseWorkChain<569> Finished [0] [3:results]
            │   └── PwCalculation<572> Finished [0]
            ├── PwBaseWorkChain<575> Finished [0] [3:results]
            │   └── PwCalculation<578> Finished [0]
            ├── PwBaseWorkChain<581> Finished [0] [3:results]
            │   └── PwCalculation<584> Finished [0]
            ├── PwBaseWorkChain<587> Finished [0] [3:results]
            │   └── PwCalculation<590> Finished [0]
            ├── subtract_residual_forces<712> Finished [0]
            └── NumericalDerivativesWorkChain<719> Finished [0] [None]
                ├── generate_preprocess_data<720> Finished [0]
                ├── compute_nac_parameters<722> Finished [0]
                ├── compute_susceptibility_derivatives<724> Finished [0]
                └── join_tensors<727> Finished [0]

and

(aiidaENV) rkarkee@ch-fe1:~> verdi process report 493
2024-03-05 21:47:19 [267 | REPORT]: [493|IRamanSpectraWorkChain|run_spectra]: submitting `HarmonicWorkChain` <PK=495>
2024-03-05 21:47:21 [268 | REPORT]:   [495|HarmonicWorkChain|run_phonon]: submitting `PhononWorkChain` <PK=501>
2024-03-05 21:47:22 [269 | REPORT]:   [495|HarmonicWorkChain|run_dielectric]: submitting `DielectricWorkChain` <PK=505>
2024-03-05 21:47:23 [270 | REPORT]:     [505|DielectricWorkChain|run_base_scf]: launching base scf PwBaseWorkChain<512>
2024-03-05 21:47:23 [271 | REPORT]:     [501|PhononWorkChain|run_base_supercell]: launching base supercell scf PwBaseWorkChain<514>
2024-03-05 21:47:24 [272 | REPORT]:       [514|PwBaseWorkChain|run_process]: launching PwCalculation<517> iteration #1
2024-03-05 21:47:24 [273 | REPORT]:       [512|PwBaseWorkChain|run_process]: launching PwCalculation<520> iteration #1
2024-03-05 23:42:00 [274 | REPORT]:       [512|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-03-05 23:42:00 [275 | REPORT]:       [512|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-05 23:42:01 [276 | REPORT]:     [505|DielectricWorkChain|run_nscf]: launching base scf PwBaseWorkChain<529>
2024-03-05 23:42:02 [277 | REPORT]:       [529|PwBaseWorkChain|run_process]: launching PwCalculation<532> iteration #1
2024-03-05 23:45:44 [279 | REPORT]:       [529|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-03-05 23:45:44 [280 | REPORT]:       [529|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-05 23:45:46 [281 | REPORT]:     [505|DielectricWorkChain|run_null_field_scfs]: launching PwBaseWorkChain<546> with null electric field
2024-03-05 23:45:46 [282 | REPORT]:       [546|PwBaseWorkChain|run_process]: launching PwCalculation<549> iteration #1
2024-03-06 00:43:45 [283 | REPORT]:       [546|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-03-06 00:43:46 [284 | REPORT]:       [546|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-06 00:43:46 [285 | REPORT]:     [505|DielectricWorkChain|run_electric_field_scfs]: launching PwBaseWorkChain<557> with electric field index 0 and sign 1.0 iteration #0
2024-03-06 00:43:47 [286 | REPORT]:       [557|PwBaseWorkChain|run_process]: launching PwCalculation<560> iteration #1
2024-03-06 00:43:48 [287 | REPORT]:     [505|DielectricWorkChain|run_electric_field_scfs]: launching PwBaseWorkChain<563> with electric field index 1 and sign 1.0 iteration #0
2024-03-06 00:43:49 [288 | REPORT]:       [563|PwBaseWorkChain|run_process]: launching PwCalculation<566> iteration #1
2024-03-06 00:43:49 [289 | REPORT]:     [505|DielectricWorkChain|run_electric_field_scfs]: launching PwBaseWorkChain<569> with electric field index 2 and sign 1.0 iteration #0
2024-03-06 00:43:50 [290 | REPORT]:       [569|PwBaseWorkChain|run_process]: launching PwCalculation<572> iteration #1
2024-03-06 00:43:51 [291 | REPORT]:     [505|DielectricWorkChain|run_electric_field_scfs]: launching PwBaseWorkChain<575> with electric field index 3 and sign 1.0 iteration #0
2024-03-06 00:43:52 [292 | REPORT]:       [575|PwBaseWorkChain|run_process]: launching PwCalculation<578> iteration #1
2024-03-06 00:43:52 [293 | REPORT]:     [505|DielectricWorkChain|run_electric_field_scfs]: launching PwBaseWorkChain<581> with electric field index 4 and sign 1.0 iteration #0
2024-03-06 00:43:53 [294 | REPORT]:       [581|PwBaseWorkChain|run_process]: launching PwCalculation<584> iteration #1
2024-03-06 00:43:54 [295 | REPORT]:     [505|DielectricWorkChain|run_electric_field_scfs]: launching PwBaseWorkChain<587> with electric field index 5 and sign 1.0 iteration #0
2024-03-06 00:43:54 [296 | REPORT]:       [587|PwBaseWorkChain|run_process]: launching PwCalculation<590> iteration #1
2024-03-06 01:38:15 [297 | REPORT]:       [557|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-03-06 01:38:15 [298 | REPORT]:       [557|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-06 01:56:08 [299 | REPORT]:       [514|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-03-06 01:56:09 [300 | REPORT]:       [514|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-06 01:56:10 [301 | REPORT]:     [501|PhononWorkChain|run_forces]: submitting `PwBaseWorkChain` <PK=619> with supercell n.o 01
2024-03-06 01:56:12 [302 | REPORT]:     [501|PhononWorkChain|run_forces]: submitting `PwBaseWorkChain` <PK=621> with supercell n.o 02
2024-03-06 01:56:13 [303 | REPORT]:       [621|PwBaseWorkChain|run_process]: launching PwCalculation<624> iteration #1
2024-03-06 01:56:14 [304 | REPORT]:     [501|PhononWorkChain|run_forces]: submitting `PwBaseWorkChain` <PK=626> with supercell n.o 03
2024-03-06 01:56:15 [305 | REPORT]:       [626|PwBaseWorkChain|run_process]: launching PwCalculation<629> iteration #1
2024-03-06 01:56:15 [306 | REPORT]:     [501|PhononWorkChain|run_forces]: submitting `PwBaseWorkChain` <PK=631> with supercell n.o 04
2024-03-06 01:56:16 [307 | REPORT]:       [631|PwBaseWorkChain|run_process]: launching PwCalculation<634> iteration #1
2024-03-06 01:56:16 [308 | REPORT]:     [501|PhononWorkChain|run_forces]: submitting `PwBaseWorkChain` <PK=636> with supercell n.o 05
2024-03-06 01:56:17 [309 | REPORT]:       [636|PwBaseWorkChain|run_process]: launching PwCalculation<639> iteration #1
2024-03-06 01:56:18 [310 | REPORT]:     [501|PhononWorkChain|run_forces]: submitting `PwBaseWorkChain` <PK=641> with supercell n.o 06
2024-03-06 01:56:19 [311 | REPORT]:       [641|PwBaseWorkChain|run_process]: launching PwCalculation<644> iteration #1
2024-03-06 01:56:19 [312 | REPORT]:     [501|PhononWorkChain|run_forces]: submitting `PwBaseWorkChain` <PK=646> with supercell n.o 07
2024-03-06 01:56:20 [313 | REPORT]:       [646|PwBaseWorkChain|run_process]: launching PwCalculation<649> iteration #1
2024-03-06 01:56:21 [314 | REPORT]:     [501|PhononWorkChain|run_forces]: submitting `PwBaseWorkChain` <PK=651> with supercell n.o 08
2024-03-06 01:56:22 [315 | REPORT]:       [651|PwBaseWorkChain|run_process]: launching PwCalculation<654> iteration #1
2024-03-06 01:56:22 [316 | REPORT]:     [501|PhononWorkChain|run_forces]: submitting `PwBaseWorkChain` <PK=656> with supercell n.o 09
2024-03-06 01:56:23 [317 | REPORT]:       [656|PwBaseWorkChain|run_process]: launching PwCalculation<659> iteration #1
2024-03-06 01:56:24 [318 | REPORT]:     [501|PhononWorkChain|run_forces]: submitting `PwBaseWorkChain` <PK=661> with supercell n.o 10
2024-03-06 01:56:25 [319 | REPORT]:       [661|PwBaseWorkChain|run_process]: launching PwCalculation<664> iteration #1
2024-03-06 01:56:25 [320 | REPORT]:     [501|PhononWorkChain|run_forces]: submitting `PwBaseWorkChain` <PK=666> with supercell n.o 11
2024-03-06 01:56:26 [321 | REPORT]:       [666|PwBaseWorkChain|run_process]: launching PwCalculation<669> iteration #1
2024-03-06 01:56:27 [322 | REPORT]:     [501|PhononWorkChain|run_forces]: submitting `PwBaseWorkChain` <PK=671> with supercell n.o 12
2024-03-06 01:56:28 [323 | REPORT]:       [671|PwBaseWorkChain|run_process]: launching PwCalculation<674> iteration #1
2024-03-06 01:56:29 [324 | REPORT]:       [619|PwBaseWorkChain|run_process]: launching PwCalculation<677> iteration #1
2024-03-06 02:04:25 [325 | REPORT]:       [563|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-03-06 02:04:26 [326 | REPORT]:       [563|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-06 02:10:45 [327 | REPORT]:       [569|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-03-06 02:10:45 [328 | REPORT]:       [569|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-06 02:10:46 [329 | REPORT]:       [575|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-03-06 02:10:46 [330 | REPORT]:       [575|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-06 02:14:57 [331 | REPORT]:       [581|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-03-06 02:14:57 [332 | REPORT]:       [581|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-06 02:32:54 [333 | REPORT]:       [587|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-03-06 02:32:54 [334 | REPORT]:       [587|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-06 02:33:00 [335 | REPORT]:     [505|DielectricWorkChain|run_numerical_derivatives]: launching NumericalDerivativesWorkChain<719> for computing numerical derivatives.
2024-03-06 02:33:16 [336 | REPORT]:     [505|DielectricWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-06 05:57:24 [337 |  ERROR]:       Traceback (most recent call last):
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/plumpy/events.py", line 97, in run
    await self._callback(*self._args, **self._kwargs)
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/plumpy/processes.py", line 567, in _run_task
    result = await coro(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/plumpy/utils.py", line 245, in wrap
    return coro_or_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/engine/processes/workchains/workchain.py", line 409, in _on_awaitable_finished
    self._resolve_awaitable(awaitable, value)
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/engine/processes/workchains/workchain.py", line 270, in _resolve_awaitable
    self._update_process_status()
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/engine/processes/workchains/workchain.py", line 290, in _update_process_status
    self.node.set_process_status(None)
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/orm/nodes/process/process.py", line 307, in set_process_status
    self.base.attributes.delete(self.PROCESS_STATUS_KEY)
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/orm/nodes/attributes.py", line 151, in delete
    self._node._check_mutability_attributes([key])
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/orm/utils/mixins.py", line 206, in _check_mutability_attributes
    if self.is_sealed:
       ^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/orm/utils/mixins.py", line 189, in is_sealed
    return self.base.attributes.get(self.SEALED_KEY, False)  # type: ignore[attr-defined]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/orm/nodes/attributes.py", line 76, in get
    attribute = self._backend_node.get_attribute(key)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/storage/psql_dos/orm/nodes.py", line 239, in get_attribute
    return self.model.attributes[key]
           ^^^^^^^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/storage/psql_dos/orm/utils.py", line 80, in __getattr__
    if self.is_saved() and self._is_mutable_model_field(item) and not self._in_transaction():
       ^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/storage/psql_dos/orm/utils.py", line 106, in is_saved
    return self._model.id is not None
           ^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/sqlalchemy/orm/attributes.py", line 566, in __get__
    return self.impl.get(state, dict_)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/sqlalchemy/orm/attributes.py", line 1086, in get
    value = self._fire_loader_callables(state, key, passive)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/sqlalchemy/orm/attributes.py", line 1116, in _fire_loader_callables
    return state._load_expired(state, passive)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/sqlalchemy/orm/state.py", line 798, in _load_expired
    self.manager.expired_attribute_loader(self, toload, passive)
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/sqlalchemy/orm/loading.py", line 1584, in load_scalar_attributes
    raise orm_exc.DetachedInstanceError(
sqlalchemy.orm.exc.DetachedInstanceError: Instance <DbNode at 0x14cb85a7e550> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: https://sqlalche.me/e/20/bhk3)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/engine/processes/workchains/workchain.py", line 361, in on_exiting
    self._store_nodes(self.ctx)
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/engine/processes/workchains/workchain.py", line 346, in _store_nodes
    self._store_nodes(value)
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/engine/processes/workchains/workchain.py", line 346, in _store_nodes
    self._store_nodes(value)
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/engine/processes/workchains/workchain.py", line 342, in _store_nodes
    if isinstance(data, Node) and not data.is_stored:
                                      ^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/orm/entities.py", line 263, in is_stored
    return self._backend_entity.is_stored
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/storage/psql_dos/orm/entities.py", line 81, in is_stored
    return self.model.id is not None
           ^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/storage/psql_dos/orm/utils.py", line 80, in __getattr__
    if self.is_saved() and self._is_mutable_model_field(item) and not self._in_transaction():
       ^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/aiida/storage/psql_dos/orm/utils.py", line 106, in is_saved
    return self._model.id is not None
           ^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/sqlalchemy/orm/attributes.py", line 566, in __get__
    return self.impl.get(state, dict_)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/sqlalchemy/orm/attributes.py", line 1086, in get
    value = self._fire_loader_callables(state, key, passive)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/sqlalchemy/orm/attributes.py", line 1116, in _fire_loader_callables
    return state._load_expired(state, passive)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/sqlalchemy/orm/state.py", line 798, in _load_expired
    self.manager.expired_attribute_loader(self, toload, passive)
  File "/users/rkarkee/conda/envs/aiidaENV/lib/python3.11/site-packages/sqlalchemy/orm/loading.py", line 1584, in load_scalar_attributes
    raise orm_exc.DetachedInstanceError(
sqlalchemy.orm.exc.DetachedInstanceError: Instance <DbNode at 0x14cb85e0eed0> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: https://sqlalche.me/e/20/bhk3)

In the status it is waiting, whereas in the report there is an error. There is also nothing in my SLURM job list that is queued or running. Since there is an error and I am not sure what it means: will it stop eventually, or do I need to kill 493?

How should I proceed from here?

Thank you again.

Best
Rijan

Can you try restarting the daemon with verdi daemon restart --reset? Then wait a bit, a few minutes, and see if the PwCalculations start completing.

Hi @sphuber

I did that, but at that time verdi status showed issues with the storage and rabbitmq.

I shut down and restarted the services. After that, verdi status showed everything green and working.

I resubmitted the job. The status and report follow.

(aiidaENV) rkarkee@ch-fe2:~> verdi process status 752
IRamanSpectraWorkChain<752> Waiting [None]
    └── HarmonicWorkChain<754> Waiting [1:if_(should_run_parallel)]
        ├── generate_preprocess_data<755> Finished [0]
        ├── PhononWorkChain<760> Waiting [2:run_base_supercell]
        │   ├── generate_preprocess_data<765> Finished [0]
        │   ├── get_supercell<770> Finished [0]
        │   └── PwBaseWorkChain<773> Waiting [2:while_(should_run_process)(1:run_process)]
        │       └── PwCalculation<779> Waiting
        └── DielectricWorkChain<764> Waiting [6:run_null_field_scfs]
            ├── PwBaseWorkChain<768> Finished [0] [3:results]
            │   └── PwCalculation<776> Finished [0]
            ├── PwBaseWorkChain<788> Finished [0] [3:results]
            │   └── PwCalculation<791> Finished [0]
            ├── compute_critical_electric_field<797> Finished [0]
            ├── get_accuracy_from_critical_field<799> Finished [0]
            ├── get_electric_field_step<801> Finished [0]
            └── PwBaseWorkChain<805> Waiting [2:while_(should_run_process)(1:run_process)]
                └── PwCalculation<808> Waiting
(aiidaENV) rkarkee@ch-fe2:~> verdi process report 752
2024-03-06 13:31:57 [353 | REPORT]: [752|IRamanSpectraWorkChain|run_spectra]: submitting `HarmonicWorkChain` <PK=754>
2024-03-06 13:31:59 [354 | REPORT]:   [754|HarmonicWorkChain|run_phonon]: submitting `PhononWorkChain` <PK=760>
2024-03-06 13:32:00 [355 | REPORT]:   [754|HarmonicWorkChain|run_dielectric]: submitting `DielectricWorkChain` <PK=764>
2024-03-06 13:32:01 [356 | REPORT]:     [764|DielectricWorkChain|run_base_scf]: launching base scf PwBaseWorkChain<768>
2024-03-06 13:32:03 [357 | REPORT]:     [760|PhononWorkChain|run_base_supercell]: launching base supercell scf PwBaseWorkChain<773>
2024-03-06 13:32:04 [358 | REPORT]:       [768|PwBaseWorkChain|run_process]: launching PwCalculation<776> iteration #1
2024-03-06 13:32:04 [359 | REPORT]:       [773|PwBaseWorkChain|run_process]: launching PwCalculation<779> iteration #1
2024-03-06 13:36:42 [360 | REPORT]:       [768|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-03-06 13:36:42 [361 | REPORT]:       [768|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-06 13:36:43 [362 | REPORT]:     [764|DielectricWorkChain|run_nscf]: launching base scf PwBaseWorkChain<788>
2024-03-06 13:36:44 [363 | REPORT]:       [788|PwBaseWorkChain|run_process]: launching PwCalculation<791> iteration #1
2024-03-06 13:41:37 [365 | REPORT]:       [788|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-03-06 13:41:37 [366 | REPORT]:       [788|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-03-06 13:41:39 [367 | REPORT]:     [764|DielectricWorkChain|run_null_field_scfs]: launching PwBaseWorkChain<805> with null electric field
2024-03-06 13:41:40 [368 | REPORT]:       [805|PwBaseWorkChain|run_process]: launching PwCalculation<808> iteration #1

It's been 3 hours since

2024-03-06 13:41:40 [368 | REPORT]:       [805|PwBaseWorkChain|run_process]: launching PwCalculation<808> iteration #1

but there is no job in the queue and no progress update. I went to the working directory with
verdi calcjob gotocomputer 808
and checked that the calculation had finished and completed successfully.

But it has been 3 hours with no progress and no queued job, and my status shows only Waiting.

verdi status still looks good.

(aiidaENV) rkarkee@ch-fe2:~> verdi status
 ✔ version:     AiiDA v2.5.1
 ✔ config:      /users/rkarkee/.aiida
 ✔ profile:     quicksetup
 ✔ storage:     Storage for 'quicksetup' [open] @ postgresql://aiida_qs_rkarkee_20554bcc4bead70a3479c4ef8d5f1f4e:***@localhost:5434/quicksetup_rkarkee_20554bcc4bead70a3479c4ef8d5f1f4e / DiskObjectStoreRepository: 983ccf2a46b74550938fdb7753c1117f | /users/rkarkee/.aiida/repository/quicksetup/container
 ✔ rabbitmq:    Connected to RabbitMQ v3.8.14 as amqp://guest:guest@127.0.0.1:5672?heartbeat=600
 ✔ daemon:      Daemon is running with PID 97806
(aiidaENV) rkarkee@ch-fe2:~>

I feel like something has frozen. I also did

verdi daemon restart --reset

again, but still no progress. Is there a way to check if something is running? Or to stop this and restart from wherever it left off?

I also checked the process list:

(aiidaENV) rkarkee@ch-fe2:~> verdi process list
  PK  Created    Process label           ♻    Process State    Process status
----  ---------  ----------------------  ---  ---------------  ---------------------------------------
 752  2h ago     IRamanSpectraWorkChain       ⏵ Waiting        Waiting for child processes: 754
 754  2h ago     HarmonicWorkChain            ⏵ Waiting        Waiting for child processes: 760, 764
 760  2h ago     PhononWorkChain              ⏵ Waiting        Waiting for child processes: 773
 764  2h ago     DielectricWorkChain          ⏵ Waiting        Waiting for child processes: 805
 773  2h ago     PwBaseWorkChain              ⏵ Waiting        Waiting for child processes: 779
 779  2h ago     PwCalculation                ⏵ Waiting        Monitoring scheduler: job state RUNNING
 805  2h ago     PwBaseWorkChain              ⏵ Waiting        Waiting for child processes: 808
 808  2h ago     PwCalculation                ⏵ Waiting        Monitoring scheduler: job state RUNNING

Total results: 8

Report: ♻ Processes marked with check-mark were not run but taken from the cache.
Report: Add the option `-P pk cached_from` to the command to display cache source.
Report: Last time an entry changed state: 2h ago (at 13:42:55 on 2024-03-06)
Report: Checking daemon load... OK
Report: Using 4% of the available daemon worker slots.

Also, this may be useful to show:

(aiidaENV) rkarkee@ch-fe2:~> verdi process report 808
*** 808: CalcJobState.WITHSCHEDULER, scheduler state: JobState.RUNNING
*** Scheduler output: N/A
*** Scheduler errors: N/A
*** 0 LOG MESSAGES

When processes seem stuck, do the following:

verdi daemon stop
verdi process repair
verdi daemon start

Then wait a bit to give the daemon time to pick things up again. If things still seem stuck, try running verdi process play --all.

Thanks, this solved the issue.

Best
Rijan