Problem: the CalcJob processes are in the QUEUED status, but the actual jobs on HPC are finished
System: Ubuntu 20.04
AiiDA version: 2.6.2
Broker: RabbitMQ v3.8.2
Here are my processes:
$ verdi process list
   PK  Created    Process label      ♻    Process State    Process status
-----  ---------  -----------------  ---  ---------------  --------------------------------------
15840  3D ago     Cp2kBaseWorkChain       ⏵ Waiting        Waiting for child processes: 15846
15845  3D ago     Cp2kBaseWorkChain       ⏵ Waiting        Waiting for child processes: 15852
15846  3D ago     Cp2kCalculation         ⏵ Waiting        Monitoring scheduler: job state QUEUED
15851  3D ago     Cp2kBaseWorkChain       ⏵ Waiting        Waiting for child processes: 15858
15852  3D ago     Cp2kCalculation         ⏵ Waiting        Monitoring scheduler: job state QUEUED
...
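To get just the PKs of the calculations that still report QUEUED, the listing above can be piped through a small filter. This is only a rough sketch: the function name `queued_calcjob_pks` is made up for illustration, and it assumes the status text reads exactly "Monitoring scheduler: job state QUEUED" as in the output shown.

```shell
# Hypothetical helper: print the PK (first column) of every row whose
# process status says the scheduler job is still QUEUED.
queued_calcjob_pks() {
  awk '/Monitoring scheduler: job state QUEUED/ { print $1 }'
}

# Possible usage (needs a configured AiiDA profile):
#   verdi process list | queued_calcjob_pks
```

Each PK it prints can then be inspected one at a time, for example with `verdi process report <PK>`.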
I checked the remote job using:
verdi calcjob gotocomputer 15846
And the job is finished. In fact, all the jobs on the HPC are finished, but the CalcJob processes are still in the QUEUED status.
I tried to pause all processes, and found that the CalcJob processes are unreachable:
$ verdi process pause --all
Error: Process<15914> is unreachable.
Error: Process<15902> is unreachable.
Error: Process<15982> is unreachable.
Error: Process<15858> is unreachable.
Error: Process<15846> is unreachable.
...
Report: request to pause Process<15864> sent
Report: request to pause Process<15907> sent
Report: request to pause Process<16044> sent
Report: request to pause Process<15852> sent
Report: request to pause Process<15919> sent
Report: request to pause Process<15957> sent
...
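Since the output mixes unreachable and reachable processes, it may help to separate out the unreachable PKs. A minimal sketch, assuming the error lines have exactly the form "Error: Process<PK> is unreachable." shown above; the function name `unreachable_pks` is hypothetical:

```shell
# Hypothetical helper: extract the PK from each "is unreachable" error line
# and ignore everything else (e.g. the "Report: request to pause" lines).
unreachable_pks() {
  sed -n 's/^Error: Process<\([0-9][0-9]*\)> is unreachable\.$/\1/p'
}

# Possible usage; stderr is merged into stdout in case the errors are printed there:
#   verdi process pause --all 2>&1 | unreachable_pks
```

Comparing this list with the PKs that report QUEUED shows whether every stuck calculation is unreachable or only some of them.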
I tried to repair them using:
verdi daemon stop
verdi process repair
verdi daemon start
# then play all
verdi process play --all
But the processes are still in the QUEUED status.
Any suggestions on how to get the processes to move on to RUNNING and then FINISHED?