Handling complex workchains

Federica_Zanca · January 24, 2024, 2:05pm

Hi, I’m running a workchain that at some point submits many Quantum ESPRESSO calculations. Is there a command to keep track of how many calculations have finished, how many had errors, etc.? If I run verdi process status, verdi process show or similar, the output covers so many processes that I can’t get an overview.
Also, if one of these processes fails for any reason, how can I rerun it on the same node while keeping it connected to the original workchain?

sphuber · January 24, 2024, 4:13pm

There isn’t really any better CLI command than verdi process status for this purpose. But you can use the Python API to write something yourself. For example:

# workchain_node is the node of your workchain, e.g. from load_node(<PK>) in verdi shell
for node in workchain_node.called:
    if node.is_failed:
        print(f'Node<{node.pk}> failed with exit status {node.exit_status}')

You can find and print any other information that you might find useful in this way.
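To get the counts asked about in the question, the same idea can be extended into a small tally. This is only a sketch: the function name summarize is made up for illustration, and it assumes the objects passed in expose the is_finished_ok, is_failed, is_excepted and is_killed properties that AiiDA process nodes provide. Using called_descendants instead of called should also include calculations launched by nested sub-workchains.

```python
from collections import Counter


def summarize(nodes):
    """Count processes by outcome (hypothetical helper, not part of AiiDA).

    `nodes` is any iterable of objects exposing the boolean properties
    is_finished_ok, is_failed, is_excepted and is_killed.
    """
    counts = Counter()
    for node in nodes:
        if node.is_finished_ok:
            counts['finished_ok'] += 1
        elif node.is_failed:
            counts['failed'] += 1       # finished with a non-zero exit status
        elif node.is_excepted:
            counts['excepted'] += 1
        elif node.is_killed:
            counts['killed'] += 1
        else:
            counts['active'] += 1       # created, waiting or running
    return counts


# Possible usage inside `verdi shell`, where load_node is already available:
#   workchain_node = load_node(<PK of your workchain>)
#   print(summarize(workchain_node.called_descendants))
```

Everything that has not terminated yet ends up under 'active', so the tally can be rerun while the workchain is still going.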

If it is a “temporary” error that could simply go away if you were to run it again with exactly the same inputs, you could simply enable caching and relaunch the original workchain with the exact same inputs. Enable caching as follows:

verdi config set caching.default_enabled true
verdi daemon restart --reset

Restarting the daemon is also important. With caching enabled, the workchain is rerun as usual, except that calculations that already completed correctly will not be rerun; their results will simply be taken from the database. So only the failed calculations will be rerun.
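One way to relaunch with exactly the same inputs is sketched below. The helper name relaunch_with_same_inputs is hypothetical, and the sketch assumes the workchain node offers get_builder_restart(), which should return a builder pre-populated with the original inputs. The submit function is passed in as an argument so the helper does not depend on a loaded profile.

```python
def relaunch_with_same_inputs(workchain_node, submit):
    """Resubmit a workchain with the inputs of an earlier run (hypothetical helper).

    `workchain_node` is assumed to provide get_builder_restart();
    `submit` is a callable such as aiida.engine.submit.
    """
    # Builder pre-filled with the inputs of the original run
    builder = workchain_node.get_builder_restart()
    # Returns whatever the submit callable returns (for AiiDA, the new process node)
    return submit(builder)


# Possible usage inside `verdi shell`, after enabling caching and restarting the daemon:
#   from aiida.engine import submit
#   new_node = relaunch_with_same_inputs(load_node(<PK of the original workchain>), submit)
```

Note that this creates a new workchain node; the calculations served from the cache are new nodes as well, they just reuse the stored outputs.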

If the calculation failed due to incorrect inputs, then this won’t help, of course. There is no mechanism to change the inputs of a single calculation inside a workchain and rerun only that calculation.
