Hi, I’m running a workchain that at some point submits many Quantum ESPRESSO calculations. Is there a command to keep track of how many calculations have finished, how many had errors, etc.? If I run verdi process status, verdi process show or similar commands, there is just too much output for me to get an overview.
Also, if one of these processes fails for any reason, how can I rerun it on the same node while keeping it connected to the original workchain?
There isn’t really any better CLI command than verdi process status
for this purpose. But you can use the Python API to write something yourself. For example:
for node in workchain_node.called:
    if node.is_failed:
        print(f'Node<{node.pk}> failed with exit status {node.exit_status}')
You can find and print any other information that you might find useful in this way.
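To get the overview asked for (how many finished, how many failed, and so on), the same idea can be extended into a tally. The following is only a sketch: the function name summarize_processes is made up for illustration, and it relies only on the standard process node properties is_finished_ok, is_failed, is_excepted, is_killed, exit_status and process_state. The usage part is left as comments because it needs a loaded AiiDA profile, and the PK 1234 is a placeholder.

```python
from collections import Counter


def summarize_processes(nodes):
    """Tally process nodes by outcome, splitting failures by exit status."""
    counts = Counter()
    for node in nodes:
        if node.is_finished_ok:
            counts['finished ok'] += 1
        elif node.is_failed:
            # Finished, but with a non-zero exit status.
            counts[f'failed (exit status {node.exit_status})'] += 1
        elif node.is_excepted:
            counts['excepted'] += 1
        elif node.is_killed:
            counts['killed'] += 1
        else:
            # Not terminated yet, e.g. created, waiting or running.
            state = node.process_state
            counts[state.value if state is not None else 'unknown'] += 1
    return counts


# Usage in `verdi shell` (1234 is a placeholder PK):
#   from aiida import orm
#   workchain_node = orm.load_node(1234)
#   calcs = [n for n in workchain_node.called_descendants
#            if isinstance(n, orm.CalcJobNode)]
#   for label, number in summarize_processes(calcs).most_common():
#       print(f'{number:5d}  {label}')
```

Note that called only returns the processes called directly by the workchain, whereas called_descendants also includes those launched by nested sub-workchains, which is usually what you want when the calculations are wrapped in base workchains.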
If it is a “temporary” error that would go away if you ran the calculation again with exactly the same inputs, you can enable caching and relaunch the original workchain with those same inputs. Enable caching as follows:
verdi config set caching.default_enabled true
verdi daemon restart --reset
Restarting the daemon is also important. With caching enabled, the workchain is rerun as usual, except that calculations that already completed correctly will not be rerun; their results will simply be taken from the database. So only the failed calculations will actually be rerun.
If the calculation failed due to incorrect inputs, then this won’t help, of course. There is no mechanism to change the inputs of a single calculation inside a workchain and rerun only that calculation.
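For the caching route, relaunching the workchain with identical inputs can be sketched as follows. This assumes the workchain node offers get_builder_restart, which returns a builder pre-filled with the original inputs; the helper name relaunch_with_same_inputs and the PK 1234 are placeholders for illustration. The submit function is passed in as an argument so that the snippet does not depend on a loaded profile.

```python
def relaunch_with_same_inputs(workchain_node, submit):
    """Resubmit a workchain with the inputs it was originally launched with."""
    # The restart builder is pre-populated with the inputs of the original node.
    builder = workchain_node.get_builder_restart()
    # With caching enabled, calculations that already finished correctly are
    # taken from the database instead of being run again.
    return submit(builder)


# Usage in `verdi shell`, after enabling caching and restarting the daemon
# (1234 is a placeholder PK):
#   from aiida import orm
#   from aiida.engine import submit
#   new_node = relaunch_with_same_inputs(orm.load_node(1234), submit)
#   print(f'Relaunched as Node<{new_node.pk}>')
```

The relaunched workchain is a new node in the provenance graph; the original workchain and its failed calculations stay as they are.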