`BaseRestartWorkChain`: upcoming changes in behavior for unhandled failures

mbercx · December 8, 2025, 10:51pm

Hi all,

We’re very close to merging the following PR, which will be a part of the upcoming v2.8.0 release:

github.com/aiidateam/aiida-core
💥 `BaseRestartWorkChain`: add `on_unhandled_failure` input (main ← mbercx:new/pause-unhandled), opened by mbercx on 2 December 2025, 09:08 UTC, +215 -74


This PR changes how the `BaseRestartWorkChain` deals with unhandled failures (see the commit message below for reference). This might affect your usage in two ways:

1. By default, the `BaseRestartWorkChain` will now abort immediately when it encounters an unhandled failure. In most cases this is desirable: rerunning with the same inputs typically fails again and wastes resources. If you prefer the old behavior, you can explicitly set `on_unhandled_failure='restart_once'`.
2. The exit code `ERROR_SECOND_CONSECUTIVE_UNHANDLED_FAILURE` has been renamed to `ERROR_UNHANDLED_FAILURE`. This might affect code that checks for this exit code, e.g. to decide on the next step in a workflow. I've searched GitHub for this exit code and could not find any such usage, except for tests in packages we maintain.
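If you do have code that looks up this exit code by name, one way to keep it working across aiida-core versions is to try the new label first and fall back to the old one. This is a minimal sketch, assuming only that `exit_codes` behaves like a work chain's `exit_codes` attribute (its attributes are exit codes carrying a `.status` integer). The helper name `unhandled_failure_exit_status` is hypothetical, and the `SimpleNamespace` stand-ins and their status values are made up for illustration.

```python
from types import SimpleNamespace


def unhandled_failure_exit_status(exit_codes) -> int:
    """Return the integer status of the unhandled-failure exit code.

    Hypothetical helper: `exit_codes` is assumed to behave like a work chain's
    `exit_codes`, i.e. an object whose attributes are exit codes with a `.status`.
    """
    # Try the new label first (renamed in the PR), then the old one.
    for label in ('ERROR_UNHANDLED_FAILURE', 'ERROR_SECOND_CONSECUTIVE_UNHANDLED_FAILURE'):
        exit_code = getattr(exit_codes, label, None)
        if exit_code is not None:
            return exit_code.status
    raise AttributeError('no unhandled-failure exit code found')


# Stand-ins for `exit_codes` before and after the rename (status values are made up).
old_style = SimpleNamespace(
    ERROR_SECOND_CONSECUTIVE_UNHANDLED_FAILURE=SimpleNamespace(status=900),
)
new_style = SimpleNamespace(
    ERROR_UNHANDLED_FAILURE=SimpleNamespace(status=901),
)
```

In a real workflow you would pass something like `MyWorkChain.exit_codes`, where `MyWorkChain` is a placeholder for your own `BaseRestartWorkChain` subclass, and compare the result against `node.exit_status`. Looking the code up by label rather than hardcoding an integer avoids assuming whether its numeric status changed along with its name.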

We apologise for any inconvenience these changes may cause. Happy computing!

Commit message

Currently, the `BaseRestartWorkChain` has hardcoded behavior for unhandled
failures: it restarts once, then aborts on the second consecutive failure
with the `ERROR_SECOND_CONSECUTIVE_UNHANDLED_FAILURE` exit code. This approach
lacks flexibility for use cases where users might want to abort immediately,
or to pause the work chain to allow for human evaluation.

This commit introduces a new optional input `on_unhandled_failure` that allows
users to configure how the work chain handles unhandled failures. The available
options are:

- `abort` (default): Abort immediately with `ERROR_UNHANDLED_FAILURE`
- `pause`: Pause the work chain for user inspection
- `restart_once`: Restart once, then abort if it fails again (similar to the old behavior)
- `restart_and_pause`: Restart once, then pause if it still fails

BREAKING: The default behavior is set to `abort`, which is the most conservative option.
In many cases this is the desired behavior, since doing a restart without changing the
inputs will typically fail again, wasting resources. Users who want the old “restart
once” behavior can explicitly set `on_unhandled_failure='restart_once'`.

BREAKING: The exit code `ERROR_SECOND_CONSECUTIVE_UNHANDLED_FAILURE` has been renamed to
`ERROR_UNHANDLED_FAILURE` to better reflect the new flexible behavior where
failure doesn’t necessarily mean “second consecutive” anymore.
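To make the four options in the commit message concrete, here is a small pure-Python model of the decision taken after each unhandled failure. It is a sketch based only on the option descriptions above, not the actual `BaseRestartWorkChain` implementation: the `Action` enum, the `next_action` function and the assumption that the work chain counts consecutive unhandled failures (as the old exit code name suggests) are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes after an unhandled failure."""
    RESTART = 'restart'
    ABORT = 'abort'
    PAUSE = 'pause'


def next_action(consecutive_failures: int, on_unhandled_failure: str = 'abort') -> Action:
    """Model the response to an unhandled failure.

    `consecutive_failures` is the number of unhandled failures in a row,
    including the current one (an assumption of this model).
    The default of 'abort' mirrors the new default described above.
    """
    if on_unhandled_failure == 'abort':
        return Action.ABORT
    if on_unhandled_failure == 'pause':
        return Action.PAUSE
    if on_unhandled_failure in ('restart_once', 'restart_and_pause'):
        # First unhandled failure: restart once with the same inputs.
        if consecutive_failures < 2:
            return Action.RESTART
        # It failed again: abort or pause, depending on the option.
        return Action.ABORT if on_unhandled_failure == 'restart_once' else Action.PAUSE
    raise ValueError(f'unknown on_unhandled_failure option: {on_unhandled_failure!r}')
```

For example, `next_action(1)` returns `Action.ABORT` under the new default, whereas `next_action(1, 'restart_once')` returns `Action.RESTART` and `next_action(2, 'restart_once')` returns `Action.ABORT`, which corresponds to the old behavior.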
