Hi all,
We’re very close to merging the following PR, which will be part of the upcoming v2.8.0 release:
which changes the behavior of the `BaseRestartWorkChain` when it encounters unhandled failures (see the commit message below for reference). This might affect you in two ways:
- By default, the `BaseRestartWorkChain` will now abort immediately when it encounters an unhandled failure. In most cases this is desirable: rerunning with the same inputs typically leads to another failure and wasted resources. If you prefer the old behavior, you can explicitly set `on_unhandled_failure='restart_once'`.
- The exit code `ERROR_SECOND_CONSECUTIVE_UNHANDLED_FAILURE` has been renamed to `ERROR_UNHANDLED_FAILURE`. This might affect code that checks for this exit code, e.g. to decide on the next step in a workflow. I’ve searched GitHub for this exit code and could not find such usage, except for tests in packages we maintain.
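If you do have code that checks for this exit code and needs to work with both older and newer aiida-core versions, one option is to look up whichever name the installed version defines. This is a minimal sketch, not an official helper: `unhandled_failure_status` is a hypothetical name, the `SimpleNamespace` objects stand in for `SomeWorkChain.exit_codes` (attribute access by name, each entry with a `.status`), and the status value `999` is made up for illustration.

```python
from types import SimpleNamespace
from typing import Any

# Exit code names after and before the rename in v2.8.0.
NEW_NAME = 'ERROR_UNHANDLED_FAILURE'
OLD_NAME = 'ERROR_SECOND_CONSECUTIVE_UNHANDLED_FAILURE'


def unhandled_failure_status(exit_codes: Any) -> int:
    """Return the integer status of the 'unhandled failure' exit code.

    ``exit_codes`` should behave like ``SomeWorkChain.exit_codes``:
    attribute access by name, each entry exposing ``.status``.
    The new name is tried first, then the pre-v2.8.0 name.
    """
    for name in (NEW_NAME, OLD_NAME):
        exit_code = getattr(exit_codes, name, None)
        if exit_code is not None:
            return exit_code.status
    raise AttributeError(f'neither {NEW_NAME} nor {OLD_NAME} is defined')


# Hypothetical stand-ins for the exit code namespaces of an old and a new
# version; the status value 999 is invented for this example.
old_codes = SimpleNamespace(**{OLD_NAME: SimpleNamespace(status=999)})
new_codes = SimpleNamespace(**{NEW_NAME: SimpleNamespace(status=999)})

# In a workflow, this value would typically come from ``node.exit_status``.
exit_status = 999
print(exit_status == unhandled_failure_status(old_codes))  # True
print(exit_status == unhandled_failure_status(new_codes))  # True
```

Comparing by looked-up status rather than hardcoding a number keeps the check correct even if the numeric value differs between versions.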
We apologise in case these changes cause you any inconvenience. Happy computing!
Commit message
Currently, the BaseRestartWorkChain has hardcoded behavior for unhandled
failures: it restarts once, then aborts on the second consecutive failure
with the ERROR_SECOND_CONSECUTIVE_UNHANDLED_FAILURE exit code. This approach
lacks flexibility for use cases where users might want to abort immediately,
or to pause the work chain so that a human can evaluate the failure.
This commit introduces a new optional input on_unhandled_failure that allows
users to configure how the work chain handles unhandled failures. The available
options are:
- `abort` (default): Abort immediately with `ERROR_UNHANDLED_FAILURE`
- `pause`: Pause the work chain for user inspection
- `restart_once`: Restart once, then abort if it fails again (similar to the old behavior)
- `restart_and_pause`: Restart once, then pause if it still fails
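The four options amount to a small decision table. The sketch below is a pure-Python model of that table only, not the aiida-core implementation: `Action`, `next_action` and the `previous_failure_unhandled` flag are hypothetical names, where the flag assumes the work chain can tell whether the preceding iteration also ended in an unhandled failure (i.e. whether the single restart has already been used).

```python
from enum import Enum


class Action(Enum):
    """Illustrative outcomes after an unhandled failure."""
    RESTART = 'restart'
    PAUSE = 'pause'
    ABORT = 'abort'  # i.e. finish with ERROR_UNHANDLED_FAILURE


VALID_POLICIES = ('abort', 'pause', 'restart_once', 'restart_and_pause')


def next_action(policy: str, previous_failure_unhandled: bool) -> Action:
    """Return what the work chain would do for an unhandled failure.

    ``previous_failure_unhandled`` is True when the preceding iteration
    also ended in an unhandled failure, so the one restart is used up.
    """
    if policy not in VALID_POLICIES:
        raise ValueError(f'unknown on_unhandled_failure policy: {policy!r}')
    if policy == 'abort':
        return Action.ABORT
    if policy == 'pause':
        return Action.PAUSE
    # Both 'restart_*' policies restart once before doing anything else.
    if not previous_failure_unhandled:
        return Action.RESTART
    # Second consecutive unhandled failure: abort or pause, per policy.
    return Action.ABORT if policy == 'restart_once' else Action.PAUSE


# Print the first and second reaction for every policy.
for policy in VALID_POLICIES:
    first = next_action(policy, previous_failure_unhandled=False)
    second = next_action(policy, previous_failure_unhandled=True)
    print(f'{policy:18} first: {first.value:8} second: {second.value}')
```

For `abort` and `pause` the "second" column is only hypothetical, since the work chain stops (or waits for the user) after the first unhandled failure.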
BREAKING: The default behavior is set to abort, which is the most conservative option.
In many cases this is the desired behavior, since doing a restart without changing the
inputs will typically fail again, wasting resources. Users who want the old “restart
once” behavior can explicitly set on_unhandled_failure='restart_once'.
BREAKING: The exit code ERROR_SECOND_CONSECUTIVE_UNHANDLED_FAILURE has been renamed to
ERROR_UNHANDLED_FAILURE to better reflect the new, configurable behavior, in which an
unhandled failure that ends the work chain is no longer necessarily the second consecutive one.