`WorkChain` continues before finishing the previous step

Hi all!

I ran into some unexpected behavior: a WorkChain appears to continue with the next outline step before the previous one has finished. I’m using the CommonRelaxWorkChain of the aiida-common-workflows package (src/aiida_common_workflows/workflows/relax/workchain.py on master in the aiidateam/aiida-common-workflows repository on GitHub).

This is the part where I observed the issue:

    def run_workchain(self):
        """Run the wrapped workchain."""
        inputs = self.exposed_inputs(self._process_class)
        return ToContext(workchain=self.submit(self._process_class, **inputs))

    def inspect_workchain(self):
        """Inspect the terminated workchain."""
        cls = self._process_class.__name__
        if not self.ctx.workchain.is_finished_ok:
            exit_status = self.ctx.workchain.exit_status
            self.report(f'{cls}<{self.ctx.workchain.pk}> failed with exit status {exit_status}.')
            return self.exit_codes.ERROR_SUB_PROCESS_FAILED.format(cls=cls, exit_status=exit_status)

        self.report(f'{cls}<{self.ctx.workchain.pk}> finished successfully.')

The WorkChain fails, reporting that the subprocess failed (the exit code originates from the inspect_workchain step). However, the subprocess actually finishes successfully, i.e. with exit_status 0, so the if statement should never be triggered, at least to my understanding.

    verdi process status 13594
    QuantumEspressoCommonRelaxWorkChain<13594> Finished [400] [1:inspect_workchain]
        └── PwRelaxWorkChain<13602> Finished [0] [3:results]
            └── PwBaseWorkChain<13606> Finished [0] [3:results]
                └── PwCalculation<13613> Finished [0]

I inserted some report statements after cls = self._process_class.__name__, and indeed both self.ctx.workchain.is_finished_ok and self.ctx.workchain.is_sealed are False at that point. Moreover, when I inspect the attributes of the subprocess (via another report statement before the if block), they still contain a checkpoint, which I haven’t seen before for finished processes. I’m not posting the attributes output here, as it is quite lengthy; just let me know if it is needed.
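For anyone who wants to repeat this check, here is a rough sketch of the kind of report helper I mean. The function name describe_subprocess is made up for illustration, and the stand-in object below uses invented values only so that the snippet runs without an AiiDA profile; the attribute names (pk, is_finished_ok, is_sealed, exit_status) are the ones used in the step above.

```python
from types import SimpleNamespace


def describe_subprocess(node) -> str:
    """Summarize the state flags of a process node as seen by the caller.

    Hypothetical helper: it only reads attributes already used in
    inspect_workchain (pk, is_finished_ok, is_sealed, exit_status).
    """
    return (
        f"pk={node.pk} "
        f"is_finished_ok={node.is_finished_ok} "
        f"is_sealed={node.is_sealed} "
        f"exit_status={node.exit_status}"
    )


# Stand-in object with invented values, used here instead of a real node.
fake_node = SimpleNamespace(pk=1, is_finished_ok=False, is_sealed=False, exit_status=None)
print(describe_subprocess(fake_node))
```

Inside inspect_workchain this would be called as self.report(describe_subprocess(self.ctx.workchain)) right before the if block.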

I’m using AiiDA 2.6.2. Moreover, this doesn’t seem to happen with a single worker, only when I have multiple workers.

Thanks for the report. I chatted with @t-reents to reproduce this error. We opened an issue, "Process is only stored after it fires `ProcessListener.on_process_finished` event resulting in unfinished process in next step" (Issue #6579 in aiidateam/aiida-core on GitHub), and a PR, "Update process status in the database before broadcasting that process status has changed (fix #6579)" (Pull Request #6580 by agoscinski in aiidateam/aiida-core), to solve this issue.
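To illustrate the ordering problem described in the issue title, here is a toy sketch in plain Python. It does not use any AiiDA code: ToyProcess, the dict standing in for the database, and the run function are all hypothetical names chosen for this example. It only shows why a listener that reads the stored state when it is notified sees an unfinished process if the notification is sent before the state is stored.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyProcess:
    """Hypothetical stand-in for a sub-process; not an AiiDA class."""

    pk: int
    database: Dict[int, str]  # plays the role of the stored process state
    listeners: List[Callable[[int], None]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.database[self.pk] = "running"

    def _broadcast(self) -> None:
        # Notify every listener that this process has terminated.
        for listener in self.listeners:
            listener(self.pk)

    def finish(self, store_first: bool) -> None:
        if store_first:
            # Ordering described in the PR title: store, then broadcast.
            self.database[self.pk] = "finished"
            self._broadcast()
        else:
            # Ordering described in the issue title: broadcast, then store.
            self._broadcast()
            self.database[self.pk] = "finished"


def run(store_first: bool) -> str:
    """Return the state a listener reads from the database when notified."""
    database: Dict[int, str] = {}
    seen: List[str] = []
    child = ToyProcess(pk=1, database=database)
    # The "parent" reacts to the notification by reading the stored state.
    child.listeners.append(lambda pk: seen.append(database[pk]))
    child.finish(store_first=store_first)
    return seen[0]


print(run(store_first=False))  # running
print(run(store_first=True))   # finished
```

In this simplified picture, the parent reading "running" corresponds to is_finished_ok being False in inspect_workchain even though the sub-process ends up finished with exit status 0.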
