How to kill a monster workflow 👹

So, uh. Somebody (I hope inadvertently) decided to push the limits of our AiiDAlab deployment and submitted a WorkChain that is kind of big. :sweat:
Specifically, verdi process list now counts ~18000 processes (and counting!) in various stages of “running”. I tried increasing the number of daemon workers and tweaking various poll intervals, but the overall system is hopelessly stuck, and verdi process kill either hangs or times out, saying that the process is unreachable.

While I have some ideas on how to prevent this situation in the future, the question now is: how do I stop this monster? :japanese_goblin: Any advice would be greatly appreciated. (Thoughts and prayers are welcome as well in these trying times :pray: )

Stop the daemon. Then you can simply delete the nodes. Once the daemon starts again, any worker that receives a task associated with a deleted node will notice this and just discard the task.

Thanks @sphuber! :bowing_man: I am not sure what I expected, perhaps something more magical, but this indeed did the trick. :slight_smile:

It is also really nice that I could just run verdi node delete on the root-level WorkChain and the whole workflow tree was deleted due to provenance rules.
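
For anyone landing here later, the steps from this thread can be sketched roughly as below. This is only a sketch under some assumptions: `ROOT_PK`, the `run` helper, the `kill_monster` function and the `EXECUTE` switch are hypothetical names made up for illustration, and `1234` is a placeholder, not a real PK. By default the script only prints the commands, since deleting nodes cannot be undone.

```shell
#!/bin/sh
# Sketch of the procedure from this thread: stop the daemon, delete the
# root WorkChain node, start the daemon again.
# ROOT_PK is a hypothetical placeholder: replace it with the PK of the
# root-level WorkChain as shown by `verdi process list`.
ROOT_PK="${ROOT_PK:-1234}"

# Hypothetical helper: print each command, and only execute it when
# EXECUTE=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${EXECUTE:-0}" = "1" ]; then
        "$@"
    fi
}

kill_monster() {
    # 1. Stop the daemon so that no worker keeps picking up tasks.
    run verdi daemon stop
    # 2. Preview what would be deleted (root node plus everything it drags along).
    run verdi node delete --dry-run "$ROOT_PK"
    # 3. Actually delete; verdi asks for confirmation before deleting.
    run verdi node delete "$ROOT_PK"
    # 4. Restart the daemon; tasks belonging to deleted nodes get discarded.
    run verdi daemon start
}

kill_monster
```

Running it as is just lists the four commands; something like `EXECUTE=1 ROOT_PK=<pk> sh kill_monster.sh` (the file name is again just an example) would actually run them.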